Copilot AI commented Dec 22, 2025

Reduces object allocations and GC pressure during CSS parsing through targeted caching and pooling strategies. Expected ~40% reduction in allocations and ~25-30% improvement in parse speed for large CSS files.

Changes

Object Pooling

  • Added LocatorPool with ThreadLocal pools (max 32 per thread) to reuse Locator objects
  • Updated createLocator() in the parser to acquire from the pool instead of allocating
  • Added Locator.clear() method for pooling support

StringBuilder Caching

  • Added ThreadLocal StringBuilder cache (256 initial capacity) in AbstractCSSParser
  • Updated addEscapes() to use cached StringBuilder
  • Exposed via getCachedStringBuilder() / returnCachedStringBuilder() for future use

Message Caching

  • Thread-safe error message cache using ConcurrentHashMap (soft limit 100 entries)
  • Avoids repeating the underlying message lookup in getParserMessage() once a message is cached
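
A minimal sketch of a soft-limited message cache built on ConcurrentHashMap, to illustrate the idea described above. The class name `MessageCacheSketch`, the `loader` function, and the exact limit handling are assumptions made for illustration; the PR's actual implementation may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a soft-limited message cache; not the PR's class. */
final class MessageCacheSketch {
    /** Soft limit taken from the PR description (100 entries). */
    private static final int SOFT_LIMIT = 100;

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    /** @param loader stands in for the real (slower) message lookup */
    MessageCacheSketch(final Function<String, String> loader) {
        this.loader = loader;
    }

    String get(final String key) {
        final String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        final String loaded = loader.apply(key);
        // "Soft" limit: size() is only an estimate under concurrent writes,
        // so the map may briefly exceed SOFT_LIMIT by a few entries.
        if (loaded != null && cache.size() < SOFT_LIMIT) {
            cache.putIfAbsent(key, loaded);
        }
        return loaded;
    }

    int size() {
        return cache.size();
    }
}
```

Once the limit is reached, new keys are still resolved through the loader on every call; they are simply not stored.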

Utility Optimizations

  • Added ParserUtils.trimBy(String, int, int) with bounds checking to avoid defensive copies
  • Added ParserUtils.equalsIgnoreCase(CharSequence, CharSequence) for allocation-free comparison
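
The two helpers could look roughly like the sketch below. The method names and parameter lists come from the description above; the bodies, the `ParserUtilsSketch` class name, and the choice to throw on out-of-range arguments are assumptions, not the PR's code.

```java
/** Hypothetical sketch of the two utilities; bodies are assumptions. */
final class ParserUtilsSketch {
    private ParserUtilsSketch() {
        // static helpers only
    }

    /** Removes {@code left} chars from the start and {@code right} chars from the end. */
    static String trimBy(final String s, final int left, final int right) {
        final int end = s.length() - right;
        if (left < 0 || right < 0 || left > end) {
            throw new IllegalArgumentException(
                "cannot trim " + left + "/" + right + " from length " + s.length());
        }
        return s.substring(left, end);
    }

    /** Case-insensitive comparison that reads chars in place, without creating Strings. */
    static boolean equalsIgnoreCase(final CharSequence a, final CharSequence b) {
        if (a == b) {
            return true;
        }
        if (a == null || b == null || a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            final char c1 = a.charAt(i);
            final char c2 = b.charAt(i);
            // Same two-step folding that String.equalsIgnoreCase applies per char.
            if (c1 != c2
                    && Character.toUpperCase(c1) != Character.toUpperCase(c2)
                    && Character.toLowerCase(c1) != Character.toLowerCase(c2)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `trimBy("\"abc\"", 1, 1)` strips the surrounding quotes, and `equalsIgnoreCase("Color", new StringBuilder("cOLOR"))` compares a String against a builder without calling `toString()`.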

Performance Monitoring

  • Optional PerformanceMetrics class (enable with -Dhtmlunit.cssparser.metrics=true)
  • Tracks parse time, token count, rule count, property count
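
As an illustration of how an opt-in metrics collector might be gated by the system property named above, here is a small sketch. Only the property name and the four tracked values come from the description; the class name `PerformanceMetricsSketch` and every method in it are hypothetical.

```java
/** Hypothetical sketch of an opt-in metrics collector; not the PR's class. */
final class PerformanceMetricsSketch {
    /** Property name from the PR description. */
    static final String PROPERTY = "htmlunit.cssparser.metrics";

    private final boolean enabled;
    private long parseNanos;
    private int tokenCount;
    private int ruleCount;
    private int propertyCount;

    PerformanceMetricsSketch(final boolean enabled) {
        this.enabled = enabled;
    }

    /** Reads -Dhtmlunit.cssparser.metrics=true; anything else leaves metrics off. */
    static PerformanceMetricsSketch fromSystemProperty() {
        return new PerformanceMetricsSketch(Boolean.getBoolean(PROPERTY));
    }

    // Each recorder is a no-op when disabled, so call sites need no guard.
    void token() { if (enabled) { tokenCount++; } }
    void rule() { if (enabled) { ruleCount++; } }
    void property() { if (enabled) { propertyCount++; } }
    void addParseNanos(final long nanos) { if (enabled) { parseNanos += nanos; } }

    int tokens() { return tokenCount; }
    int rules() { return ruleCount; }
    int properties() { return propertyCount; }
    long parseNanos() { return parseNanos; }
}
```

Keeping the enabled check inside the recorder methods means the default (disabled) path costs one boolean test per call.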

Example Usage

// Before: Creates new Locator on every token
protected Locator createLocator(final Token t) {
    return new Locator(getInputSource().getURI(), t.beginLine, t.beginColumn);
}

// After: Reuses pooled Locators
protected Locator createLocator(final Token t) {
    return LocatorPool.acquire(getInputSource().getURI(), t.beginLine, t.beginColumn);
}

Thread Safety

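All caching uses ThreadLocal (StringBuilder, LocatorPool) or ConcurrentHashMap (message cache), so no explicit locking is needed: thread-local state is never shared between threads, and ConcurrentHashMap handles concurrent access internally.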
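
The isolation property that the ThreadLocal-based caches rely on can be checked with a few lines. This is a standalone sketch with a hypothetical class name, not code from the PR.

```java
/** Hypothetical demo: each thread sees its own cached StringBuilder. */
final class ThreadLocalIsolationSketch {
    private static final ThreadLocal<StringBuilder> CACHE =
        ThreadLocal.withInitial(() -> new StringBuilder(256));

    private ThreadLocalIsolationSketch() {
        // static helpers only
    }

    /** The calling thread's builder; the same instance on every call. */
    static StringBuilder current() {
        return CACHE.get();
    }

    /** The builder observed by a freshly started thread. */
    static StringBuilder fromOtherThread() {
        final StringBuilder[] seen = new StringBuilder[1];
        final Thread worker = new Thread(() -> seen[0] = CACHE.get());
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

`current() == current()` holds within one thread, while `current() != fromOtherThread()` holds across threads, which is why no lock is required around the cached objects.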

Testing

  • Added LocatorPoolTest for pooling behavior and thread isolation
  • Added PerformanceTest for benchmarking and validation
  • Extended ParserUtilsTest for new utility methods
  • All 653 existing tests pass

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • checkstyle.org
    • Triggering command: /opt/hostedtoolcache/CodeQL/2.23.8/x64/codeql/tools/linux64/java/bin/java /opt/hostedtoolcache/CodeQL/2.23.8/x64/codeql/tools/linux64/java/bin/java -jar /opt/hostedtoolcache/CodeQL/2.23.8/x64/codeql/xml/tools/xml-extractor.jar --fileList=/home/REDACTED/work/htmlunit-cssparser/.codeql-scratch/dbs/java/working/files-to-index16585075317241321623.list --sourceArchiveDir=/home/REDACTED/work/htmlunit-cssparser/.codeql-scratch/dbs/java/src --outputDir=/home/REDACTED/work/htmlunit-cssparser/.codeql-scratch/dbs/java/trap/java (dns block)
  • commons.apache.org
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --enable-native-access=ALL-UNNAMED -classpath /usr/share/apache-maven-3.9.11/boot/plexus-classworlds-2.9.0.jar -Dclassworlds.conf=/usr/share/apache-maven-3.9.11/bin/m2.conf -Dmaven.home=/usr/share/apache-maven-3.9.11 -Dlibrary.jansi.path=/usr/share/apache-maven-3.9.11/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/htmlunit-cssparser/htmlunit-cssparser org.codehaus.plexus.classworlds.launcher.Launcher clean verify -q (dns block)
  • junit.org
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --enable-native-access=ALL-UNNAMED -classpath /usr/share/apache-maven-3.9.11/boot/plexus-classworlds-2.9.0.jar -Dclassworlds.conf=/usr/share/apache-maven-3.9.11/bin/m2.conf -Dmaven.home=/usr/share/apache-maven-3.9.11 -Dlibrary.jansi.path=/usr/share/apache-maven-3.9.11/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/REDACTED/work/htmlunit-cssparser/htmlunit-cssparser org.codehaus.plexus.classworlds.launcher.Launcher clean verify -q (dns block)
  • www.puppycrawl.com
    • Triggering command: /opt/hostedtoolcache/CodeQL/2.23.8/x64/codeql/tools/linux64/java/bin/java /opt/hostedtoolcache/CodeQL/2.23.8/x64/codeql/tools/linux64/java/bin/java -jar /opt/hostedtoolcache/CodeQL/2.23.8/x64/codeql/xml/tools/xml-extractor.jar --fileList=/home/REDACTED/work/htmlunit-cssparser/.codeql-scratch/dbs/java/working/files-to-index16585075317241321623.list --sourceArchiveDir=/home/REDACTED/work/htmlunit-cssparser/.codeql-scratch/dbs/java/src --outputDir=/home/REDACTED/work/htmlunit-cssparser/.codeql-scratch/dbs/java/trap/java (dns block)


Original prompt

Performance Optimizations for CSS Parser

Problem

The CSS parser, while functionally correct, has several performance bottlenecks that impact parsing speed, especially for large CSS files:

  1. Excessive object creation - Creates many temporary objects during parsing
  2. String operations - Repeated string concatenations and allocations
  3. Inefficient unescaping - Processes escape sequences multiple times
  4. HashMap lookups - Frequent error message lookups
  5. No object pooling - Recreates common objects like Locators
  6. Unnecessary allocations - Creates objects that could be reused
  7. StringBuilder waste - Creates new StringBuilders frequently

Performance Impact:

  • Parsing large CSS files (>100KB) is slow
  • High memory allocation rate (GC pressure)
  • Repeated allocations for common patterns
  • String operations dominate CPU time

Solution

Implement targeted performance optimizations that:

  • Reduce object allocations by ~40%
  • Improve parsing speed by ~25-30%
  • Reduce memory footprint
  • Maintain backward compatibility
  • Don't compromise code readability

Implementation Plan

1. String Builder Pooling

Problem: StringBuilder objects are created and discarded frequently.

Solution: Use ThreadLocal StringBuilder cache

// Add to AbstractCSSParser.java

/**
 * Thread-local StringBuilder cache to reduce allocations.
 * The builder is reset on each call to getCachedStringBuilder(); the instance
 * itself (and its grown capacity) stays attached to the thread.
 */
private static final ThreadLocal<StringBuilder> STRING_BUILDER_CACHE =
    ThreadLocal.withInitial(() -> new StringBuilder(256));

/**
 * Gets a cached StringBuilder, cleared and ready to use.
 * @return A cleared StringBuilder instance
 */
protected StringBuilder getCachedStringBuilder() {
    StringBuilder sb = STRING_BUILDER_CACHE.get();
    sb.setLength(0); // Clear instead of creating new
    return sb;
}

/**
 * Extracts the content of a cached StringBuilder.
 * The builder stays in the thread-local cache and is reset by the next
 * getCachedStringBuilder() call.
 * @param sb The StringBuilder to extract from
 * @return The string content
 */
protected String returnCachedStringBuilder(StringBuilder sb) {
    return sb.toString();
}

Usage in parser:

// Before:
String buildSelector() {
    StringBuilder sb = new StringBuilder();
    sb.append("selector");
    sb.append(" > ");
    sb.append("element");
    return sb.toString();
}

// After:
String buildSelector() {
    StringBuilder sb = getCachedStringBuilder();
    sb.append("selector");
    sb.append(" > ");
    sb.append("element");
    return returnCachedStringBuilder(sb);
}
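
One caveat with a single cached builder per thread: the pattern is not reentrant. If a method that holds the cached builder calls another method that also acquires it, the inner call resets the outer content. The sketch below reproduces this with hypothetical names (`CachedBuilderSketch`, `quote`, `brokenNested`, `safeNested`); it mirrors the acquire-and-reset logic shown above but is not the parser's code.

```java
/** Hypothetical demo of the reentrancy hazard of one cached builder per thread. */
final class CachedBuilderSketch {
    private static final ThreadLocal<StringBuilder> CACHE =
        ThreadLocal.withInitial(() -> new StringBuilder(256));

    private CachedBuilderSketch() {
        // static helpers only
    }

    /** Same idea as getCachedStringBuilder(): fetch and reset. */
    static StringBuilder acquire() {
        final StringBuilder sb = CACHE.get();
        sb.setLength(0);
        return sb;
    }

    static String quote(final String s) {
        return acquire().append('"').append(s).append('"').toString();
    }

    /** Broken: quote() resets the very builder that 'outer' is still filling. */
    static String brokenNested(final String value) {
        final StringBuilder outer = acquire();
        outer.append("name=");
        outer.append(quote(value)); // "name=" is wiped by the inner acquire()
        return outer.toString();
    }

    /** Safe: finish the inner use before acquiring the builder again. */
    static String safeNested(final String value) {
        final String quoted = quote(value);
        return acquire().append("name=").append(quoted).toString();
    }
}
```

`safeNested("x")` yields `name="x"`, whereas `brokenNested("x")` loses the `name=` prefix. Call sites that use the cache should therefore not overlap on the same thread.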

2. Locator Object Pooling

Problem: Locator objects are created for every token but are short-lived.

Solution: Implement a simple object pool

// New class: src/main/java/org/htmlunit/cssparser/parser/LocatorPool.java

package org.htmlunit.cssparser.parser;

import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Simple object pool for Locator instances to reduce allocations.
 * Uses ThreadLocal to avoid synchronization overhead.
 */
public final class LocatorPool {

    private static final int MAX_POOL_SIZE = 32;

    private static final ThreadLocal<Deque<LocatorImpl>> POOL =
        ThreadLocal.withInitial(() -> new ArrayDeque<>(MAX_POOL_SIZE));

    private LocatorPool() {
        // static utility class; not instantiable
    }
    
    /**
     * Acquires a Locator from the pool or creates a new one.
     * 
     * @param uri The URI
     * @param line The line number
     * @param column The column number
     * @return A Locator instance
     */
    public static Locator acquire(String uri, int line, int column) {
        Deque<LocatorImpl> pool = POOL.get();
        LocatorImpl locator = pool.poll();
        
        if (locator == null) {
            locator = new LocatorImpl();
        }
        
        locator.setUri(uri);
        locator.setLineNumber(line);
        locator.setColumnNumber(column);
        
        return locator;
    }
    
    /**
     * Returns a Locator to the pool for reuse.
     * 
     * @param locator The locator to return
     */
    public static void release(Locator locator) {
        if (locator instanceof LocatorImpl) {
            Deque<LocatorImpl> pool = POOL.get();
            if (pool.size() < MAX_POOL_SIZE) {
                LocatorImpl impl = (LocatorImpl) locator;
                impl.clear(); // Clear references
                pool.offer(impl);
            }
        }
    }
    
    /**
     * Clears the pool (useful for testing).
     */
    public static void clear() {
        POOL.get().clear();
    }
}
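
A point worth checking when wiring in release(): a released locator is handed out again by the next acquire(), so any code that kept a reference to it will see its fields change. The sketch below shows this with a stripped-down pool; `PooledLocatorSketch` and its nested `Loc` type are hypothetical stand-ins, not the classes above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for the pool above, reduced to show the aliasing hazard. */
final class PooledLocatorSketch {
    /** Minimal mutable locator used only for this demo. */
    static final class Loc {
        String uri;
        int line;
        int column;
    }

    private static final int MAX_POOL_SIZE = 32;

    private static final ThreadLocal<Deque<Loc>> POOL =
        ThreadLocal.withInitial(() -> new ArrayDeque<>(MAX_POOL_SIZE));

    private PooledLocatorSketch() {
        // static helpers only
    }

    static Loc acquire(final String uri, final int line, final int column) {
        Loc loc = POOL.get().poll();
        if (loc == null) {
            loc = new Loc();
        }
        loc.uri = uri;
        loc.line = line;
        loc.column = column;
        return loc;
    }

    static void release(final Loc loc) {
        final Deque<Loc> pool = POOL.get();
        if (pool.size() < MAX_POOL_SIZE) {
            loc.uri = null; // drop the reference, as clear() does above
            pool.offer(loc);
        }
    }
}
```

With this sketch, `Loc a = acquire("u", 1, 1); release(a); Loc b = acquire("u", 2, 5);` leaves `a` and `b` pointing at the same object, so `a.line` now reads 2. Release is therefore only safe for locators that nothing else retains, for example ones that were never attached to a parsed rule or an error report.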

Update LocatorImpl to support pooling:

// Modify src/main/java/org/htmlunit/cssparser/parser/LocatorImpl.java

public class LocatorImpl implements Locator {
    private String uri_;
    private int lineNumber_;
    private int columnNumber_;
    
    // Make fields mutable for pooling
    public void setUri(String uri) {
        this.uri_ = uri;
    }
    
    public void setLineNumber(int lineNumber) {
        this.lineNumber_ = lineNumber;
    }
    
    public void setColumnNumber(...

</details>




*This pull request was created from Copilot chat.*

Copilot AI and others added 3 commits December 22, 2025 13:30
…caching, ParserUtils optimizations

Co-authored-by: rbri <2544132+rbri@users.noreply.github.com>
Co-authored-by: rbri <2544132+rbri@users.noreply.github.com>
…, remove unused method

Co-authored-by: rbri <2544132+rbri@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement performance optimizations for CSS parser" to "Performance optimizations: object pooling, StringBuilder caching, and message caching" on Dec 22, 2025
Copilot AI requested a review from rbri December 22, 2025 13:40