Skip to content

Conversation

@jamesfredley
Copy link
Contributor

@jamesfredley jamesfredley commented Jan 24, 2026

Summary

This PR optimizes CI workflow performance, fixes multiple flaky tests caused by static state pollution during parallel test execution, and adds developer experience improvements. After fixing the root causes of test flakiness, Develocity test retries have been removed.

CI Workflow Optimizations

Build Performance

  • Use --rerun-tasks flag for reliable CI builds
  • Disable test caching in CI via outputs.cacheIf { !isCiBuild }
  • Disable Groovy compilation cache in CI (GroovyCompile tasks) to ensure AST transformations are always applied
  • Increase maxParallelForks from 3 to 4 (matches GitHub runner cores)
  • Increase forkEvery from 10-25 to 50 (reduces JVM restart overhead)
  • Enable Develocity test distribution for CI builds
  • Remove hardcoded maxParallelForks = 1 restriction from grails-gsp module
  • Enable cancel-in-progress for PR workflow runs - Cancels outdated workflow runs when new commits are pushed to a PR across all CI workflows (gradle.yml, codestyle.yml, codeql.yml, groovy-joint-workflow.yml, rat.yml)
  • Add new buildRerunTasks job for building Grails-Core with rerun tasks enabled

CI Matrix Reduction

  • Build job: Changed from full OS/Java cross-product to specific combinations:
    • Ubuntu: Java 17, 21, 25
    • macOS: Java 21 only
    • Windows: Java 25 only
  • Other jobs (buildGradle, functional, mongodbFunctional, hibernate5Functional): Changed from [17, 21] to [17, 25]
  • Grails Forge build: Changed to [17, 21]

Test Retry Removal

After fixing the flaky tests, Develocity test retries have been removed from:

  • gradle/test-config.gradle - Removed testRetry block
  • build-logic/docs-core/build.gradle - Removed develocity testRetry block
  • grails-data-hibernate5/dbmigration/build.gradle - Removed entire testRetry task configuration
  • grails-gradle/gradle/test-config.gradle - Removed develocity testRetry block
  • grails-forge/buildSrc/build.gradle - Removed unused test-retry plugin dependency
  • grails-forge/gradle/test-config.gradle - Removed commented-out retry code and plugin apply

Flaky Test Fixes

Root Cause Analysis

When tests run in parallel (maxParallelForks > 1), static state can leak between tests causing intermittent failures. The main culprits identified:

  1. HttpServletResponseExtension.@mimeTypes - Static cache initialized by one test used by another
  2. HttpServletResponseExtension.getMimeTypes() - Was calling webRequest.applicationContext without null check, causing NPE when GrailsWebRequest.lookup() returns null during parallel test runs
  3. HttpServletRequestExtension.getMimeTypes() - Was using servlet context attributes which get polluted by parallel tests due to GrailsUnitTest static fields
  4. Holders state - grailsApplication, pluginManager cleared mid-test by parallel cleanup
  5. Validateable.constraintsMapInternal - Constraint cache evaluated before doWithConfig() registers shared constraints
  6. ConstraintEvalUtils.defaultConstraintsMap - Global shared constraints cache keyed by config identity hash
  7. SystemPropertyTenantResolver - System property gorm.tenantId pollution between multi-tenancy tests
  8. RequestContextHolder - ThreadLocal request attributes not properly reset
  9. GrailsApplication.artefactInfo - URL mapping artefacts polluting parallel tests
  10. LinkGenerator.urlMappingsHolder - Stale reference to old URL mappings holder after bean recreation

Key Fix Pattern: Holders.clear() in Cleanup

Problem: Calling Holders.clear() in cleanup() or @AfterEach clears ALL Holders state (including grailsApplication) which can affect parallel tests that depend on Holders.grailsApplication for mime type resolution.

Solution:

  • Use targeted cleanup (only reset what you set)
  • Or use cleanupSpec()/@AfterAll which runs after all tests in the class complete
  • The GrailsUnitTest trait already handles Holders.clear() in cleanupClass()

Key Fix Pattern: Ensure grailsApplication Initialization Before Clearing Caches

Problem: Clearing static caches in setup() before grailsApplication is initialized causes doWithConfig() to not be called, meaning custom MIME types and shared constraints are not registered.

Solution: Access the config property first in setup() to trigger lazy initialization of grailsApplication, which calls doWithConfig(). Then clear the caches so they will be repopulated from the fresh config.

Fix Categories

1. MIME Type Lookup Isolation (Production Code Fixes)

HttpServletResponseExtension.getMimeTypes() was calling webRequest.applicationContext without checking if webRequest is null. When GrailsWebRequest.lookup() returns null during parallel test execution, this caused a NullPointerException.

Fix: Added null-safe operator (?.) when accessing webRequest.applicationContext.

HttpServletRequestExtension.getMimeTypes() was using WebApplicationContextUtils.getWebApplicationContext(request.servletContext) to look up MIME types. This fails in parallel tests because GrailsUnitTest uses static _servletContext which gets polluted across tests.

Fix: Now prioritizes GrailsWebRequest.lookup() (thread-local) before falling back to servlet context lookup.

2. Mime Type Cache Isolation (~20 files)

Added setup() and cleanup() methods to clear HttpServletResponseExtension.@mimeTypes before/after each test, ensuring grailsApplication is initialized first:

  • RespondMethodSpec, WithFormatContentTypeSpec, ContentFormatControllerTests
  • ContentNegotiationSpec, JSONBindingToNullSpec, MimeUtilitySpec
  • MimeTypesConfigurationSpec, AcceptHeaderParserSpec, RequestAndResponseMimeTypesApiSpec
  • HalJsonRendererSpec, JsonRendererSpec, BaseDomainClassRendererSpec
  • RestfulControllerSpec, WithFormatSpec, DefaultXmlRendererSpec
  • VndErrorRenderingSpec, HtmlRendererSpec, DefaultRendererRegistrySpec
  • And others across modules

3. Multi-Tenancy Test Isolation (~18 files)

Used Spock @RestoreSystemProperties annotation for cleaner system property isolation, combined with clearing SystemPropertyTenantResolver:

  • DatabasePerTenantSpec, SchemaPerTenantSpec, PartitionMultiTenancySpec
  • MultiTenancySpec, SingleTenantSpec, SchemaMultiTenantSpec
  • MongoStaticApiMultiTenancySpec, SchemaBasedMultiTenancySpec, SingleTenancySpec (MongoDB)
  • MongoConnectionSourcesSpec, MultiTenancySpec (MongoDB)
  • CurrentTenantTransformSpec, TenantServiceSpec, MultiTenantServiceTransformSpec
  • And Hibernate/MongoDB connection specs

4. Command Object Constraint Isolation (~15 files)

Tests that use shared constraints via doWithConfig() need proper cache clearing. Uses reflection to clear the private constraintsMapInternal field in the Validateable trait:

  • CommandObjectsSpec.groovy, CommandObjectNoDataSpec.groovy
  • ValidateableTraitSpec.groovy, ValidateableTraitAdHocSpec.groovy
  • CommandObjectConstraintGettersSpec, SerializableValidateableSpec
  • ValidateableMockSpec, DefaultASTValidateableHelperSpec
  • JsonApiSpec, CommandObjectNullabilitySpec
  • ControllerUnitTestMixinTests, JsonBindingWithExceptionHandlerSpec
  • And others

Note: A public API method clearConstraintsMapCache() will be added in Grails 7.1 (see PR #15346). This PR uses reflection to access the Groovy 4 trait static field helper interface to clear the cache without modifying the public API, which would be a breaking change in a patch release.

5. Holders.clear() Fixes (4 files)

Replaced aggressive Holders.clear() with targeted cleanup:

  • WithFormatContentTypeSpec - Removed Holders.clear() from cleanup()
  • BeanBuilderTests - Changed to Holders.setPluginManager null (only reset what setUp sets)
  • DomainClassTraitSpec - Moved Holders.clear() from cleanup() to cleanupSpec()

6. RequestContextHolder Cleanup (~10 files)

Added RequestContextHolder.resetRequestAttributes() in setup/cleanup:

  • LinkGeneratorWithUrlMappingsSpec, EncodePathFromURISpec
  • HTMLJSCodecIntegrationSpec, DefaultGrailsApplicationAttributesTests
  • FlashScopeWithErrorsTests, StreamCharBufferSpec, WebUtilsTests
  • BindingOutsideRequestSpec
  • UrlMappingsWithHttpMethodSpec, RestfulUrlMappingSpec

7. URL Mapping Artefact Isolation (~4 files)

Clear URL mapping artefacts and refresh linkGenerator urlMappingsHolder reference:

  • FormTagLibTests - Clear artefactInfo in setup/cleanup, refresh linkGenerator
  • AbstractGrailsTagTests - Clear URL mappings in tearDown() for all subclasses
  • RestfulReverseUrlRenderingTests - Clear URL mappings and re-register in setup
  • ControllerUnitTestMixin.mockArtefact() - Refresh linkGenerator.urlMappingsHolder after bean recreation

8. System Property Isolation (~20 files)

Fixed tests to use @RestoreSystemProperties for proper system property isolation:

  • UrlMappingMatcherSpec, YamlPropertySourceLoaderSpec
  • NavigableMapSpringProfilesSpec, GrailsPlaceHolderConfigurerCorePluginRuntimeSpec
  • DevelopmentModeWatchSpec, GrailsArtefactTransformerSpec
  • CreateAppCommandSpec, BookSpec
  • Neo4j and MongoDB auto-configuration specs
  • And others

Dependency Updates

  • Groovy 3.0.25 in grails-data-graphql, grails-data-neo4j, grails-forge (Java 25 compatibility)
  • Gradle wrapper 8.14.4 for generated apps (gradleWrapperProperties.rocker.raw)
  • Gradle tooling API 8.14.4 (gradle.properties)
  • .sdkmanrc updates for Java 17.0.18-librca

Related PRs

Files Changed

  • ~100 files modified across flaky test fixes, CI optimizations, test retry removal, and production code improvements

@jamesfredley jamesfredley self-assigned this Jan 24, 2026
@jamesfredley
Copy link
Contributor Author

@jdaugherty
Copy link
Contributor

In my early testing, using the cache was not always consistent because of how some of our dependencies are included. Part of the reason we force a rerun of tests was to ensure everything was functional. I am ok disabling the cache for every commit, but I think we still need to run all of the tests periodically.

I would strongly urge we fix the reason the develocity plugin retries the tests first - sometimes it reruns all of the tests 3+ times. Most of the tests that fail are in grails-gsp. If you checkout the last build, look at flaky tests to find the problematic ones. I suspect if we fixed these, the github build would go down by over half of its run time.

@jdaugherty
Copy link
Contributor

Oh and this scenario too: changing something in core might only be tested by the end to end test in forge, but forge doesn't depend on grails so gradle would just not run those tests because the cache would see nothing changed. We had lots of issues earlier on in 7.0 dev that were only caught by the functional tests and gradle would not run them always because the cache would not trigger a change. This was a reoccurring theme that I ended up wasting a lot of time chasing in 7.0 dev.

@jamesfredley jamesfredley force-pushed the improve-gh-ci-speed branch 3 times, most recently from db012ea to 2fcf65d Compare January 24, 2026 03:31
@jamesfredley
Copy link
Contributor Author

Apparently cleanTest will only force the rerun of tests when Gradle cache is disabled. I'm trying another approach.

Still experiments to see if anything reduces this from an hour, before proposing changes.

@jamesfredley jamesfredley force-pushed the improve-gh-ci-speed branch 2 times, most recently from cadbfbf to 116419c Compare January 24, 2026 06:07
@jdaugherty jdaugherty marked this pull request as ready for review January 24, 2026 06:42
Copilot AI review requested due to automatic review settings January 24, 2026 06:51
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the CI workflow to reduce build times and fixes multiple flaky tests caused by static state pollution during parallel test execution.

Changes:

  • Optimized CI workflow by replacing --rerun-tasks with selective test cache disabling, adding configuration cache warnings, and increasing parallel execution settings
  • Reduced CI matrix combinations to test specific OS/Java pairings instead of full cross-products
  • Fixed flaky tests by adding proper cleanup of ThreadLocal state, static constraints caches, mime type caches, and tenant ID system properties in test setup methods
  • Added clearConstraintsMap() method to the Validateable trait to support constraint cache clearing in tests
  • Updated Groovy version to 3.0.25 and Gradle wrapper to 8.14.4 for Java 25 compatibility

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated no comments.

Show a summary per file
File Description
.github/workflows/gradle.yml Optimized CI workflow by changing Java matrix combinations, removing --rerun-tasks and cleanTest, adding --configuration-cache-problems=warn
build.gradle Increased configuredTestParallel from 3 to 4 for CI builds to match GitHub runner cores
gradle/test-config.gradle Added test cache disabling for CI builds, enabled test distribution, increased forkEvery from 20 to 50
grails-test-suite-web/build.gradle Increased forkEvery from 10 to 50 for CI builds
grails-test-suite-uber/build.gradle Increased forkEvery from 25 to 50 for CI builds
grails-validation/src/main/groovy/grails/validation/Validateable.groovy Added clearConstraintsMap() static method to support test isolation
grails-web-url-mappings/src/test/groovy/org/grails/web/mapping/LinkGeneratorWithUrlMappingsSpec.groovy Added setup/cleanup methods to reset RequestContextHolder and link map state
grails-test-suite-web/src/test/groovy/org/grails/web/commandobjects/CommandObjectsSpec.groovy Added setup method to clear constraints cache before each test
grails-test-suite-web/src/test/groovy/org/grails/web/commandobjects/CommandObjectNoDataSpec.groovy Added setup method to clear constraints cache before each test
grails-test-suite-web/src/test/groovy/org/grails/web/mime/WithFormatContentTypeSpec.groovy Added setup method to clear mime types cache and set Holders.grailsApplication
grails-test-suite-web/src/test/groovy/org/grails/web/mime/ContentFormatControllerTests.groovy Added setup method to clear mime types cache
grails-test-suite-web/src/test/groovy/org/grails/web/controllers/ContentNegotiationSpec.groovy Added setup method to clear mime types cache
grails-test-suite-web/src/test/groovy/grails/rest/web/RespondMethodSpec.groovy Added setup method to clear mime types cache
grails-test-suite-web/src/test/groovy/org/grails/web/binding/JSONBindingToNullSpec.groovy Added setup method to clear mime types cache
grails-datamapping-core-test/src/test/groovy/grails/gorm/services/multitenancy/database/DatabasePerTenantSpec.groovy Added setup method to clear tenant ID system property
grails-forge/gradle.properties Updated Groovy version from 3.0.24 to 3.0.25
grails-data-neo4j/gradle.properties Updated Groovy version from 3.0.20 to 3.0.25
grails-data-graphql/gradle.properties Updated Groovy version from 3.0.11 to 3.0.25
grails-forge/grails-forge-core/src/main/java/org/grails/forge/feature/build/gradle/templates/gradleWrapperProperties.rocker.raw Updated Gradle wrapper from 8.14.3 to 8.14.4

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jamesfredley
Copy link
Contributor Author

jamesfredley commented Jan 24, 2026

Looks pretty good compared to recent runs, although there is significant variability

image image

vs

image image image

@jamesfredley
Copy link
Contributor Author

This ended up being more of a test cleanup than a speed improvement PR.

@matrei
Copy link
Contributor

matrei commented Jan 25, 2026

Could we leverage https://spockframework.org/spock/docs/2.4/all_in_one.html#_restoresystemproperties instead of clearing system properties explicitly?

Copy link
Contributor

@matrei matrei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good stuff, thank you @jamesfredley

@jamesfredley
Copy link
Contributor Author

jamesfredley commented Jan 25, 2026

@RestoreSystemProperties will replace all the system properties changes.

@jdaugherty jdaugherty changed the title Optimize CI workflow to reduce build time Flaky Build Test Cleanup Jan 25, 2026
Copy link
Contributor

@jdaugherty jdaugherty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is with the removal of rerun-tasks, we won't always reapply our AST transformers. We experienced this in the 7.x development and a lot of time was lost trying to figure out when this broke. I think this can easily be solved with a build without caching that runs weekly or something.

Thank you for doing all of the test cleanup- it's been sorely needed.

Add @isolated annotation and doWithConfig() to CommandObjectsSpec to ensure
shared constraint tests run in isolation from parallel tests.

Changes:
- Add @isolated annotation to prevent parallel execution conflicts
- Add doWithConfig() to register the 'isProg' shared constraint
- Simplify setup/cleanup to only clear Validateable constraint cache
- Remove redundant ConstraintEvalUtils.clearDefaultConstraints() calls
  (already handled by @isolated preventing parallel cache pollution)
…l execution

Add setup/cleanup methods to clear URL mappings artefacts, preventing
pollution from parallel tests that register their own URL mappings.
…ents

- Removed @isolated from CommandObjectsSpec and CommandObjectNoDataSpec as it
  only affects Spock's internal parallelism, not Gradle fork-based test isolation
- Updated comments across 29 files to clarify that fixes address test isolation
  (shared static state between tests in the same fork), not parallel execution
- Clarified that tests run sequentially within each fork but static state persists
When mockArtefact() creates a new grailsUrlMappingsHolder bean, the
existing linkGenerator bean still holds a stale reference to the old
holder. This causes URL generation to fail in parallel test execution
when other tests run between test methods and the urlMappingsHolder
bean gets recreated.

The fix updates linkGenerator.urlMappingsHolder to point to the newly
created holder bean after defineBeans completes.
…rence

Similar to the mockArtefact fix, when FormTagLibTests.setup() creates
a new grailsUrlMappingsHolder bean via defineBeans, the linkGenerator
still holds a stale reference. This adds the same refresh logic to
ensure URL generation works correctly in parallel test execution.
Add proper setup/cleanup for tests using Validateable trait and MIME types
to prevent shared static cache issues during parallel test execution:

Constraint cache fixes (clear ConstraintEvalUtils.defaultConstraintsMap and
Validateable.constraintsMapInternal):
- CommandObjectConstraintGettersSpec
- SerializableValidateableSpec
- ValidateableMockSpec
- DefaultASTValidateableHelperSpec
- JsonApiSpec
- CommandObjectNullabilitySpec
- ControllerUnitTestMixinTests
- JsonBindingWithExceptionHandlerSpec
- DefaultASTDatabindingHelperDomainClassSpecialPropertiesSpec
- GrailsTypeCheckedCompilationErrorsSpec
- GrailsCompileStaticCompilationErrorsSpec
- EntityTransformTests

MIME type cache fixes (clear HttpServletResponseExtension.@mimetypes):
- RestfulControllerSpec
- WithFormatSpec
- DefaultXmlRendererSpec
- VndErrorRenderingSpec
- HtmlRendererSpec
- DefaultRendererRegistrySpec
- AbstractRequestBodyDataBindingSourceCreatorSpec
- CollectionBindDataMethodSpec
Use @RestoreSystemProperties annotation from Spock to automatically
restore system properties after tests, fixing test isolation issues
during parallel test execution. For JUnit tests, use proper try-finally
with original value preservation.

Tests fixed:
- grails-core: NavigableMapSpringProfilesSpec, YamlPropertySourceLoaderSpec,
  GrailsPlaceHolderConfigurerCorePluginRuntimeSpec, DevelopmentModeWatchSpec,
  GrailsArtefactTransformerSpec, GrailsASTUtilsTests, GrailsPluginTests
- grails-shell-cli: CreateAppCommandSpec
- grails-test-examples: DatabasePerTenantSpec, PartitionedMultiTenancySpec,
  SchemaPerTenantSpec, BookSpec
- grails-data-neo4j: Neo4jAutoConfigurationSpec, SingleTenancySpec
- grails-data-mongodb: MongoDbGormAutoConfigurationSpec,
  MongoDbGormAutoConfigureWithGeoSpacialSpec
- grails-web-databinding: AbstractRequestBodyDataBindingSourceCreatorSpec
  (removed erroneous mimeTypes cleanup from wrong module)
…ed before clearing caches

The setup() method was clearing ConstraintEvalUtils.defaultConstraintsMap and the
Artist class's constraintsMapInternal before grailsApplication was initialized.
This meant when artist.validate() was called, Holders.grailsApplication was null
or from a previous test, causing the 'isProg' shared constraint to not be found.

Fix: Access 'config' property first in setup() to trigger lazy initialization of
grailsApplication, which calls doWithConfig() to register the shared constraint.
Then clear the caches so they'll be repopulated from the fresh config.
…re clearing cache

Same fix as the shared constraint issue: access 'config' in setup() before clearing
HttpServletResponseExtension.@mimetypes to ensure doWithConfig() has been called
and the custom MIME types are registered in Holders.grailsApplication.

This fixes macOS build failures for:
- WithFormatContentTypeSpec
- RespondMethodSpec
- JSONBindingToNullSpec
- ContentNegotiationSpec
…initialized before accessing artefacts

Same pattern as other fixes: access 'config' in setup() to ensure
grailsApplication is properly initialized before clearing artefact info.
@jamesfredley
Copy link
Contributor Author

all the version updates were moved out to a separate PR and merged, since they were missed when we updated Gradle, Groovy and Java versions.

…is set before tests

The tests expect development environment configuration values but would fail
when the cached environment was set to 'test' from previous test runs.
Replaces the --configuration-cache-problems=warn flag with --rerun-tasks in multiple Gradle build steps to ensure all tasks are rerun. Adds a new buildRerunTasks job for building Grails-Core with rerun tasks enabled. This change aims to improve build reliability and consistency in CI.
Simplified the comment explaining the retrieval of context from GrailsWebRequest in the getMimeTypes method, removing details about test isolation and servlet context pollution.
Eliminated the Develocity testDistribution block from test-config.gradle, simplifying test configuration and removing CI-specific parallel execution settings.
Reworded comments in various test files to use 'prevent test environment pollution' instead of 'ensure test isolation' when clearing static caches or artefacts. This improves clarity on the intent behind these cleanup steps.
@jamesfredley jamesfredley merged commit 3ee3b54 into 7.0.x Jan 28, 2026
54 of 56 checks passed
@jamesfredley jamesfredley deleted the improve-gh-ci-speed branch January 28, 2026 17:20
@jdaugherty
Copy link
Contributor

This fixed #14319

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants