-
-
Notifications
You must be signed in to change notification settings - Fork 970
Flaky Build Test Cleanup #15335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky Build Test Cleanup #15335
Conversation
e3b3471 to
2e51260
Compare
|
In my early testing, using the cache was not always consistent because of how some of our dependencies are included. Part of the reason we force a rerun of tests was to ensure everything was functional. I am ok disabling the cache for every commit, but I think we still need to run all of the tests periodically. I would strongly urge we fix the reason the develocity plugin retries the tests first - sometimes it reruns all of the tests 3+ times. Most of the tests that fail are in grails-gsp. If you checkout the last build, look at flaky tests to find the problematic ones. I suspect if we fixed these, the github build would go down by over half of its run time. |
|
Oh and this scenario too: changing something in core might only be tested by the end to end test in forge, but forge doesn't depend on grails so gradle would just not run those tests because the cache would see nothing changed. We had lots of issues earlier on in 7.0 dev that were only caught by the functional tests and gradle would not run them always because the cache would not trigger a change. This was a reoccurring theme that I ended up wasting a lot of time chasing in 7.0 dev. |
db012ea to
2fcf65d
Compare
|
Apparently Still experiments to see if anything reduces this from an hour, before proposing changes. |
cadbfbf to
116419c
Compare
bfe199a to
3478780
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes the CI workflow to reduce build times and fixes multiple flaky tests caused by static state pollution during parallel test execution.
Changes:
- Optimized CI workflow by replacing
--rerun-taskswith selective test cache disabling, adding configuration cache warnings, and increasing parallel execution settings - Reduced CI matrix combinations to test specific OS/Java pairings instead of full cross-products
- Fixed flaky tests by adding proper cleanup of ThreadLocal state, static constraints caches, mime type caches, and tenant ID system properties in test setup methods
- Added
clearConstraintsMap()method to theValidateabletrait to support constraint cache clearing in tests - Updated Groovy version to 3.0.25 and Gradle wrapper to 8.14.4 for Java 25 compatibility
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/gradle.yml |
Optimized CI workflow by changing Java matrix combinations, removing --rerun-tasks and cleanTest, adding --configuration-cache-problems=warn |
build.gradle |
Increased configuredTestParallel from 3 to 4 for CI builds to match GitHub runner cores |
gradle/test-config.gradle |
Added test cache disabling for CI builds, enabled test distribution, increased forkEvery from 20 to 50 |
grails-test-suite-web/build.gradle |
Increased forkEvery from 10 to 50 for CI builds |
grails-test-suite-uber/build.gradle |
Increased forkEvery from 25 to 50 for CI builds |
grails-validation/src/main/groovy/grails/validation/Validateable.groovy |
Added clearConstraintsMap() static method to support test isolation |
grails-web-url-mappings/src/test/groovy/org/grails/web/mapping/LinkGeneratorWithUrlMappingsSpec.groovy |
Added setup/cleanup methods to reset RequestContextHolder and link map state |
grails-test-suite-web/src/test/groovy/org/grails/web/commandobjects/CommandObjectsSpec.groovy |
Added setup method to clear constraints cache before each test |
grails-test-suite-web/src/test/groovy/org/grails/web/commandobjects/CommandObjectNoDataSpec.groovy |
Added setup method to clear constraints cache before each test |
grails-test-suite-web/src/test/groovy/org/grails/web/mime/WithFormatContentTypeSpec.groovy |
Added setup method to clear mime types cache and set Holders.grailsApplication |
grails-test-suite-web/src/test/groovy/org/grails/web/mime/ContentFormatControllerTests.groovy |
Added setup method to clear mime types cache |
grails-test-suite-web/src/test/groovy/org/grails/web/controllers/ContentNegotiationSpec.groovy |
Added setup method to clear mime types cache |
grails-test-suite-web/src/test/groovy/grails/rest/web/RespondMethodSpec.groovy |
Added setup method to clear mime types cache |
grails-test-suite-web/src/test/groovy/org/grails/web/binding/JSONBindingToNullSpec.groovy |
Added setup method to clear mime types cache |
grails-datamapping-core-test/src/test/groovy/grails/gorm/services/multitenancy/database/DatabasePerTenantSpec.groovy |
Added setup method to clear tenant ID system property |
grails-forge/gradle.properties |
Updated Groovy version from 3.0.24 to 3.0.25 |
grails-data-neo4j/gradle.properties |
Updated Groovy version from 3.0.20 to 3.0.25 |
grails-data-graphql/gradle.properties |
Updated Groovy version from 3.0.11 to 3.0.25 |
grails-forge/grails-forge-core/src/main/java/org/grails/forge/feature/build/gradle/templates/gradleWrapperProperties.rocker.raw |
Updated Gradle wrapper from 8.14.3 to 8.14.4 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3478780 to
6e2ffab
Compare
|
This ended up being more of a test cleanup than a speed improvement PR. |
4e173ea to
58689af
Compare
|
Could we leverage https://spockframework.org/spock/docs/2.4/all_in_one.html#_restoresystemproperties instead of clearing system properties explicitly? |
matrei
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good stuff, thank you @jamesfredley
grails-mimetypes/src/test/groovy/org/grails/plugins/web/mime/MimeTypesConfigurationSpec.groovy
Show resolved
Hide resolved
grails-mimetypes/src/test/groovy/org/grails/web/mime/AcceptHeaderParserSpec.groovy
Show resolved
Hide resolved
grails-mimetypes/src/main/groovy/org/grails/web/mime/HttpServletRequestExtension.groovy
Outdated
Show resolved
Hide resolved
...metypes/src/test/groovy/org/grails/web/servlet/mvc/RequestAndResponseMimeTypesApiSpec.groovy
Show resolved
Hide resolved
...-test-suite-web/src/test/groovy/org/grails/web/commandobjects/CommandObjectNoDataSpec.groovy
Show resolved
Hide resolved
grails-test-suite-web/src/test/groovy/org/grails/web/mime/ContentFormatControllerTests.groovy
Show resolved
Hide resolved
grails-validation/src/main/groovy/grails/validation/Validateable.groovy
Outdated
Show resolved
Hide resolved
grails-validation/src/test/groovy/grails/validation/ValidateableTraitAdHocSpec.groovy
Outdated
Show resolved
Hide resolved
grails-validation/src/test/groovy/grails/validation/ValidateableTraitSpec.groovy
Outdated
Show resolved
Hide resolved
|
@RestoreSystemProperties will replace all the system properties changes. |
grails-validation/src/main/groovy/grails/validation/Validateable.groovy
Outdated
Show resolved
Hide resolved
jdaugherty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is with the removal of rerun-tasks, we won't always reapply our AST transformers. We experienced this in the 7.x development and a lot of time was lost trying to figure out when this broke. I think this can easily be solved with a build without caching that runs weekly or something.
Thank you for doing all of the test cleanup- it's been sorely needed.
Add @isolated annotation and doWithConfig() to CommandObjectsSpec to ensure shared constraint tests run in isolation from parallel tests. Changes: - Add @isolated annotation to prevent parallel execution conflicts - Add doWithConfig() to register the 'isProg' shared constraint - Simplify setup/cleanup to only clear Validateable constraint cache - Remove redundant ConstraintEvalUtils.clearDefaultConstraints() calls (already handled by @isolated preventing parallel cache pollution)
…l execution Add setup/cleanup methods to clear URL mappings artefacts, preventing pollution from parallel tests that register their own URL mappings.
…ents - Removed @isolated from CommandObjectsSpec and CommandObjectNoDataSpec as it only affects Spock's internal parallelism, not Gradle fork-based test isolation - Updated comments across 29 files to clarify that fixes address test isolation (shared static state between tests in the same fork), not parallel execution - Clarified that tests run sequentially within each fork but static state persists
…intsMap in setup/cleanup
When mockArtefact() creates a new grailsUrlMappingsHolder bean, the existing linkGenerator bean still holds a stale reference to the old holder. This causes URL generation to fail in parallel test execution when other tests run between test methods and the urlMappingsHolder bean gets recreated. The fix updates linkGenerator.urlMappingsHolder to point to the newly created holder bean after defineBeans completes.
…rence Similar to the mockArtefact fix, when FormTagLibTests.setup() creates a new grailsUrlMappingsHolder bean via defineBeans, the linkGenerator still holds a stale reference. This adds the same refresh logic to ensure URL generation works correctly in parallel test execution.
Add proper setup/cleanup for tests using Validateable trait and MIME types to prevent shared static cache issues during parallel test execution: Constraint cache fixes (clear ConstraintEvalUtils.defaultConstraintsMap and Validateable.constraintsMapInternal): - CommandObjectConstraintGettersSpec - SerializableValidateableSpec - ValidateableMockSpec - DefaultASTValidateableHelperSpec - JsonApiSpec - CommandObjectNullabilitySpec - ControllerUnitTestMixinTests - JsonBindingWithExceptionHandlerSpec - DefaultASTDatabindingHelperDomainClassSpecialPropertiesSpec - GrailsTypeCheckedCompilationErrorsSpec - GrailsCompileStaticCompilationErrorsSpec - EntityTransformTests MIME type cache fixes (clear HttpServletResponseExtension.@mimetypes): - RestfulControllerSpec - WithFormatSpec - DefaultXmlRendererSpec - VndErrorRenderingSpec - HtmlRendererSpec - DefaultRendererRegistrySpec - AbstractRequestBodyDataBindingSourceCreatorSpec - CollectionBindDataMethodSpec
Use @RestoreSystemProperties annotation from Spock to automatically restore system properties after tests, fixing test isolation issues during parallel test execution. For JUnit tests, use proper try-finally with original value preservation. Tests fixed: - grails-core: NavigableMapSpringProfilesSpec, YamlPropertySourceLoaderSpec, GrailsPlaceHolderConfigurerCorePluginRuntimeSpec, DevelopmentModeWatchSpec, GrailsArtefactTransformerSpec, GrailsASTUtilsTests, GrailsPluginTests - grails-shell-cli: CreateAppCommandSpec - grails-test-examples: DatabasePerTenantSpec, PartitionedMultiTenancySpec, SchemaPerTenantSpec, BookSpec - grails-data-neo4j: Neo4jAutoConfigurationSpec, SingleTenancySpec - grails-data-mongodb: MongoDbGormAutoConfigurationSpec, MongoDbGormAutoConfigureWithGeoSpacialSpec - grails-web-databinding: AbstractRequestBodyDataBindingSourceCreatorSpec (removed erroneous mimeTypes cleanup from wrong module)
…ed before clearing caches The setup() method was clearing ConstraintEvalUtils.defaultConstraintsMap and the Artist class's constraintsMapInternal before grailsApplication was initialized. This meant when artist.validate() was called, Holders.grailsApplication was null or from a previous test, causing the 'isProg' shared constraint to not be found. Fix: Access 'config' property first in setup() to trigger lazy initialization of grailsApplication, which calls doWithConfig() to register the shared constraint. Then clear the caches so they'll be repopulated from the fresh config.
…re clearing cache Same fix as the shared constraint issue: access 'config' in setup() before clearing HttpServletResponseExtension.@mimetypes to ensure doWithConfig() has been called and the custom MIME types are registered in Holders.grailsApplication. This fixes macOS build failures for: - WithFormatContentTypeSpec - RespondMethodSpec - JSONBindingToNullSpec - ContentNegotiationSpec
…initialized before accessing artefacts Same pattern as other fixes: access 'config' in setup() to ensure grailsApplication is properly initialized before clearing artefact info.
7cd5005 to
76a8df6
Compare
|
all the version updates were moved out to a separate PR and merged, since they were missed when we updated Gradle, Groovy and Java versions. |
…is set before tests The tests expect development environment configuration values but would fail when the cached environment was set to 'test' from previous test runs.
Replaces the --configuration-cache-problems=warn flag with --rerun-tasks in multiple Gradle build steps to ensure all tasks are rerun. Adds a new buildRerunTasks job for building Grails-Core with rerun tasks enabled. This change aims to improve build reliability and consistency in CI.
Simplified the comment explaining the retrieval of context from GrailsWebRequest in the getMimeTypes method, removing details about test isolation and servlet context pollution.
Eliminated the Develocity testDistribution block from test-config.gradle, simplifying test configuration and removing CI-specific parallel execution settings.
Reworded comments in various test files to use 'prevent test environment pollution' instead of 'ensure test isolation' when clearing static caches or artefacts. This improves clarity on the intent behind these cleanup steps.
|
This fixed #14319 |





Summary
This PR optimizes CI workflow performance, fixes multiple flaky tests caused by static state pollution during parallel test execution, and adds developer experience improvements. After fixing the root causes of test flakiness, Develocity test retries have been removed.
CI Workflow Optimizations
Build Performance
--rerun-tasksflag for reliable CI buildsoutputs.cacheIf { !isCiBuild }GroovyCompiletasks) to ensure AST transformations are always appliedmaxParallelForksfrom 3 to 4 (matches GitHub runner cores)forkEveryfrom 10-25 to 50 (reduces JVM restart overhead)maxParallelForks = 1restriction from grails-gsp modulecancel-in-progressfor PR workflow runs - Cancels outdated workflow runs when new commits are pushed to a PR across all CI workflows (gradle.yml,codestyle.yml,codeql.yml,groovy-joint-workflow.yml,rat.yml)buildRerunTasksjob for building Grails-Core with rerun tasks enabledCI Matrix Reduction
buildGradle,functional,mongodbFunctional,hibernate5Functional): Changed from[17, 21]to[17, 25][17, 21]Test Retry Removal
After fixing the flaky tests, Develocity test retries have been removed from:
gradle/test-config.gradle- Removed testRetry blockbuild-logic/docs-core/build.gradle- Removed develocity testRetry blockgrails-data-hibernate5/dbmigration/build.gradle- Removed entire testRetry task configurationgrails-gradle/gradle/test-config.gradle- Removed develocity testRetry blockgrails-forge/buildSrc/build.gradle- Removed unused test-retry plugin dependencygrails-forge/gradle/test-config.gradle- Removed commented-out retry code and plugin applyFlaky Test Fixes
Root Cause Analysis
When tests run in parallel (
maxParallelForks > 1), static state can leak between tests causing intermittent failures. The main culprits identified:HttpServletResponseExtension.@mimeTypes- Static cache initialized by one test used by anotherHttpServletResponseExtension.getMimeTypes()- Was callingwebRequest.applicationContextwithout null check, causing NPE whenGrailsWebRequest.lookup()returns null during parallel test runsHttpServletRequestExtension.getMimeTypes()- Was using servlet context attributes which get polluted by parallel tests due toGrailsUnitTeststatic fieldsHoldersstate -grailsApplication,pluginManagercleared mid-test by parallel cleanupValidateable.constraintsMapInternal- Constraint cache evaluated beforedoWithConfig()registers shared constraintsConstraintEvalUtils.defaultConstraintsMap- Global shared constraints cache keyed by config identity hashSystemPropertyTenantResolver- System propertygorm.tenantIdpollution between multi-tenancy testsRequestContextHolder- ThreadLocal request attributes not properly resetGrailsApplication.artefactInfo- URL mapping artefacts polluting parallel testsLinkGenerator.urlMappingsHolder- Stale reference to old URL mappings holder after bean recreationKey Fix Pattern: Holders.clear() in Cleanup
Problem: Calling
Holders.clear()incleanup()or@AfterEachclears ALL Holders state (includinggrailsApplication) which can affect parallel tests that depend onHolders.grailsApplicationfor mime type resolution.Solution:
cleanupSpec()/@AfterAllwhich runs after all tests in the class completeGrailsUnitTesttrait already handlesHolders.clear()incleanupClass()Key Fix Pattern: Ensure grailsApplication Initialization Before Clearing Caches
Problem: Clearing static caches in
setup()beforegrailsApplicationis initialized causesdoWithConfig()to not be called, meaning custom MIME types and shared constraints are not registered.Solution: Access the
configproperty first insetup()to trigger lazy initialization ofgrailsApplication, which callsdoWithConfig(). Then clear the caches so they will be repopulated from the fresh config.Fix Categories
1. MIME Type Lookup Isolation (Production Code Fixes)
HttpServletResponseExtension.getMimeTypes()was callingwebRequest.applicationContextwithout checking ifwebRequestis null. WhenGrailsWebRequest.lookup()returns null during parallel test execution, this caused a NullPointerException.Fix: Added null-safe operator (
?.) when accessingwebRequest.applicationContext.HttpServletRequestExtension.getMimeTypes()was usingWebApplicationContextUtils.getWebApplicationContext(request.servletContext)to look up MIME types. This fails in parallel tests becauseGrailsUnitTestuses static_servletContextwhich gets polluted across tests.Fix: Now prioritizes
GrailsWebRequest.lookup()(thread-local) before falling back to servlet context lookup.2. Mime Type Cache Isolation (~20 files)
Added
setup()andcleanup()methods to clearHttpServletResponseExtension.@mimeTypesbefore/after each test, ensuringgrailsApplicationis initialized first:RespondMethodSpec,WithFormatContentTypeSpec,ContentFormatControllerTestsContentNegotiationSpec,JSONBindingToNullSpec,MimeUtilitySpecMimeTypesConfigurationSpec,AcceptHeaderParserSpec,RequestAndResponseMimeTypesApiSpecHalJsonRendererSpec,JsonRendererSpec,BaseDomainClassRendererSpecRestfulControllerSpec,WithFormatSpec,DefaultXmlRendererSpecVndErrorRenderingSpec,HtmlRendererSpec,DefaultRendererRegistrySpec3. Multi-Tenancy Test Isolation (~18 files)
Used Spock
@RestoreSystemPropertiesannotation for cleaner system property isolation, combined with clearingSystemPropertyTenantResolver:DatabasePerTenantSpec,SchemaPerTenantSpec,PartitionMultiTenancySpecMultiTenancySpec,SingleTenantSpec,SchemaMultiTenantSpecMongoStaticApiMultiTenancySpec,SchemaBasedMultiTenancySpec,SingleTenancySpec(MongoDB)MongoConnectionSourcesSpec,MultiTenancySpec(MongoDB)CurrentTenantTransformSpec,TenantServiceSpec,MultiTenantServiceTransformSpec4. Command Object Constraint Isolation (~15 files)
Tests that use shared constraints via
doWithConfig()need proper cache clearing. Uses reflection to clear the privateconstraintsMapInternalfield in the Validateable trait:CommandObjectsSpec.groovy,CommandObjectNoDataSpec.groovyValidateableTraitSpec.groovy,ValidateableTraitAdHocSpec.groovyCommandObjectConstraintGettersSpec,SerializableValidateableSpecValidateableMockSpec,DefaultASTValidateableHelperSpecJsonApiSpec,CommandObjectNullabilitySpecControllerUnitTestMixinTests,JsonBindingWithExceptionHandlerSpecNote: A public API method
clearConstraintsMapCache()will be added in Grails 7.1 (see PR #15346). This PR uses reflection to access the Groovy 4 trait static field helper interface to clear the cache without modifying the public API, which would be a breaking change in a patch release.5. Holders.clear() Fixes (4 files)
Replaced aggressive
Holders.clear()with targeted cleanup:WithFormatContentTypeSpec- RemovedHolders.clear()fromcleanup()BeanBuilderTests- Changed toHolders.setPluginManager null(only reset what setUp sets)DomainClassTraitSpec- MovedHolders.clear()fromcleanup()tocleanupSpec()6. RequestContextHolder Cleanup (~10 files)
Added
RequestContextHolder.resetRequestAttributes()in setup/cleanup:LinkGeneratorWithUrlMappingsSpec,EncodePathFromURISpecHTMLJSCodecIntegrationSpec,DefaultGrailsApplicationAttributesTestsFlashScopeWithErrorsTests,StreamCharBufferSpec,WebUtilsTestsBindingOutsideRequestSpecUrlMappingsWithHttpMethodSpec,RestfulUrlMappingSpec7. URL Mapping Artefact Isolation (~4 files)
Clear URL mapping artefacts and refresh linkGenerator urlMappingsHolder reference:
FormTagLibTests- ClearartefactInfoin setup/cleanup, refresh linkGeneratorAbstractGrailsTagTests- Clear URL mappings intearDown()for all subclassesRestfulReverseUrlRenderingTests- Clear URL mappings and re-register in setupControllerUnitTestMixin.mockArtefact()- Refresh linkGenerator.urlMappingsHolder after bean recreation8. System Property Isolation (~20 files)
Fixed tests to use
@RestoreSystemPropertiesfor proper system property isolation:UrlMappingMatcherSpec,YamlPropertySourceLoaderSpecNavigableMapSpringProfilesSpec,GrailsPlaceHolderConfigurerCorePluginRuntimeSpecDevelopmentModeWatchSpec,GrailsArtefactTransformerSpecCreateAppCommandSpec,BookSpecDependency Updates
grails-data-graphql,grails-data-neo4j,grails-forge(Java 25 compatibility)gradleWrapperProperties.rocker.raw)gradle.properties).sdkmanrcupdates for Java 17.0.18-librcaRelated PRs
clearConstraintsMapCache()public API to Validateable trait (targeting 7.1.x)Files Changed