TRT-2469: Read/write job labels during fetchdata #3203
base: main
Conversation
Pipeline controller notification: For optional jobs, comment. This repository is configured in: automatic mode.
@sosiouxme: This pull request references TRT-2469 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough

Reworked prow job run processing: removed direct GCS-driven cluster data unmarshalling, added BigQuery-backed label gathering (`gatherLabelsFromBQ`).

Changes
Sequence Diagram

    sequenceDiagram
        participant Caller
        participant ProwLoader as Prow Loader
        participant GCS as GCS Bucket
        participant BQ as BigQuery
        participant DB as Database
        Caller->>ProwLoader: prowJobToJobRun(pj, release)
        alt Job Run Already Processed (cache)
            ProwLoader-->>Caller: return
        else New Run
            ProwLoader->>ProwLoader: createOrUpdateProwJob(pj)
            ProwLoader->>DB: Create/Update ProwJob
            DB-->>ProwLoader: dbProwJob
            ProwLoader->>GCS: fetch junit / artifacts
            GCS-->>ProwLoader: test data, buildID, startTime
            ProwLoader->>BQ: GatherLabelsFromBQ(buildID, startTime)
            BQ-->>ProwLoader: labels array
            ProwLoader->>DB: Create ProwJobRun (with Labels)
            ProwLoader->>DB: Batch insert test rows
            DB-->>ProwLoader: insert confirmations
            ProwLoader-->>Caller: done
        end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)
✅ Passed checks (3 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sosiouxme. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @pkg/dataloader/prowloader/prow.go:
- Around line 892-895: The call to gatherLabelsFromBQ currently aborts
processing by returning the error; instead, catch the error, log it via the
prowloader's logger on pl (e.g., pl.logger or pl.log) with context that label
gathering failed, then set labels = []string{} and continue processing so a
transient/missing BQ issue won't fail the entire import; update the block around
gatherLabelsFromBQ(pj) to log the error and continue rather than return it.
🧹 Nitpick comments (1)
pkg/dataloader/prowloader/prow.go (1)
927-927: Remove the `Debug()` call before merging. The `.Debug()` call on GORM will log all SQL queries, which can be noisy in production. Consider removing it or gating it behind a debug flag.

Proposed fix:

    - err = pl.dbc.DB.WithContext(ctx).Debug().CreateInBatches(tests, 1000).Error
    + err = pl.dbc.DB.WithContext(ctx).CreateInBatches(tests, 1000).Error
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
pkg/dataloader/prowloader/prow.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/dataloader/prowloader/prow.go (3)
pkg/db/models/prow.go (2)
    ProwJob (15-26)
    ProwJobRun (36-66)
pkg/db/db.go (1)
    DB (27-33)
pkg/apis/sippyprocessing/v1/types.go (1)
    JobSucceeded (72-72)
🔇 Additional comments (6)
pkg/dataloader/prowloader/prow.go (6)
10-10: LGTM! The new imports for `os` (environment variable access) and `github.com/lib/pq` (PostgreSQL array types) are correctly added and necessary for the new label-gathering functionality. Also applies to: 23-23.
597-606: LGTM! The refactored pending-comments handling is cleaner, with proper error handling for `gorm.ErrRecordNotFound`.
822-842: LGTM! The refactored flow correctly separates concerns:
- ProwJob creation/update with proper lock handling
- Cache check with read lock
- GCS bucket processing delegation
The lock release before error checking is the correct pattern here.
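The cache-check-with-read-lock step can be sketched in isolation. This is a hypothetical reconstruction of the pattern the review describes (the type and method names here are invented for illustration; sippy's actual `prowJobRunCache` may differ):

```go
package main

import (
	"fmt"
	"sync"
)

// runCache is a hypothetical stand-in for a processed-job-run cache:
// a set of already-imported job run IDs guarded by an RWMutex so many
// readers can check it concurrently while writers stay exclusive.
type runCache struct {
	mu   sync.RWMutex
	seen map[uint]struct{}
}

func newRunCache() *runCache {
	return &runCache{seen: map[uint]struct{}{}}
}

// alreadyProcessed takes only the read lock, matching the
// "cache check with read lock" step in the refactored flow.
func (c *runCache) alreadyProcessed(id uint) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.seen[id]
	return ok
}

// markProcessed takes the write lock; marking only after a successful
// database insert means failed imports get retried on the next run.
func (c *runCache) markProcessed(id uint) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.seen[id] = struct{}{}
}

func main() {
	c := newRunCache()
	fmt.Println(c.alreadyProcessed(42)) // false: not yet imported
	c.markProcessed(42)
	fmt.Println(c.alreadyProcessed(42)) // true: skipped on later passes
}
```

The read/write lock split matters because many job runs are checked against the cache for every insertion into it.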
845-882: LGTM! Clean extraction of the ProwJob creation/update logic into a reusable helper. The return signature `(*models.ProwJob, error)` properly enables the caller to get the created/updated job for further use.
1040-1089: LGTM! The label-gathering implementation is well-structured:

- Graceful handling when the BigQuery client is unavailable (returns nil, nil)
- Safe query construction: dataset and table name are server-controlled, not user input
- Parameterized queries for user-facing values (BuildID, StartTime)
- Correct BigQuery iterator pattern with the `iterator.Done` check

The `DATE(prowjob_start)` filter is a good practice for BigQuery partition pruning.
1052-1055: Verify the fallback dataset behavior is intentional. When `JOB_LABELS_DATASET` is not set, the code falls back to `pl.bigQueryClient.Dataset`. Ensure this is the intended behavior and that the `job_labels` table exists in the default dataset, or document that the env var must be set for label gathering to work.
pkg/dataloader/prowloader/prow.go (Outdated)

    labels, err := pl.gatherLabelsFromBQ(pj)
    if err != nil {
        return err
    }
🧩 Analysis chain

🏁 Script executed:

    # First, let's look at the actual code around lines 892-895
    cat -n pkg/dataloader/prowloader/prow.go | sed -n '884,932p'

Repository: openshift/sippy
Length of output: 1855

🏁 Script executed:

    # Let's also find and examine the gatherLabelsFromBQ function
    ast-grep --pattern 'func ($_ *$_) gatherLabelsFromBQ($_) {
      $$$
    }'

Repository: openshift/sippy
Length of output: 41

🏁 Script executed:

    # Search for how labels are used and handled throughout the codebase
    rg "gatherLabelsFromBQ|job_labels" -A 3 -B 3 --no-heading

Repository: openshift/sippy
Length of output: 5589

🏁 Script executed:

    # Find the full implementation of gatherLabelsFromBQ
    rg "func.*gatherLabelsFromBQ" -A 30 --no-heading

Repository: openshift/sippy
Length of output: 1968

🏁 Script executed:

    # Check how labels are defined in the ProwJobRun model
    rg "type ProwJobRun struct" -A 50 | grep -A 50 "type ProwJobRun"

Repository: openshift/sippy
Length of output: 3243

🏁 Script executed:

    # Search for other places where gatherLabelsFromBQ might be called or tested
    rg "gatherLabelsFromBQ" -B 2 -A 2

Repository: openshift/sippy
Length of output: 927

🏁 Script executed:

    # Get the complete gatherLabelsFromBQ function implementation
    rg "func.*gatherLabelsFromBQ" -A 50 | head -70

Repository: openshift/sippy
Length of output: 3218

🏁 Script executed:

    # Check if there are any tests that exercise error cases for gatherLabelsFromBQ
    rg "gatherLabelsFromBQ|TestGatherLabels" -B 2 -A 5

Repository: openshift/sippy
Length of output: 1222

🏁 Script executed:

    # Look for documentation or comments about label criticality
    rg "Labels|job_labels" -B 1 -A 1 | grep -A 2 -B 2 "optional\|required\|critical\|essential"

Repository: openshift/sippy
Length of output: 50373
Consider whether label gathering failures should halt job run processing.

Currently, if `gatherLabelsFromBQ` returns an error, the entire job run import fails. Since labels are populated from optional symptom detection or manual annotation, and the function already gracefully handles missing BigQuery clients, consider logging the error and continuing with an empty labels array instead. This approach would make the import more resilient to transient BigQuery issues or schema differences across environments.
I think that's what I want. If something is transiently wrong with querying BQ, I would like to fail the run and try again the next time the cron job runs; otherwise the job import will be considered complete and will never get the labels. But this is a useful comment for a reviewer to consider whether I have it right here.
@sosiouxme That makes sense! Your reasoning is correct: given the caching mechanism in prowJobRunCache, once a job run ID is recorded as processed, it won't be re-attempted on subsequent cron executions.
The current implementation aligns with your goal:
- If `gatherLabelsFromBQ` fails, the error is returned
- The job run is NOT inserted into the database
- The job run is NOT added to `prowJobRunCache`
- The next cron run will retry the same job run, including label gathering
This prioritizes data completeness over resilience to transient errors, which is a valid design choice for this use case where labels are an integral part of the job run record. The trade-off is that a prolonged BigQuery outage would block job run imports, but that seems acceptable given labels are meant to be part of the permanent record.
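The two strategies weighed in this thread can be contrasted in a small, self-contained sketch. `gatherLabels` is a stub standing in for the real BigQuery call, and all names here are invented for illustration; this is not sippy's code:

```go
package main

import (
	"errors"
	"fmt"
)

// gatherLabels stubs the BigQuery label query; fail simulates a
// transient outage or schema mismatch.
func gatherLabels(fail bool) ([]string, error) {
	if fail {
		return nil, errors.New("transient BigQuery error")
	}
	return []string{"symptom/example"}, nil
}

// importJobRun contrasts the two strategies: strict mode (the author's
// choice) propagates the error so the run stays unimported and is
// retried by the next cron; tolerant mode (the reviewer's suggestion)
// logs and continues with an empty label set, so the run imports but
// permanently lacks its labels.
func importJobRun(tolerant, bqFails bool) ([]string, error) {
	labels, err := gatherLabels(bqFails)
	if err != nil {
		if !tolerant {
			return nil, err // run not cached; retried next cron run
		}
		fmt.Println("warning: label gathering failed, continuing:", err)
		labels = []string{}
	}
	return labels, nil
}

func main() {
	if _, err := importJobRun(false, true); err != nil {
		fmt.Println("strict mode fails the import:", err)
	}
	labels, _ := importJobRun(true, true)
	fmt.Println("tolerant mode imports with", len(labels), "labels")
}
```

Because the run cache is only populated after a successful insert, the strict variant turns a transient BigQuery failure into a retry rather than a permanent gap, which is exactly the trade-off the author describes.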
Scheduling required tests:
prepare for expanding the loader to handle job labels. From the too-long method prowJobToJobRun, break out methods createOrUpdateProwJob and processGCSBucketJobRun. Remove GetClusterData, which has no evident caller in sippy or cloud functions. Other minor simplifications.
🤖 Assisted by Claude Code
8c08b21 to c17f208 (Compare)
Scheduling required tests:
@sosiouxme: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
During the fetchdata CronJob, read the labels applied in BQ and include them in the job run "labels" array entry. This imports job labels with sippy job runs so they will at least show up on the job runs page.
The first commit is a minor refactor that should change nothing but style. The second is the commit to focus on.
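The review noted a new `github.com/lib/pq` import for PostgreSQL array types, which suggests the labels land in a `text[]` column (e.g. via `pq.StringArray`). As a rough illustration of what such a column value looks like on the wire — this is a simplified sketch with invented helper names, not lib/pq's actual serializer:

```go
package main

import (
	"fmt"
	"strings"
)

// pgTextArray renders a label slice as a Postgres text[] literal,
// always quoting elements and escaping embedded backslashes and
// quotes. Simplified sketch only; lib/pq handles more edge cases.
func pgTextArray(labels []string) string {
	quoted := make([]string, len(labels))
	for i, l := range labels {
		l = strings.ReplaceAll(l, `\`, `\\`)
		l = strings.ReplaceAll(l, `"`, `\"`)
		quoted[i] = `"` + l + `"`
	}
	return "{" + strings.Join(quoted, ",") + "}"
}

func main() {
	// The label values here are invented examples.
	fmt.Println(pgTextArray([]string{"symptom/etcd", "manual"})) // → {"symptom/etcd","manual"}
}
```

Storing labels as a native array keeps them queryable with Postgres array operators while still surfacing as a simple list on the job runs page.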