GDCD script for page counts, remove deprecated project #3
1. Runs audit-cli once to identify projects that exist only in audit-cli (not in the log)
2. Re-runs audit-cli with those projects excluded using the `--exclude-dirs` flag
3. Compares the filtered results for a cleaner comparison
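The two-pass flow in these steps could be sketched roughly as below. This is a minimal illustration, not the script's actual code: `onlyIn`, the sample project names, and the page counts are hypothetical, and the comma-separated value passed to `--exclude-dirs` is an assumption about the flag's format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// onlyIn returns the sorted project names present in a but missing from b.
func onlyIn(a, b map[string]int) []string {
	var names []string
	for name := range a {
		if _, ok := b[name]; !ok {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Hypothetical page counts keyed by project name.
	logCounts := map[string]int{"alpha": 10, "beta": 20}
	auditCounts := map[string]int{"alpha": 10, "beta": 21, "gamma": 5}

	// Pass 1: find projects that only audit-cli reports.
	exclude := onlyIn(auditCounts, logCounts)

	// Pass 2 would re-run audit-cli with these projects excluded.
	// The comma-separated value format is an assumption.
	fmt.Println("--exclude-dirs", strings.Join(exclude, ","))
}
```

Sorting the names keeps the exclusion list stable between runs, since Go map iteration order is randomized.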
// projectNameMapping maps log file project names to their audit-cli equivalents.
// This handles cases where the same project has different names in the GDCD logs
// versus the audit-cli output. Add new mappings here as needed.
var projectNameMapping = map[string]string{
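For context, a mapping like this would typically be applied with a lookup that falls back to the original name. The sketch below assumes that behavior; `exampleMapping`, its single entry, and `resolveName` are hypothetical stand-ins, since the real entries are not shown in this hunk.

```go
package main

import "fmt"

// exampleMapping is a hypothetical stand-in for projectNameMapping.
var exampleMapping = map[string]string{
	"log-name": "audit-name",
}

// resolveName returns the audit-cli name for a log project name,
// falling back to the input when no mapping exists.
func resolveName(mapping map[string]string, logName string) string {
	if mapped, ok := mapping[logName]; ok {
		return mapped
	}
	return logName
}

func main() {
	fmt.Println(resolveName(exampleMapping, "log-name"))
	fmt.Println(resolveName(exampleMapping, "unmapped"))
}
```

With a fallback like this, an unmapped project whose names differ between the two sources would surface as "only in log" rather than being silently dropped.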
ugh another place to custom map names 😑
audit/gdcd/scripts/README.md
- **Only in log**: Projects found in the log but not in audit-cli output (may indicate project name mismatches)
- **Only in audit-cli**: Projects found in audit-cli but not in the log - these are automatically excluded in the second run for a cleaner comparison
i'm confused by the output, i think. will these ever be populated? e.g. i see we have a handful of "only in <log/audit-cli>" entries but these are both 0 in the summary -- would they have values on the first run?
The way the code is structured, they're populated on the "initial run", and then the tool re-runs audit-cli with excluded dirs for the "only in audit-cli" entries. At that point, the number is reduced to 0. The "only in log" entries can be populated by new projects that we haven't added a name mapping for (if the project name does not match the name in audit-cli) but will probably be 0 otherwise.
the example we give shows both types in the results, though, which is why i'm confused. shouldn't the summary only in log reflect the three results that are marked with only in log?
i'm also not really seeing the value of showing the only in audit-cli if it effectively gets reduced to 0 every time
Yeah, good points. Made some minor tweaks to the way the output is generated to:
- Omit `only in audit-cli` since it should never be populated
- Only conditionally show `only in log` if there are projects that only appear in the log but not `audit-cli`
- Also check that `audit-cli` is available before trying to run the thing
I also updated the example output in the README so hopefully this is all consistent now. 🤞
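A rough sketch of what those tweaks might look like, assuming the availability check is a PATH lookup. `toolAvailable`, `summaryLines`, and the summary wording are hypothetical and not taken from the script.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// toolAvailable reports whether the named binary can be found on PATH.
func toolAvailable(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

// summaryLines builds the summary, adding the "Only in log" line
// only when at least one such project exists.
func summaryLines(matched int, onlyInLog []string) []string {
	lines := []string{fmt.Sprintf("Matched projects: %d", matched)}
	if len(onlyInLog) > 0 {
		lines = append(lines, fmt.Sprintf("Only in log: %d (%s)",
			len(onlyInLog), strings.Join(onlyInLog, ", ")))
	}
	return lines
}

func main() {
	// Bail out early instead of failing partway through the comparison.
	if !toolAvailable("audit-cli") {
		fmt.Println("audit-cli not found on PATH; skipping comparison")
		return
	}
	for _, line := range summaryLines(12, []string{"new-project"}) {
		fmt.Println(line)
	}
}
```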
Co-authored-by: cory <115956901+cbullinger@users.noreply.github.com>
This PR adds a new script to compare the GDCD ingest logs (from Snooty Data API ingest job) to the `audit-cli` output from local monorepo files.

In investigating discrepancies, I also discovered that `docs-k8s-operator` is deprecated and we should no longer be ingesting data for it during our weekly ingest job.