-
Notifications
You must be signed in to change notification settings - Fork 8
central detect sensitive files #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
central detect sensitive files #40
Conversation
Our Pull Request Approval ProcessThis PR will be reviewed according to our: Your PR may be automatically closed if:
Thanks for contributing! |
WalkthroughRefactors the sensitive-file check: moves pattern list into Changes
Sequence Diagram(s)sequenceDiagram
participant Actions as GitHub Actions
participant Git as git
participant Runner as Runner FS
participant Script as sensitive_file_check.py
Actions->>Git: git diff --name-only origin/main...HEAD
Git-->>Actions: list of changed file paths (CHANGED_FILES)
Actions->>Runner: pass CHANGED_FILES + config path
Runner->>Script: execute sensitive_file_check.py --config config/... --files <paths>
Script->>Runner: open/read config file
Script->>Runner: walk filesystem for provided paths
Script->>Script: match files against compiled regex patterns
alt matches found
Script-->>Actions: exit code 1 + printed override instructions
Actions-->>Actions: job fails (script exit enforces failure)
else no matches
Script-->>Actions: exit code 0 (no sensitive files)
Actions-->>Actions: job continues
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 50-68: Add the sensitive-file check as a prerequisite to both PR
workflows by making the existing check workflow reusable and invoking it from
pull-request-review.yml and pull-request-target.yml: update
.github/workflows/check-sensitive-files.yml to support being called (on:
workflow_call) if it isn’t already, then in both pull-request-review.yml and
pull-request-target.yml add a job that uses
./.github/workflows/check-sensitive-files.yml with id sensitive-check (or
preserve the job id "sensitive-check") and make the rest of the PR jobs depend
on it via needs: [sensitive-check] so the "Detect sensitive file changes" job
runs and blocks the PR jobs when it fails.
- Around line 62-68: CHANGED_FILES is built by joining filenames with spaces
which breaks filenames that contain spaces; change the logic to read git diff
output into a bash array (use git diff --name-only --diff-filter=ACMRD
"$BASE_SHA" "$HEAD_SHA" piped into a null/line-safe reader) and then call the
Python checker with the array expanded so each filename is a single argument
(use "${CHANGED_FILES[@]}"); also update the emptiness check to test the array
length (e.g., ${`#CHANGED_FILES`[@]} > 0) before invoking sensitive_file_check.py
with --files to ensure filenames containing spaces are preserved and handled
correctly.
In @.github/workflows/config/sensitive_files.txt:
- Around line 2-23: The regex entries in sensitive_files.txt use unescaped dots
and broad patterns (e.g., ".flake8$", ".pydocstyle$", ".env..*$", "^.github/.*",
"^.husky/.*", "scripts/.*", "src/style/.*", ".gitignore$", "env.example$",
".eslintignore$", ".prettierrc$", ".prettierignore$", and any ".*.pem$" style
entries) which causes over‑matching; update each pattern to escape literal dots
(e.g., use "\." for ".") and ensure anchors are correct (keep leading "^" and
trailing "$" where intended) so entries match exact filenames or paths rather
than any character sequences, and apply this change consistently across all
similar lines mentioned in the diff.
In @.github/workflows/scripts/sensitive_file_check.py:
- Around line 55-66: The code currently normalizes only directory-walked file
paths (using rel_path) causing direct file paths to remain absolute and miss
regex matches; update the loop where you handle os.path.isfile(path) and where
you append full_path/rel_path so that you call os.path.relpath (and optionally
os.path.normpath) on both file and directory entries before appending to
files_to_check; look for symbols files_to_check, paths, os.path.isfile,
os.path.isdir, full_path and rel_path and ensure all appended paths are
consistently normalized relative to os.getcwd().
In @.github/workflows/scripts/test/test_sensitive_file_check.py:
- Around line 51-53: The test patch for os.path.relpath uses an unused lambda
argument named start which causes lint failures; update the mock in the test
(the patch around os.path.relpath used when calling
sensitive_file_check.get_files_to_check) to rename the unused parameter to
_start (or *_), e.g. change lambda p, start: p.replace("/root/", "") to lambda
p, _start: p.replace("/root/", "") so Ruff stops flagging the unused argument.
🧹 Nitpick comments (1)
.github/workflows/scripts/sensitive_file_check.py (1)
26-38: Narrow the broad exception inload_patterns.
CatchingException(Line 36) can mask programming errors and is typically discouraged by linters. PreferOSError/UnicodeErrorfor file I/O and decoding issues.♻️ Proposed fix
- except Exception as error: # pylint: disable=broad-except + except (OSError, UnicodeError) as error: print(f"Error reading configuration file: {error}") sys.exit(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/check-sensitive-files.yml (1)
3-10: Add this workflow as a prerequisite job to the PR workflows.The
check-sensitive-filesworkflow is correctly configured as a reusable workflow (workflow_call), but it is not currently called from any PR workflows. According to issue#34, it must be integrated as a prerequisite to bothpull-request-review.ymlandpull-request-target.yml. Add a job that calls this workflow in both files using theusesdirective.
🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 62-64: Fix the bash array-length test for CHANGED_FILES: replace
the invalid backticked expression used in the conditional (the if test
referencing `${`#CHANGED_FILES`[@]}`) with the proper array-length expansion
`${`#CHANGED_FILES`[@]}` so the if [ ... -gt 0 ] check works; update the
conditional that invokes python3 sensitive_file_check.py accordingly (the symbol
to locate is the CHANGED_FILES array and the surrounding if [ ... ] test).
🧹 Nitpick comments (1)
.github/workflows/config/sensitive_files.txt (1)
1-55: LGTM overall – patterns are properly escaped based on past review feedback.Most patterns now correctly escape literal dots. A few minor inconsistencies remain:
Line 5:
^\.env..*$– the dot before.*should be escaped as^\.env\..*$to match files like.env.localrather than.envXlocal.Line 33:
.coderabbit.yaml$– missing^anchor and unescaped leading dot. Should be^\.coderabbit\.yaml$.Line 54:
^docs/sidebar..*– unescaped dot before.*. Should be^docs/sidebar\..*.Inconsistent anchoring: Some patterns lack the
^start anchor (e.g., lines 6-7, 12-15, 23, 30-32, etc.), which means they'll match anywhere in the path. This may be intentional for flexibility, but worth confirming.💡 Suggested fixes for remaining issues
-^\.env..*$ +^\.env\..*$ -.coderabbit.yaml$ +^\.coderabbit\.yaml$ -^docs/sidebar..* +^docs/sidebar\..*
|
@palisadoes could you please review the changes and let me know if any additional updates are needed? |
|
I investigated the failing issue and noticed that the recommendation was to remove the "+" from the Python scripts as its presence appears to break the existing regex patterns However CodeRabbit suggests adding the "+" back in |
palisadoes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Revert these files. They must not be changed
.github/workflows/pull-request-review.yml .github/workflows/pull-request-target.yml - Remove these files from the PR. Each repo will either have their own local version or refer to the script in this PR directly.
.github/workflows/config/sensitive_files.txt .github/workflows/check-sensitive-files.yml
palisadoes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
| pull_request_review: | ||
| types: [submitted] | ||
| jobs: | ||
| sensitive-check: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert. This file must be unchanged
|
|
||
| jobs: | ||
| sensitive-check: | ||
| uses: ./.github/workflows/check-sensitive-files.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert. This file must be unchanged
|
|
||
| - name: Set up Python | ||
| if: steps.check-labels.outputs.skip != 'true' | ||
| uses: actions/setup-python@v5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file
|
|
||
| import argparse | ||
| import os | ||
| import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep this file
Add
Separated Code and Config: Created .github/workflows/scripts/sensitive_file_check.py and .github/workflows/config/sensitive_files.txt to decouple the checking logic from the regex patterns.
Updated Workflow: Modified .github/workflows/check-sensitive-files.yml to use the new Python script instead of inline Bash commands.
Added Tests: Implemented unit tests in .github/workflows/scripts/test/test_sensitive_file_check.py to ensure the script works correctly.
Resolves #34
Summary by CodeRabbit
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.