Skip to content

Conversation

@nilanjan-sikdar
Copy link

@nilanjan-sikdar nilanjan-sikdar commented Jan 22, 2026

Add

  1. Separated Code and Config: Created .github/workflows/scripts/sensitive_file_check.py and .github/workflows/config/sensitive_files.txt to decouple the checking logic from the regex patterns.

  2. Updated Workflow: Modified .github/workflows/check-sensitive-files.yml to use the new Python script instead of inline Bash commands.

  3. Added Tests: Implemented unit tests in .github/workflows/scripts/test/test_sensitive_file_check.py to ensure the script works correctly.

Resolves #34

Summary by CodeRabbit

  • Chores

    • CI now runs a Python-based sensitive-file validator driven by a curated pattern list.
    • Replaced large static checks with dynamic changed-file detection; enforcement follows the validator's exit code and retains override messaging.
    • Preserved early exit for non-PR contexts and simplified workflow dependencies.
  • Tests

    • Added unit tests covering pattern loading, file discovery, and sensitive-file detection behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

Our Pull Request Approval Process

This PR will be reviewed according to our:

  1. Palisadoes Contributing Guidelines

  2. AI Usage Policy

Your PR may be automatically closed if:

  1. Our PR template isn't filled in correctly
  2. You haven't correctly linked your PR to an issue

👉 https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue

Thanks for contributing!

@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

Walkthrough

Refactors the sensitive-file check: moves pattern list into .github/workflows/config/sensitive_files.txt, adds .github/workflows/scripts/sensitive_file_check.py to validate changed files, and updates workflows to detect changed files via git and invoke the script; script exit code now enforces failures.

Changes

Cohort / File(s) Summary
Workflow
​.github/workflows/check-sensitive-files.yml, ​.github/workflows/pull-request-review.yml, ​.github/workflows/pull-request-target.yml
Replaced inline sensitive-pattern loop with git-based changed-file detection and a single invocation of sensitive_file_check.py; removed per-file GH_OUTPUT emissions and the separate failure step; added job reuse/dependency to include the new sensitive-check job.
Config
​.github/workflows/config/sensitive_files.txt
New canonical list of sensitive file/glob/regex patterns used by the script.
Script
​.github/workflows/scripts/sensitive_file_check.py
New CLI Python utility providing load_patterns(), get_files_to_check(), check_files(), and main() using argparse; loads regex config, expands directories/files, reports matches and exits non‑zero on sensitive matches.
Tests
​.github/workflows/scripts/test/test_sensitive_file_check.py
New unit tests covering pattern loading, path resolution, regex matching, and error handling; uses mocks for filesystem and IO.

Sequence Diagram(s)

sequenceDiagram
  participant Actions as GitHub Actions
  participant Git as git
  participant Runner as Runner FS
  participant Script as sensitive_file_check.py

  Actions->>Git: git diff --name-only origin/main...HEAD
  Git-->>Actions: list of changed file paths (CHANGED_FILES)
  Actions->>Runner: pass CHANGED_FILES + config path
  Runner->>Script: execute sensitive_file_check.py --config config/... --files <paths>
  Script->>Runner: open/read config file
  Script->>Runner: walk filesystem for provided paths
  Script->>Script: match files against compiled regex patterns
  alt matches found
    Script-->>Actions: exit code 1 + printed override instructions
    Actions-->>Actions: job fails (script exit enforces failure)
  else no matches
    Script-->>Actions: exit code 0 (no sensitive files)
    Actions-->>Actions: job continues
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through configs and scripts with delight,
Patterns now tidy and kept out of sight.
Git lists the changes, the Python inspects,
Regex finds secrets and kindly directs.
A rabbit-approved check—swift, simple, and bright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Out of Scope Changes check ❓ Inconclusive Changes to pull-request-review.yml and pull-request-target.yml add workflow dependencies but issue #34 only specifies updating pull-request.yml; unclear if these are necessary or out-of-scope. Clarify whether the pull-request-review.yml and pull-request-target.yml changes are required by issue #34 or if they are scope creep beyond the stated objective.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'central detect sensitive files' accurately summarizes the main change: centralizing sensitive file detection from inline YAML into a dedicated Python script.
Linked Issues check ✅ Passed The PR successfully implements all core requirements from issue #34: creates a Python script with argparse, stores patterns in a config file, supports multiple files/directories, and replaces inline YAML detection.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 50-68: Add the sensitive-file check as a prerequisite to both PR
workflows by making the existing check workflow reusable and invoking it from
pull-request-review.yml and pull-request-target.yml: update
.github/workflows/check-sensitive-files.yml to support being called (on:
workflow_call) if it isn’t already, then in both pull-request-review.yml and
pull-request-target.yml add a job that uses
./.github/workflows/check-sensitive-files.yml with id sensitive-check (or
preserve the job id "sensitive-check") and make the rest of the PR jobs depend
on it via needs: [sensitive-check] so the "Detect sensitive file changes" job
runs and blocks the PR jobs when it fails.
- Around line 62-68: CHANGED_FILES is built by joining filenames with spaces
which breaks filenames that contain spaces; change the logic to read git diff
output into a bash array (use git diff --name-only --diff-filter=ACMRD
"$BASE_SHA" "$HEAD_SHA" piped into a null/line-safe reader) and then call the
Python checker with the array expanded so each filename is a single argument
(use "${CHANGED_FILES[@]}"); also update the emptiness check to test the array
length (e.g., ${`#CHANGED_FILES`[@]} > 0) before invoking sensitive_file_check.py
with --files to ensure filenames containing spaces are preserved and handled
correctly.

In @.github/workflows/config/sensitive_files.txt:
- Around line 2-23: The regex entries in sensitive_files.txt use unescaped dots
and broad patterns (e.g., ".flake8$", ".pydocstyle$", ".env..*$", "^.github/.*",
"^.husky/.*", "scripts/.*", "src/style/.*", ".gitignore$", "env.example$",
".eslintignore$", ".prettierrc$", ".prettierignore$", and any ".*.pem$" style
entries) which causes over‑matching; update each pattern to escape literal dots
(e.g., use "\." for ".") and ensure anchors are correct (keep leading "^" and
trailing "$" where intended) so entries match exact filenames or paths rather
than any character sequences, and apply this change consistently across all
similar lines mentioned in the diff.

In @.github/workflows/scripts/sensitive_file_check.py:
- Around line 55-66: The code currently normalizes only directory-walked file
paths (using rel_path) causing direct file paths to remain absolute and miss
regex matches; update the loop where you handle os.path.isfile(path) and where
you append full_path/rel_path so that you call os.path.relpath (and optionally
os.path.normpath) on both file and directory entries before appending to
files_to_check; look for symbols files_to_check, paths, os.path.isfile,
os.path.isdir, full_path and rel_path and ensure all appended paths are
consistently normalized relative to os.getcwd().

In @.github/workflows/scripts/test/test_sensitive_file_check.py:
- Around line 51-53: The test patch for os.path.relpath uses an unused lambda
argument named start which causes lint failures; update the mock in the test
(the patch around os.path.relpath used when calling
sensitive_file_check.get_files_to_check) to rename the unused parameter to
_start (or *_), e.g. change lambda p, start: p.replace("/root/", "") to lambda
p, _start: p.replace("/root/", "") so Ruff stops flagging the unused argument.
🧹 Nitpick comments (1)
.github/workflows/scripts/sensitive_file_check.py (1)

26-38: Narrow the broad exception in load_patterns.
Catching Exception (Line 36) can mask programming errors and is typically discouraged by linters. Prefer OSError/UnicodeError for file I/O and decoding issues.

♻️ Proposed fix
-    except Exception as error:  # pylint: disable=broad-except
+    except (OSError, UnicodeError) as error:
         print(f"Error reading configuration file: {error}")
         sys.exit(1)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/check-sensitive-files.yml (1)

3-10: Add this workflow as a prerequisite job to the PR workflows.

The check-sensitive-files workflow is correctly configured as a reusable workflow (workflow_call), but it is not currently called from any PR workflows. According to issue #34, it must be integrated as a prerequisite to both pull-request-review.yml and pull-request-target.yml. Add a job that calls this workflow in both files using the uses directive.

🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 62-64: Fix the bash array-length test for CHANGED_FILES: replace
the invalid backticked expression used in the conditional (the if test
referencing `${`#CHANGED_FILES`[@]}`) with the proper array-length expansion
`${`#CHANGED_FILES`[@]}` so the if [ ... -gt 0 ] check works; update the
conditional that invokes python3 sensitive_file_check.py accordingly (the symbol
to locate is the CHANGED_FILES array and the surrounding if [ ... ] test).
🧹 Nitpick comments (1)
.github/workflows/config/sensitive_files.txt (1)

1-55: LGTM overall – patterns are properly escaped based on past review feedback.

Most patterns now correctly escape literal dots. A few minor inconsistencies remain:

  1. Line 5: ^\.env..*$ – the dot before .* should be escaped as ^\.env\..*$ to match files like .env.local rather than .envXlocal.

  2. Line 33: .coderabbit.yaml$ – missing ^ anchor and unescaped leading dot. Should be ^\.coderabbit\.yaml$.

  3. Line 54: ^docs/sidebar..* – unescaped dot before .*. Should be ^docs/sidebar\..*.

  4. Inconsistent anchoring: Some patterns lack the ^ start anchor (e.g., lines 6-7, 12-15, 23, 30-32, etc.), which means they'll match anywhere in the path. This may be intentional for flexibility, but worth confirming.

💡 Suggested fixes for remaining issues
-^\.env..*$
+^\.env\..*$

-.coderabbit.yaml$
+^\.coderabbit\.yaml$

-^docs/sidebar..*
+^docs/sidebar\..*

@nilanjan-sikdar
Copy link
Author

@palisadoes could you please review the changes and let me know if any additional updates are needed?

@nilanjan-sikdar
Copy link
Author

I investigated the failing issue and noticed that the recommendation was to remove the "+" from the Python scripts as its presence appears to break the existing regex patterns However CodeRabbit suggests adding the "+" back in
could you please clarify whether it is safe to revert my changes or if this behavior is related to an ignore sensitive files tag issue ?

Copy link
Contributor

@palisadoes palisadoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Revert these files. They must not be changed
    .github/workflows/pull-request-review.yml
    .github/workflows/pull-request-target.yml
    
  2. Remove these files from the PR. Each repo will either have their own local version or refer to the script in this PR directly.
    .github/workflows/config/sensitive_files.txt
    .github/workflows/check-sensitive-files.yml
    

Copy link
Contributor

@palisadoes palisadoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

pull_request_review:
types: [submitted]
jobs:
sensitive-check:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert. This file must be unchanged


jobs:
sensitive-check:
uses: ./.github/workflows/check-sensitive-files.yml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert. This file must be unchanged


- name: Set up Python
if: steps.check-labels.outputs.skip != 'true'
uses: actions/setup-python@v5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this file


import argparse
import os
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Central Python Script to detect sensitive files

2 participants