feat: add repository analysis with automated PR creation #33
- Add _analyze_pr_bad_habits method for PR pattern detection
- Improve _get_language_specific_patterns for dynamic rule generation
- Document rule engine limitations for combined restrictions
- Update rule confidence based on repository analysis
Summary of Changes
Hello @naaa760, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant new feature: an automated repository analysis agent capable of generating and proposing Watchflow rules via one-click PR creation. The core enhancements include a more intelligent rule recommendation system that leverages repository language features, historical PR patterns, and contributing guidelines to suggest highly relevant rules. Furthermore, the PR creation workflow has been made more robust with detailed error handling and comprehensive logging for GitHub API interactions, ensuring a smoother and more transparent user experience when automating rule deployment.
Code Review
This pull request introduces a new feature for repository analysis and automated PR creation with recommended rules. The changes include new agent logic, an API endpoint for the PR flow, and significant improvements to logging and error handling, which enhance debuggability.
My review has identified several critical issues, primarily related to incorrect indentation in the Python code that will cause runtime errors. There are also issues with malformed f-strings that will generate invalid YAML for the rules. Additionally, the core analysis function for PR history is currently a stub, which means the logic for dynamically adjusting rule confidence is not yet functional. I've provided suggestions to fix these issues and also a recommendation to make part of the new logic more robust.
workflows = await github_client.list_directory_any_auth(
    repo_full_name=repo, path=".github/workflows", installation_id=installation_id
)
installation_id = state.installation_id

content = await github_client.get_file_content(
f"""
description: "Require tests when code changes"
enabled: true
severity: medium
event_types:
- pull_request
parameters:
source_patterns:
{chr(10).join(f' - "{pattern}"' for pattern in source_patterns)}
test_patterns:
{chr(10).join(f' - "{pattern}"' for pattern in test_patterns)}
"""
The indentation within this f-string for generating the YAML rule is incorrect. Keys like enabled, event_types, and parameters are not properly indented, which will result in invalid YAML and cause parsing errors.
f"""
description: "Require tests when code changes"
enabled: true
severity: medium
event_types:
  - pull_request
parameters:
  source_patterns:
{chr(10).join(f'    - "{pattern}"' for pattern in source_patterns)}
  test_patterns:
{chr(10).join(f'    - "{pattern}"' for pattern in test_patterns)}
"""

"""
description: "Ensure PRs include context"
enabled: true
severity: low
event_types:
- pull_request
parameters:
min_description_length: 50
"""
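One way to avoid the indentation problems flagged for these generated rules is to assemble the YAML line by line, so the nesting depth of every key is explicit in the code rather than buried in a multi-line f-string. The following is a minimal sketch, not the PR's implementation: `yaml_list` and `build_require_tests_rule` are hypothetical helper names, and the 2-space key nesting with 4-space list items is an assumed layout.

```python
def yaml_list(items: list[str], indent: int) -> str:
    """Render items as a YAML block sequence, each line indented by `indent` spaces."""
    pad = " " * indent
    return "\n".join(f'{pad}- "{item}"' for item in items)


def build_require_tests_rule(source_patterns: list[str], test_patterns: list[str]) -> str:
    """Assemble the rule line by line so the nesting depth is explicit (hypothetical helper)."""
    lines = [
        'description: "Require tests when code changes"',
        "enabled: true",
        "severity: medium",
        "event_types:",
        "  - pull_request",
        "parameters:",
        "  source_patterns:",
        yaml_list(source_patterns, indent=4),
        "  test_patterns:",
        yaml_list(test_patterns, indent=4),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_require_tests_rule(["src/**/*.py"], ["tests/**/*.py"]))
```

Note that an empty pattern list leaves the corresponding key with no items, so a caller would probably want to reject empty lists before building the rule.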
def validate_recommendations(state: RepositoryAnalysisState) -> None:
    """Ensure generated YAML is valid."""
    for rec in state.recommendations:
def _analyze_pr_bad_habits(state: RepositoryAnalysisState) -> dict[str, Any]:
    """
    Analyze PR history to detect bad habits and patterns.

    Returns a dict with detected issues like:
    - missing_tests: PRs without test files
    - short_descriptions: PRs with very short descriptions
    - no_reviews: PRs merged without reviews
    """
    if not state.pr_samples:
        return {}

    issues: dict[str, Any] = {
        "missing_tests": 0,
        "short_descriptions": 0,
        "no_reviews": 0,
        "total_analyzed": len(state.pr_samples),
    }

    # Note: We can't analyze PR diffs/descriptions from the basic PR list API
    # This would require fetching individual PR details which is expensive.
    # For now, we return basic stats that can inform recommendations.

    return issues
The new function _analyze_pr_bad_habits is currently a stub that returns zero for all issue counts. This means the new logic in _default_recommendations that adjusts confidence scores based on these counts will never be triggered, rendering it dead code for now.
Furthermore, the docstring mentions detecting short_descriptions, but the PullRequestSample model does not contain a description field, only a title. This should be clarified or corrected.
Consider implementing at least a basic analysis (e.g., checking for short PR titles) to make this feature functional.
for rec in recommendations:
    if "tests" in rec.yaml_rule.lower():
        rec.confidence = min(0.95, rec.confidence + 0.1)
        rec.reasoning += " Contributing guidelines explicitly require tests."
The current logic for finding the test-related rule recommendation relies on a simple substring search ("tests" in rec.yaml_rule.lower()). This is brittle and could lead to incorrectly modifying another rule if its content happens to include the word "tests". A more robust approach would be to identify the rule by a more specific characteristic, such as its description, and to stop searching once the rule is found.
for rec in recommendations:
    if 'description: "Require tests when code changes"' in rec.yaml_rule:
        rec.confidence = min(0.95, rec.confidence + 0.1)
        rec.reasoning += " Contributing guidelines explicitly require tests."
        break
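To make the difference between the two matching strategies concrete, here is a small self-contained sketch. The `Recommendation` dataclass is a hypothetical stand-in for the PR's model (only the three attributes used in the snippet above are assumed), and the "smoke tests" rule is invented purely to show a false positive for the substring check.

```python
from dataclasses import dataclass

TEST_RULE_DESCRIPTION = 'description: "Require tests when code changes"'


@dataclass
class Recommendation:
    """Hypothetical stand-in for the PR's recommendation model."""

    yaml_rule: str
    confidence: float
    reasoning: str = ""


def boost_test_rule(recommendations: list[Recommendation]) -> None:
    """Raise confidence of the one rule whose description matches exactly, then stop."""
    for rec in recommendations:
        if TEST_RULE_DESCRIPTION in rec.yaml_rule:
            rec.confidence = min(0.95, rec.confidence + 0.1)
            rec.reasoning += " Contributing guidelines explicitly require tests."
            break


recs = [
    # Contains the word "tests", so a plain substring check would match it too.
    Recommendation('description: "Block merges until smoke tests pass"\n', 0.6),
    Recommendation('description: "Require tests when code changes"\n', 0.7),
]
boost_test_rule(recs)
```

With the description-based check only the second recommendation is adjusted; the original `"tests" in rec.yaml_rule.lower()` condition would have boosted the first one as well.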
Codecov Report
❌ Your patch status has failed because the patch coverage (6.5%) is below the target coverage (80.0%). You can increase the patch coverage or adjust the target coverage.

@@           Coverage Diff           @@
##            main     #33     +/-   ##
=======================================
- Coverage   33.2%   32.9%   -0.3%
=======================================
  Files         85      85
  Lines       5082    5122    +40
=======================================
+ Hits        1688    1689     +1
- Misses      3394    3433    +39
- Fix YAML indentation in rule generation (proper 4-space indentation for list items)
- Implement _analyze_pr_bad_habits to actually analyze PR samples:
  * Detect short titles (< 10 chars) as indicator of missing context
  * Estimate missing tests based on changed_files and title analysis
- Update docstring to mention 'title' instead of 'description'
- Update references from 'short_descriptions' to 'short_titles'
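The heuristics listed in this commit message could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the PR: the `PullRequestSample` dataclass and its `title` and `changed_files` fields are stand-ins, and the missing-tests rule (the PR touches files but its title never mentions tests) is a guess at what "title analysis" means. Only the 10-character threshold comes from the commit message.

```python
from dataclasses import dataclass

SHORT_TITLE_THRESHOLD = 10  # from the commit message: titles under 10 characters


@dataclass
class PullRequestSample:
    """Hypothetical stand-in; the real model's fields may differ."""

    title: str
    changed_files: int


def analyze_pr_bad_habits(pr_samples: list[PullRequestSample]) -> dict[str, int]:
    """Count simple warning signs using only PR list metadata."""
    if not pr_samples:
        return {}

    issues: dict[str, int] = {
        "missing_tests": 0,
        "short_titles": 0,
        "no_reviews": 0,  # stays 0: not derivable from list metadata
        "total_analyzed": len(pr_samples),
    }
    for pr in pr_samples:
        if len(pr.title.strip()) < SHORT_TITLE_THRESHOLD:
            issues["short_titles"] += 1
        # Assumed heuristic: files changed, but the title never mentions tests.
        if pr.changed_files > 0 and "test" not in pr.title.lower():
            issues["missing_tests"] += 1
    return issues


samples = [
    PullRequestSample("fix", 2),
    PullRequestSample("Add tests for rule engine parser", 3),
    PullRequestSample("Refactor webhook dispatcher", 5),
]
stats = analyze_pr_bad_habits(samples)
```

A title-based estimate like this will over-count PRs that include tests without saying so, which is why it suits nudging a confidence score better than making a hard decision.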
/gemini review
Code Review
This pull request introduces a valuable new feature for repository analysis and automated PR creation with recommended rules. The implementation is solid, with intelligent rule recommendations based on repository language, PR history, and contributing guidelines. I'm particularly impressed with the significant improvements to logging and error handling throughout the PR creation flow, which will greatly enhance debuggability and provide clearer feedback to users. The code is well-structured and the changes are easy to follow. I have one minor suggestion to improve type hinting for better code clarity and maintainability. Overall, this is an excellent contribution.
def _analyze_pr_bad_habits(state: RepositoryAnalysisState) -> dict[str, Any]:
    """
    Analyze PR history to detect bad habits and patterns.

    Returns a dict with detected issues like:
    - missing_tests: PRs without test files (estimated based on changed_files)
    - short_titles: PRs with very short titles (< 10 characters)
    - no_reviews: PRs merged without reviews (always 0, as we can't determine this from list API)

    Note: We can't analyze PR diffs/descriptions from the basic PR list API.
    This would require fetching individual PR details which is expensive.
    We analyze what we can from the PR list metadata.
    """
    if not state.pr_samples:
        return {}

    issues: dict[str, Any] = {
        "missing_tests": 0,
        "short_titles": 0,
        "no_reviews": 0,
        "total_analyzed": len(state.pr_samples),
    }
The type hint dict[str, Any] is used for both the function's return value and the issues variable. Since all values in the dictionary are integers, it would be more precise to use dict[str, int]. This improves type safety and makes the code easier to understand for future maintainers.
def _analyze_pr_bad_habits(state: RepositoryAnalysisState) -> dict[str, int]:
    """
    Analyze PR history to detect bad habits and patterns.

    Returns a dict with detected issues like:
    - missing_tests: PRs without test files (estimated based on changed_files)
    - short_titles: PRs with very short titles (< 10 characters)
    - no_reviews: PRs merged without reviews (always 0, as we can't determine this from list API)

    Note: We can't analyze PR diffs/descriptions from the basic PR list API.
    This would require fetching individual PR details which is expensive.
    We analyze what we can from the PR list metadata.
    """
    if not state.pr_samples:
        return {}

    issues: dict[str, int] = {
        "missing_tests": 0,
        "short_titles": 0,
        "no_reviews": 0,
        "total_analyzed": len(state.pr_samples),
    }
Summary
Key Changes
parameters structure (removed invalid validators/actions sections)