-
Notifications
You must be signed in to change notification settings - Fork 0
feat(jira): complete Jira integration with multi-source support #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adds feature 002-jira-integration with full SpecKit artifacts: - spec.md: 4 user stories (P1-P4), 22 functional requirements, 7 success criteria, edge cases, clarifications - plan.md: Technical context, constitution check, project structure - research.md: Jira API research (auth, pagination, rate limiting) - data-model.md: JiraConfig, JiraIssue, JiraComment entities - contracts/: Jira REST API and module interface contracts - quickstart.md: Setup and usage guide - tasks.md: 61 tasks organized by user story with TDD approach - checklists/: Comprehensive requirements quality checklist (80 items) Key features: - Jira Cloud (API v3) and Server/Data Center (API v2) support - Secure credential handling via environment variables - Multi-platform CLI with --sources flag - CSV export for issues and comments (RFC 4180) - Backward compatibility wrapper for github_analyzer.py - Interactive project selection when jira_projects.txt missing
Add full Jira support with multi-source CLI: - JiraClient with pagination, rate limiting, and retry logic - Support for Jira Cloud (API v3) and Server/Data Center (API v2) - JiraExporter for CSV export (issues + comments) - JiraIssueAnalyzer for project summaries - Multi-source CLI with --sources flag (auto, github, jira, github,jira) - Auto-detection of available sources from environment credentials - Interactive Jira project selection when jira_projects.txt is missing - dev_analyzer.py as new primary entrypoint - Secure token handling (never exposed in logs/errors) - ADF (Atlassian Document Format) to plain text conversion New files: - src/github_analyzer/api/jira_client.py - src/github_analyzer/exporters/jira_exporter.py - src/github_analyzer/analyzers/jira_issues.py - dev_analyzer.py - jira_projects.txt.example Tests: 512 passed, 90% coverage
- Add comprehensive tests for Jira client (urllib/requests paths) - Add tests for GitHub client requests library path - Add validation tests for Jira URL, project keys, ISO8601 dates - Add troubleshooting section for Jira in README - Update coverage requirement from 95% to 90% in docs
- Change ASCII banner from "GITHUB ANALYZER" to "DEV ANALYZER" - Update description to mention GitHub and Jira - Add Jira Integration to features list
- Migrate from deprecated /rest/api/X/search to /rest/api/3/search/jql (Atlassian removed the old endpoint, returning 410 Gone) - Use cursor-based pagination (nextPageToken/isLast) instead of offset-based - Quote project keys in JQL to handle reserved words (e.g., "AS", "IN") - Auto-select all Jira projects when jira_projects.txt is missing/empty - Integrate Jira extraction into main() CLI flow with auto-detection - Update test fixtures for new API response format
Summary of ChangesHello @amargiovanni, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant and well-structured feature: Jira integration with multi-source support. The changes are extensive, touching everything from the CLI entry point and configuration to adding new API clients, analyzers, and exporters. The inclusion of comprehensive documentation, specifications, and tests is commendable. I've identified a critical compatibility issue with Jira Server, some code duplication, and a few areas for improvement in terms of consistency and adherence to the project's own specifications. Overall, this is a solid foundation for multi-source analysis.
| def search_issues( | ||
| self, | ||
| project_keys: list[str], | ||
| since_date: datetime, | ||
| ) -> Iterator[JiraIssue]: | ||
| """Search issues updated since given date. | ||
| Args: | ||
| project_keys: List of project keys to search. | ||
| since_date: Only return issues updated after this date. | ||
| Yields: | ||
| JiraIssue objects matching the criteria. | ||
| Raises: | ||
| JiraAPIError: If API request fails. | ||
| """ | ||
| if not project_keys: | ||
| return | ||
|
|
||
| # Build JQL query (FR-005) | ||
| # Quote project keys to handle reserved JQL words (e.g., "AS", "IN", "OR") | ||
| quoted_keys = [f'"{key}"' for key in project_keys] | ||
| projects_jql = ", ".join(quoted_keys) | ||
| date_str = since_date.strftime("%Y-%m-%d") | ||
| jql = f"project in ({projects_jql}) AND updated >= '{date_str}' ORDER BY updated DESC" | ||
|
|
||
| # Use new /search/jql endpoint with cursor-based pagination | ||
| # See: https://developer.atlassian.com/changelog/#CHANGE-2046 | ||
| max_results = 100 | ||
| next_page_token: str | None = None | ||
|
|
||
| while True: | ||
| params: dict[str, Any] = { | ||
| "jql": jql, | ||
| "maxResults": max_results, | ||
| "fields": "*all,-comment", # All fields except comments (fetched separately) | ||
| } | ||
|
|
||
| if next_page_token: | ||
| params["nextPageToken"] = next_page_token | ||
|
|
||
| response = self._make_request( | ||
| "GET", | ||
| f"/rest/api/{self.api_version}/search/jql", | ||
| params=params, | ||
| ) | ||
|
|
||
| issues = response.get("issues", []) | ||
|
|
||
| for issue_data in issues: | ||
| yield self._parse_issue(issue_data) | ||
|
|
||
| # Check if more pages (cursor-based pagination) | ||
| if response.get("isLast", True) or not issues: | ||
| break | ||
|
|
||
| next_page_token = response.get("nextPageToken") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of search_issues uses the GET /rest/api/{version}/search/jql endpoint with cursor-based pagination (nextPageToken). This is the modern approach for Jira Cloud (API v3), but it is not compatible with many Jira Server instances (API v2). Jira Server typically uses the older POST /rest/api/2/search endpoint with offset-based pagination (startAt, maxResults, total).
To ensure compatibility with both Jira Cloud and Server as intended, the implementation should check self.api_version and branch its logic to use the correct endpoint, HTTP method, and pagination mechanism for the target environment.
src/github_analyzer/cli/main.py
Outdated
| def run_extraction( | ||
| sources: list[DataSource], | ||
| output_dir: str, | ||
| days: int, | ||
| repos_file: str | None = None, | ||
| jira_projects_file: str | None = None, | ||
| verbose: bool = True, | ||
| fetch_pr_details: bool = False, | ||
| ) -> dict: | ||
| """Run extraction for specified sources. | ||
| Args: | ||
| sources: List of data sources to extract from. | ||
| output_dir: Directory for output CSV files. | ||
| days: Number of days to analyze. | ||
| repos_file: Path to repos.txt for GitHub. | ||
| jira_projects_file: Path to jira_projects.txt for Jira. | ||
| verbose: Enable verbose output. | ||
| fetch_pr_details: Fetch full PR details for GitHub. | ||
| Returns: | ||
| Dictionary with extraction results. | ||
| """ | ||
| results = {"github": None, "jira": None} | ||
|
|
||
| if DataSource.GITHUB in sources: | ||
| # Run GitHub extraction | ||
| config = AnalyzerConfig.from_env() | ||
| if repos_file: | ||
| config.repos_file = repos_file | ||
| config.output_dir = output_dir | ||
| config.days = days | ||
| config.verbose = verbose | ||
| config.validate() | ||
|
|
||
| repositories = load_repositories(config.repos_file) | ||
|
|
||
| analyzer = GitHubAnalyzer(config, fetch_pr_details=fetch_pr_details) | ||
| try: | ||
| analyzer.run(repositories) | ||
| results["github"] = {"status": "success", "repos": len(repositories)} | ||
| finally: | ||
| analyzer.close() | ||
|
|
||
| if DataSource.JIRA in sources: | ||
| # Run Jira extraction | ||
| from src.github_analyzer.api.jira_client import JiraClient | ||
|
|
||
| jira_config = JiraConfig.from_env() | ||
| if not jira_config: | ||
| raise ConfigurationError("Jira credentials not configured") | ||
|
|
||
| # Get Jira projects | ||
| projects_file = jira_projects_file or jira_config.jira_projects_file | ||
| project_keys = select_jira_projects(projects_file, jira_config) | ||
|
|
||
| if not project_keys: | ||
| print("No Jira projects selected. Skipping Jira extraction.") | ||
| else: | ||
| client = JiraClient(jira_config) | ||
| since = datetime.now(timezone.utc) - timedelta(days=days) | ||
|
|
||
| # Collect issues and comments | ||
| all_issues = list(client.search_issues(project_keys, since)) | ||
| all_comments = [] | ||
| for issue in all_issues: | ||
| comments = client.get_comments(issue.key) | ||
| all_comments.extend(comments) | ||
|
|
||
| # Export Jira data to CSV | ||
| jira_exporter = JiraExporter(output_dir) | ||
| issues_file = jira_exporter.export_issues(all_issues) | ||
| comments_file = jira_exporter.export_comments(all_comments) | ||
|
|
||
| results["jira"] = { | ||
| "status": "success", | ||
| "issues": len(all_issues), | ||
| "comments": len(all_comments), | ||
| "files": [str(issues_file), str(comments_file)], | ||
| } | ||
|
|
||
| return results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This run_extraction function appears to be dead code, as it is never called within the application. The logic it contains is duplicated inside the main function (from line 690 onwards). This duplication increases maintenance overhead and can lead to inconsistencies.
The logic should be consolidated. I recommend removing this function and refactoring the duplicated logic from main into a new, single function that is called by main to handle the extraction orchestration.
| def select_jira_projects( | ||
| projects_file: str, | ||
| jira_config: JiraConfig | None, | ||
| ) -> list[str]: | ||
| """Select Jira projects from file or use all available. | ||
| Args: | ||
| projects_file: Path to jira_projects.txt file. | ||
| jira_config: Jira configuration (required to fetch available projects). | ||
| Returns: | ||
| List of project keys to analyze. | ||
| """ | ||
| # Try loading from file first | ||
| file_projects = load_jira_projects(projects_file) | ||
| if file_projects: | ||
| return file_projects | ||
|
|
||
| # No file or empty - use all available projects | ||
| if not jira_config: | ||
| return [] | ||
|
|
||
| # Fetch available projects from Jira | ||
| from src.github_analyzer.api.jira_client import JiraClient | ||
|
|
||
| client = JiraClient(jira_config) | ||
| available_projects = client.get_projects() | ||
|
|
||
| if not available_projects: | ||
| print("No projects found in Jira instance.") | ||
| return [] | ||
|
|
||
| # Use all available projects | ||
| all_keys = [p.key for p in available_projects] | ||
| print(f"\nNo {projects_file} found. Using all {len(all_keys)} available Jira projects.") | ||
|
|
||
| return all_keys | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of this function when jira_projects.txt is missing or empty does not fully align with the specification FR-009a. The spec requires prompting the user to choose between analyzing all accessible projects or specifying them manually. The current implementation automatically defaults to using all available projects without user interaction.
To adhere to the specification, this function should be updated to include an interactive prompt for the user.
src/github_analyzer/cli/main.py
Outdated
| if not available_projects: | ||
| print("No projects found in Jira instance.") | ||
| return [] | ||
|
|
||
| # Use all available projects | ||
| all_keys = [p.key for p in available_projects] | ||
| print(f"\nNo {projects_file} found. Using all {len(all_keys)} available Jira projects.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function uses direct print() calls to provide feedback to the user. This is inconsistent with the rest of the application, which uses the TerminalOutput class for structured and formatted logging. Using the TerminalOutput instance would ensure all CLI messages have a consistent look and feel.
Consider passing the output object to this function and replacing the print() calls with output.log() or output.info().
| def validate_iso8601_date(date_str: str) -> bool: | ||
| """Validate ISO 8601 date format. | ||
| Validates that the string is a valid ISO 8601 date (FR-021). | ||
| Supports both date-only and datetime formats. | ||
| Args: | ||
| date_str: The date string to validate. | ||
| Returns: | ||
| True if date is valid ISO 8601 format, False otherwise. | ||
| Examples: | ||
| >>> validate_iso8601_date("2025-11-28") | ||
| True | ||
| >>> validate_iso8601_date("2025-11-28T10:30:00Z") | ||
| True | ||
| >>> validate_iso8601_date("2025-11-28T10:30:00+00:00") | ||
| True | ||
| >>> validate_iso8601_date("28-11-2025") # wrong format | ||
| False | ||
| >>> validate_iso8601_date("invalid") | ||
| False | ||
| """ | ||
| if not date_str: | ||
| return False | ||
|
|
||
| # ISO 8601 date patterns | ||
| # Date only: YYYY-MM-DD | ||
| # Datetime with Z: YYYY-MM-DDTHH:MM:SSZ | ||
| # Datetime with offset: YYYY-MM-DDTHH:MM:SS+HH:MM | ||
| patterns = [ | ||
| r"^\d{4}-\d{2}-\d{2}$", # Date only | ||
| r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$", # Datetime with Z | ||
| r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}$", # Datetime with offset | ||
| r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z$", # Datetime with milliseconds and Z | ||
| r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}$", # With ms and offset | ||
| ] | ||
|
|
||
| if not any(re.match(pattern, date_str) for pattern in patterns): | ||
| return False | ||
|
|
||
| # Additional validation: check that date components are valid | ||
| try: | ||
| # Extract date part | ||
| date_part = date_str[:10] | ||
| year, month, day = map(int, date_part.split("-")) | ||
|
|
||
| # Basic range checks | ||
| if not (1 <= month <= 12): | ||
| return False | ||
| if not (1 <= day <= 31): | ||
| return False | ||
| if year < 1900 or year > 2100: | ||
| return False | ||
|
|
||
| return True | ||
| except (ValueError, IndexError): | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def _parse_datetime(self, value: str | None) -> datetime | None: | ||
| """Parse Jira datetime string to datetime object. | ||
| Args: | ||
| value: Jira datetime string (e.g., "2025-11-28T10:30:00.000+0000"). | ||
| Returns: | ||
| Parsed datetime in UTC, or None if value is empty/None. | ||
| """ | ||
| if not value: | ||
| return None | ||
|
|
||
| # Jira format: "2025-11-28T10:30:00.000+0000" | ||
| try: | ||
| # Remove milliseconds and fix timezone format | ||
| if "." in value: | ||
| value = value.split(".")[0] + value[-5:] | ||
|
|
||
| # Handle +0000 format (no colon) | ||
| if value[-5:].replace("-", "+")[0] in "+-" and ":" not in value[-5:]: | ||
| value = value[:-2] + ":" + value[-2:] | ||
|
|
||
| return datetime.fromisoformat(value.replace("Z", "+00:00")) | ||
| except (ValueError, IndexError): | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in this function for parsing datetime strings is quite complex and can be difficult to follow. While it appears to handle the expected Jira formats, adding comments to explain each step would significantly improve readability and maintainability for future developers.
For example, explaining why milliseconds are stripped or why the timezone colon is being manually added would be very helpful.
- SIM103: Return condition directly instead of if/return True - SIM102: Combine nested if statements with 'and'
The test was failing because the new source detection logic requires at least one data source (GitHub token or Jira credentials). Added environment variable mock to provide a valid token.
- Add timezone import for epoch datetime fallback
- Use epoch datetime as fallback when created/updated is None
- Cast node.get("text") to str to satisfy return type
- Add explicit type annotation for results dict in run_extraction
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment Thanks for integrating Codecov - We've got you covered ☂️ |
The implementation now supports both Jira Cloud and Server/Data Center: - Cloud (API v3): GET /rest/api/3/search/jql with cursor-based pagination - Server/DC (API v2): POST /rest/api/2/search with offset-based pagination The API version is auto-detected from the URL (atlassian.net = Cloud). Added tests for Server/DC path including pagination.
The run_extraction() function was never called - its logic was duplicated inside main(). Removed the unused function to reduce maintenance overhead and avoid inconsistencies.
When jira_projects.txt is missing or empty, user is now prompted with: - [A] Analyze ALL accessible projects - [S] Specify project keys manually (comma-separated) - [L] Select from list by number - [Q] Quit/Skip Jira extraction Added interactive=False parameter for non-interactive/test use cases. Added tests for all interactive prompt options.
…ojects Replace direct print() calls with TerminalOutput.log() for consistency with the rest of the CLI. Added optional 'output' parameter that falls back to print() when not provided (for backward compatibility and tests).
…ersion Explain why milliseconds are stripped and timezone colon is added when converting Jira datetime format to Python fromisoformat() compatible format.
Summary
/search/jqlendpoint (migrated from deprecated/search)jira_projects.txtis missingFeatures
--sources auto|github|jira|github,jirajira_issues.csvandjira_comments.csvEnvironment Variables