@amargiovanni
Contributor

Summary

  • Add complete Jira integration alongside existing GitHub analysis
  • Support multi-source extraction with auto-detection of available credentials
  • Implement new Jira Cloud API /search/jql endpoint (migrated from deprecated /search)
  • Auto-select all accessible Jira projects when jira_projects.txt is missing
  • Export Jira issues and comments to CSV format

Features

  • Multi-source CLI: --sources auto|github|jira|github,jira
  • Jira Client: Full API v3 support with cursor-based pagination
  • JQL Safety: Project keys are quoted to handle reserved words (AS, IN, OR, etc.)
  • Auto-detection: Credentials checked from environment variables
  • CSV Export: jira_issues.csv and jira_comments.csv
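The auto-detection described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the environment variable names come from the description, while `detect_sources` and `resolve_sources` are hypothetical helper names used only for illustration.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variable names as listed in the PR description.
JIRA_VARS = ("JIRA_URL", "JIRA_EMAIL", "JIRA_API_TOKEN")


def detect_sources(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the sources whose credentials are fully present (hypothetical helper)."""
    env = os.environ if env is None else env
    sources = []
    if env.get("GITHUB_TOKEN"):
        sources.append("github")
    # Jira is only usable when all three values are set.
    if all(env.get(name) for name in JIRA_VARS):
        sources.append("jira")
    return sources


def resolve_sources(flag: str, env: Mapping[str, str] | None = None) -> list[str]:
    """Map a --sources value (auto, github, jira, github,jira) to a source list."""
    if flag == "auto":
        found = detect_sources(env)
        if not found:
            raise ValueError("no credentials found for GitHub or Jira")
        return found
    return [part.strip() for part in flag.split(",") if part.strip()]
```

Passing the environment as a mapping keeps the detection easy to test without patching `os.environ`.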

Environment Variables

# GitHub (existing)
GITHUB_TOKEN=xxx

# Jira (new)
JIRA_URL=https://company.atlassian.net
JIRA_EMAIL=user@company.com
JIRA_API_TOKEN=xxx

Test plan

  • All 585 tests passing (207 Jira-specific)
  • Coverage at 94%
  • Tested with real Jira Cloud instance (27 projects, 116 issues)
  • Verified API migration from deprecated /search to /search/jql
  • Confirmed JQL reserved word handling (project "AS")

Adds feature 002-jira-integration with full SpecKit artifacts:

- spec.md: 4 user stories (P1-P4), 22 functional requirements,
  7 success criteria, edge cases, clarifications
- plan.md: Technical context, constitution check, project structure
- research.md: Jira API research (auth, pagination, rate limiting)
- data-model.md: JiraConfig, JiraIssue, JiraComment entities
- contracts/: Jira REST API and module interface contracts
- quickstart.md: Setup and usage guide
- tasks.md: 61 tasks organized by user story with TDD approach
- checklists/: Comprehensive requirements quality checklist (80 items)

Key features:
- Jira Cloud (API v3) and Server/Data Center (API v2) support
- Secure credential handling via environment variables
- Multi-platform CLI with --sources flag
- CSV export for issues and comments (RFC 4180)
- Backward compatibility wrapper for github_analyzer.py
- Interactive project selection when jira_projects.txt missing

Add full Jira support with multi-source CLI:

- JiraClient with pagination, rate limiting, and retry logic
- Support for Jira Cloud (API v3) and Server/Data Center (API v2)
- JiraExporter for CSV export (issues + comments)
- JiraIssueAnalyzer for project summaries
- Multi-source CLI with --sources flag (auto, github, jira, github,jira)
- Auto-detection of available sources from environment credentials
- Interactive Jira project selection when jira_projects.txt is missing
- dev_analyzer.py as new primary entrypoint
- Secure token handling (never exposed in logs/errors)
- ADF (Atlassian Document Format) to plain text conversion
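
For readers unfamiliar with ADF, the conversion to plain text mentioned in the last item can be sketched as a recursive walk over the document tree. This is an illustrative sketch only: `adf_to_text` and `BLOCK_TYPES` are hypothetical names, the set of block node types is an assumption, and the converter in this PR may handle more node types.

```python
from __future__ import annotations

from typing import Any

# Assumption: node types that should end with a line break in plain text.
BLOCK_TYPES = {"paragraph", "heading", "listItem", "codeBlock"}


def adf_to_text(node: dict[str, Any] | None) -> str:
    """Flatten an ADF node (a nested dict) into plain text (hypothetical helper)."""
    if not node:
        return ""
    node_type = node.get("type")
    if node_type == "text":
        return str(node.get("text", ""))
    if node_type == "hardBreak":
        return "\n"
    # Container nodes: concatenate the text of all children.
    text = "".join(adf_to_text(child) for child in node.get("content", []))
    if node_type in BLOCK_TYPES and text and not text.endswith("\n"):
        text += "\n"
    return text
```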

New files:
- src/github_analyzer/api/jira_client.py
- src/github_analyzer/exporters/jira_exporter.py
- src/github_analyzer/analyzers/jira_issues.py
- dev_analyzer.py
- jira_projects.txt.example

Tests: 512 passed, 90% coverage
- Add comprehensive tests for Jira client (urllib/requests paths)
- Add tests for GitHub client requests library path
- Add validation tests for Jira URL, project keys, ISO8601 dates
- Add troubleshooting section for Jira in README
- Update coverage requirement from 95% to 90% in docs
- Change ASCII banner from "GITHUB ANALYZER" to "DEV ANALYZER"
- Update description to mention GitHub and Jira
- Add Jira Integration to features list
- Migrate from deprecated /rest/api/X/search to /rest/api/3/search/jql
  (Atlassian removed the old endpoint, returning 410 Gone)
- Use cursor-based pagination (nextPageToken/isLast) instead of offset-based
- Quote project keys in JQL to handle reserved words (e.g., "AS", "IN")
- Auto-select all Jira projects when jira_projects.txt is missing/empty
- Integrate Jira extraction into main() CLI flow with auto-detection
- Update test fixtures for new API response format
@gemini-code-assist

Summary of Changes

Hello @amargiovanni, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the github_analyzer tool by transforming it into a multi-platform DevAnalyzer. It introduces comprehensive integration with Jira, allowing users to extract and analyze issues and comments alongside existing GitHub data. This change provides a unified view of development metrics, supporting both platforms through a new, flexible command-line interface and robust API handling, ensuring secure credential management and efficient data export.

Highlights

  • Jira Integration: Full integration with Jira Cloud and Server/Data Center for issue and comment extraction, enabling comprehensive analysis alongside GitHub data.
  • Multi-Source Support: The tool now supports analyzing data from both GitHub and Jira, with automatic credential detection and a flexible command-line interface to select sources.
  • New Entry Point: Introduced dev_analyzer.py as the primary entry point for the multi-platform analyzer, with github_analyzer.py serving as a backward-compatible wrapper.
  • Jira API Modernization: Migrated to the /search/jql endpoint for Jira Cloud API, ensuring compatibility and efficiency, and handling JQL reserved words.
  • Flexible Project Selection: Automatically selects all accessible Jira projects if jira_projects.txt is missing or empty, or allows interactive selection by the user.
  • CSV Export: Exports Jira issues and comments to dedicated CSV files (jira_issues_export.csv and jira_comments_export.csv), consistent with existing GitHub export formats.
  • Test Coverage Adjustment: The minimum required test coverage threshold has been adjusted from 95% to 90% to accommodate the expanded codebase.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-structured feature: Jira integration with multi-source support. The changes are extensive, touching everything from the CLI entry point and configuration to adding new API clients, analyzers, and exporters. The inclusion of comprehensive documentation, specifications, and tests is commendable. I've identified a critical compatibility issue with Jira Server, some code duplication, and a few areas for improvement in terms of consistency and adherence to the project's own specifications. Overall, this is a solid foundation for multi-source analysis.

Comment on lines 342 to 399
def search_issues(
    self,
    project_keys: list[str],
    since_date: datetime,
) -> Iterator[JiraIssue]:
    """Search issues updated since given date.

    Args:
        project_keys: List of project keys to search.
        since_date: Only return issues updated after this date.

    Yields:
        JiraIssue objects matching the criteria.

    Raises:
        JiraAPIError: If API request fails.
    """
    if not project_keys:
        return

    # Build JQL query (FR-005)
    # Quote project keys to handle reserved JQL words (e.g., "AS", "IN", "OR")
    quoted_keys = [f'"{key}"' for key in project_keys]
    projects_jql = ", ".join(quoted_keys)
    date_str = since_date.strftime("%Y-%m-%d")
    jql = f"project in ({projects_jql}) AND updated >= '{date_str}' ORDER BY updated DESC"

    # Use new /search/jql endpoint with cursor-based pagination
    # See: https://developer.atlassian.com/changelog/#CHANGE-2046
    max_results = 100
    next_page_token: str | None = None

    while True:
        params: dict[str, Any] = {
            "jql": jql,
            "maxResults": max_results,
            "fields": "*all,-comment",  # All fields except comments (fetched separately)
        }

        if next_page_token:
            params["nextPageToken"] = next_page_token

        response = self._make_request(
            "GET",
            f"/rest/api/{self.api_version}/search/jql",
            params=params,
        )

        issues = response.get("issues", [])

        for issue_data in issues:
            yield self._parse_issue(issue_data)

        # Check if more pages (cursor-based pagination)
        if response.get("isLast", True) or not issues:
            break

        next_page_token = response.get("nextPageToken")

critical

The current implementation of search_issues uses the GET /rest/api/{version}/search/jql endpoint with cursor-based pagination (nextPageToken). This is the modern approach for Jira Cloud (API v3), but it is not compatible with many Jira Server instances (API v2). Jira Server typically uses the older POST /rest/api/2/search endpoint with offset-based pagination (startAt, maxResults, total).

To ensure compatibility with both Jira Cloud and Server as intended, the implementation should check self.api_version and branch its logic to use the correct endpoint, HTTP method, and pagination mechanism for the target environment.
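
The suggested branching could be sketched as below. This is not the fix that was eventually merged, only an illustration of the two pagination styles named in the comment. The `request` callable, `iter_issues`, and the two private helpers are hypothetical; the endpoint paths and field names (`nextPageToken`, `isLast`, `startAt`, `maxResults`, `total`) are the ones quoted in this review.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import Any

# Hypothetical transport: (method, path, payload) -> decoded JSON response.
Request = Callable[[str, str, dict[str, Any]], dict[str, Any]]


def iter_issues(
    request: Request, api_version: str, jql: str, page_size: int = 100
) -> Iterator[dict[str, Any]]:
    """Pick the pagination style from the API version (assumed to be "2" or "3")."""
    if api_version == "3":
        yield from _iter_cloud(request, jql, page_size)
    else:
        yield from _iter_server(request, jql, page_size)


def _iter_cloud(request: Request, jql: str, page_size: int) -> Iterator[dict[str, Any]]:
    """Cursor-based pagination: follow nextPageToken until isLast."""
    token: str | None = None
    while True:
        params: dict[str, Any] = {"jql": jql, "maxResults": page_size}
        if token:
            params["nextPageToken"] = token
        page = request("GET", "/rest/api/3/search/jql", params)
        issues = page.get("issues", [])
        yield from issues
        token = page.get("nextPageToken")
        # A missing token also ends the loop, so the first page is never re-fetched.
        if page.get("isLast", True) or not issues or not token:
            break


def _iter_server(request: Request, jql: str, page_size: int) -> Iterator[dict[str, Any]]:
    """Offset-based pagination: advance startAt until total is reached."""
    start_at = 0
    while True:
        body = {"jql": jql, "startAt": start_at, "maxResults": page_size}
        page = request("POST", "/rest/api/2/search", body)
        issues = page.get("issues", [])
        yield from issues
        start_at += len(issues)
        if not issues or start_at >= page.get("total", 0):
            break
```

Injecting the transport as a callable lets both paths be exercised with an in-memory fake, without a live Jira instance.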

Comment on lines 503 to 584
def run_extraction(
    sources: list[DataSource],
    output_dir: str,
    days: int,
    repos_file: str | None = None,
    jira_projects_file: str | None = None,
    verbose: bool = True,
    fetch_pr_details: bool = False,
) -> dict:
    """Run extraction for specified sources.

    Args:
        sources: List of data sources to extract from.
        output_dir: Directory for output CSV files.
        days: Number of days to analyze.
        repos_file: Path to repos.txt for GitHub.
        jira_projects_file: Path to jira_projects.txt for Jira.
        verbose: Enable verbose output.
        fetch_pr_details: Fetch full PR details for GitHub.

    Returns:
        Dictionary with extraction results.
    """
    results = {"github": None, "jira": None}

    if DataSource.GITHUB in sources:
        # Run GitHub extraction
        config = AnalyzerConfig.from_env()
        if repos_file:
            config.repos_file = repos_file
        config.output_dir = output_dir
        config.days = days
        config.verbose = verbose
        config.validate()

        repositories = load_repositories(config.repos_file)

        analyzer = GitHubAnalyzer(config, fetch_pr_details=fetch_pr_details)
        try:
            analyzer.run(repositories)
            results["github"] = {"status": "success", "repos": len(repositories)}
        finally:
            analyzer.close()

    if DataSource.JIRA in sources:
        # Run Jira extraction
        from src.github_analyzer.api.jira_client import JiraClient

        jira_config = JiraConfig.from_env()
        if not jira_config:
            raise ConfigurationError("Jira credentials not configured")

        # Get Jira projects
        projects_file = jira_projects_file or jira_config.jira_projects_file
        project_keys = select_jira_projects(projects_file, jira_config)

        if not project_keys:
            print("No Jira projects selected. Skipping Jira extraction.")
        else:
            client = JiraClient(jira_config)
            since = datetime.now(timezone.utc) - timedelta(days=days)

            # Collect issues and comments
            all_issues = list(client.search_issues(project_keys, since))
            all_comments = []
            for issue in all_issues:
                comments = client.get_comments(issue.key)
                all_comments.extend(comments)

            # Export Jira data to CSV
            jira_exporter = JiraExporter(output_dir)
            issues_file = jira_exporter.export_issues(all_issues)
            comments_file = jira_exporter.export_comments(all_comments)

            results["jira"] = {
                "status": "success",
                "issues": len(all_issues),
                "comments": len(all_comments),
                "files": [str(issues_file), str(comments_file)],
            }

    return results

high

This run_extraction function appears to be dead code, as it is never called within the application. The logic it contains is duplicated inside the main function (from line 690 onwards). This duplication increases maintenance overhead and can lead to inconsistencies.

The logic should be consolidated. I recommend removing this function and refactoring the duplicated logic from main into a new, single function that is called by main to handle the extraction orchestration.

Comment on lines 464 to 501
def select_jira_projects(
    projects_file: str,
    jira_config: JiraConfig | None,
) -> list[str]:
    """Select Jira projects from file or use all available.

    Args:
        projects_file: Path to jira_projects.txt file.
        jira_config: Jira configuration (required to fetch available projects).

    Returns:
        List of project keys to analyze.
    """
    # Try loading from file first
    file_projects = load_jira_projects(projects_file)
    if file_projects:
        return file_projects

    # No file or empty - use all available projects
    if not jira_config:
        return []

    # Fetch available projects from Jira
    from src.github_analyzer.api.jira_client import JiraClient

    client = JiraClient(jira_config)
    available_projects = client.get_projects()

    if not available_projects:
        print("No projects found in Jira instance.")
        return []

    # Use all available projects
    all_keys = [p.key for p in available_projects]
    print(f"\nNo {projects_file} found. Using all {len(all_keys)} available Jira projects.")

    return all_keys

medium

The behavior of this function when jira_projects.txt is missing or empty does not fully align with the specification FR-009a. The spec requires prompting the user to choose between analyzing all accessible projects or specifying them manually. The current implementation automatically defaults to using all available projects without user interaction.

To adhere to the specification, this function should be updated to include an interactive prompt for the user.
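
One way such a prompt could be structured is sketched below. The function name `choose_projects`, the injectable `ask` callable, the option letters, and the behaviour when `interactive` is False (falling back to all projects) are assumptions made for illustration, not details confirmed by the specification quoted here.

```python
from __future__ import annotations

from collections.abc import Callable


def choose_projects(
    available: list[str],
    ask: Callable[[str], str] = input,
    interactive: bool = True,
) -> list[str]:
    """Let the user pick Jira project keys (hypothetical helper)."""
    if not interactive:
        # Assumption: non-interactive runs analyze every accessible project.
        return list(available)

    choice = ask("[A]ll projects, [S]pecify keys, [L]ist and pick, [Q]uit: ").strip().upper()
    if choice == "A":
        return list(available)
    if choice == "S":
        raw = ask("Project keys (comma-separated): ")
        wanted = [key.strip().upper() for key in raw.split(",") if key.strip()]
        # Silently drop keys the account cannot access.
        return [key for key in wanted if key in available]
    if choice == "L":
        for number, key in enumerate(available, start=1):
            print(f"{number}. {key}")
        raw = ask("Numbers (comma-separated): ")
        picks = [int(item) for item in raw.split(",") if item.strip().isdigit()]
        return [available[n - 1] for n in picks if 1 <= n <= len(available)]
    # Anything else, including Q, skips Jira extraction.
    return []
```

Taking `ask` as a parameter means tests can script the answers instead of patching `input`.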

Comment on lines 492 to 498
    if not available_projects:
        print("No projects found in Jira instance.")
        return []

    # Use all available projects
    all_keys = [p.key for p in available_projects]
    print(f"\nNo {projects_file} found. Using all {len(all_keys)} available Jira projects.")

medium

This function uses direct print() calls to provide feedback to the user. This is inconsistent with the rest of the application, which uses the TerminalOutput class for structured and formatted logging. Using the TerminalOutput instance would ensure all CLI messages have a consistent look and feel.

Consider passing the output object to this function and replacing the print() calls with output.log() or output.info().
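
A small sketch of that suggestion follows, assuming the output object exposes a `log(message)` method. The real `TerminalOutput` signature is not shown in this conversation, so the `SupportsLog` protocol and the `report` helper are hypothetical.

```python
from __future__ import annotations

from typing import Protocol


class SupportsLog(Protocol):
    """Assumed minimal interface of the CLI output object."""

    def log(self, message: str) -> None: ...


def report(message: str, output: SupportsLog | None = None) -> None:
    """Send a message through the output object, or print() when none is given."""
    if output is not None:
        output.log(message)
    else:
        print(message)
```

Keeping `output` optional preserves existing call sites and lets tests pass a recording stub.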

Comment on lines 427 to 485
def validate_iso8601_date(date_str: str) -> bool:
    """Validate ISO 8601 date format.

    Validates that the string is a valid ISO 8601 date (FR-021).
    Supports both date-only and datetime formats.

    Args:
        date_str: The date string to validate.

    Returns:
        True if date is valid ISO 8601 format, False otherwise.

    Examples:
        >>> validate_iso8601_date("2025-11-28")
        True
        >>> validate_iso8601_date("2025-11-28T10:30:00Z")
        True
        >>> validate_iso8601_date("2025-11-28T10:30:00+00:00")
        True
        >>> validate_iso8601_date("28-11-2025")  # wrong format
        False
        >>> validate_iso8601_date("invalid")
        False
    """
    if not date_str:
        return False

    # ISO 8601 date patterns
    # Date only: YYYY-MM-DD
    # Datetime with Z: YYYY-MM-DDTHH:MM:SSZ
    # Datetime with offset: YYYY-MM-DDTHH:MM:SS+HH:MM
    patterns = [
        r"^\d{4}-\d{2}-\d{2}$",  # Date only
        r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$",  # Datetime with Z
        r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}$",  # Datetime with offset
        r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z$",  # Datetime with milliseconds and Z
        r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}$",  # With ms and offset
    ]

    if not any(re.match(pattern, date_str) for pattern in patterns):
        return False

    # Additional validation: check that date components are valid
    try:
        # Extract date part
        date_part = date_str[:10]
        year, month, day = map(int, date_part.split("-"))

        # Basic range checks
        if not (1 <= month <= 12):
            return False
        if not (1 <= day <= 31):
            return False
        if year < 1900 or year > 2100:
            return False

        return True
    except (ValueError, IndexError):
        return False

medium

The function validate_iso8601_date is defined here but does not appear to be used anywhere in the codebase. Unused code should be removed to improve maintainability and reduce clutter. If this is intended for future use, it would be better to add it when it's actually needed.

Comment on lines 489 to 513
def _parse_datetime(self, value: str | None) -> datetime | None:
    """Parse Jira datetime string to datetime object.

    Args:
        value: Jira datetime string (e.g., "2025-11-28T10:30:00.000+0000").

    Returns:
        Parsed datetime in UTC, or None if value is empty/None.
    """
    if not value:
        return None

    # Jira format: "2025-11-28T10:30:00.000+0000"
    try:
        # Remove milliseconds and fix timezone format
        if "." in value:
            value = value.split(".")[0] + value[-5:]

        # Handle +0000 format (no colon)
        if value[-5:].replace("-", "+")[0] in "+-" and ":" not in value[-5:]:
            value = value[:-2] + ":" + value[-2:]

        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (ValueError, IndexError):
        return None

medium

The logic in this function for parsing datetime strings is quite complex and can be difficult to follow. While it appears to handle the expected Jira formats, adding comments to explain each step would significantly improve readability and maintainability for future developers.

For example, explaining why milliseconds are stripped or why the timezone colon is being manually added would be very helpful.
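
As an illustration of what a more explicit version might look like, here is a sketch that performs the same normalization with two regular expressions and a comment per step. It is an alternative written for this discussion, not the code from the PR: `parse_jira_datetime` and the pattern names are hypothetical, and treating timestamps without an offset as UTC is an assumption.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone

_FRACTION = re.compile(r"\.\d+")                      # ".000" fractional seconds
_OFFSET_NO_COLON = re.compile(r"([+-]\d{2})(\d{2})$")  # "+0000" style offset


def parse_jira_datetime(value: str | None) -> datetime | None:
    """Parse a Jira timestamp such as "2025-11-28T10:30:00.000+0000" into UTC."""
    if not value:
        return None
    # Step 1: drop fractional seconds. Older Python versions are strict about
    # the number of fractional digits fromisoformat() accepts.
    normalized = _FRACTION.sub("", value, count=1)
    # Step 2: turn "+0000" into "+00:00". Older fromisoformat() versions only
    # accept offsets written with a colon.
    normalized = _OFFSET_NO_COLON.sub(r"\1:\2", normalized)
    # Step 3: a trailing "Z" means UTC and is likewise rejected by older versions.
    normalized = normalized.replace("Z", "+00:00")
    try:
        parsed = datetime.fromisoformat(normalized)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```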

- SIM103: Return condition directly instead of if/return True
- SIM102: Combine nested if statements with 'and'

The test was failing because the new source detection logic
requires at least one data source (GitHub token or Jira credentials).
Added environment variable mock to provide a valid token.

- Add timezone import for epoch datetime fallback
- Use epoch datetime as fallback when created/updated is None
- Cast node.get("text") to str to satisfy return type
- Add explicit type annotation for results dict in run_extraction

@codecov

codecov bot commented Nov 28, 2025

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

The implementation now supports both Jira Cloud and Server/Data Center:

- Cloud (API v3): GET /rest/api/3/search/jql with cursor-based pagination
- Server/DC (API v2): POST /rest/api/2/search with offset-based pagination

The API version is auto-detected from the URL (atlassian.net = Cloud).
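
A detection rule of this kind could be sketched as follows. The function name `detect_api_version` and the string return values are assumptions; only the "atlassian.net = Cloud" rule comes from the commit message.

```python
from urllib.parse import urlparse


def detect_api_version(jira_url: str) -> str:
    """Return "3" for Jira Cloud hosts and "2" otherwise (hypothetical helper)."""
    host = (urlparse(jira_url).hostname or "").lower()
    # Compare the parsed hostname rather than searching the raw URL, so a host
    # like "atlassian.net.example.com" is not mistaken for Cloud.
    is_cloud = host == "atlassian.net" or host.endswith(".atlassian.net")
    return "3" if is_cloud else "2"
```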

Added tests for Server/DC path including pagination.

The run_extraction() function was never called - its logic was
duplicated inside main(). Removed the unused function to reduce
maintenance overhead and avoid inconsistencies.

When jira_projects.txt is missing or empty, the user is now prompted with:
- [A] Analyze ALL accessible projects
- [S] Specify project keys manually (comma-separated)
- [L] Select from list by number
- [Q] Quit/Skip Jira extraction

Added interactive=False parameter for non-interactive/test use cases.

Added tests for all interactive prompt options.
…ojects

Replace direct print() calls with TerminalOutput.log() for consistency
with the rest of the CLI. Added optional 'output' parameter that falls
back to print() when not provided (for backward compatibility and tests).
…ersion

Explain why milliseconds are stripped and timezone colon is added when
converting Jira datetime format to Python fromisoformat() compatible format.
@amargiovanni amargiovanni merged commit f7870dd into main Nov 28, 2025
5 of 6 checks passed
@amargiovanni amargiovanni deleted the 002-jira-integration branch November 28, 2025 20:34