3 changes: 3 additions & 0 deletions CLAUDE.md
@@ -3,6 +3,8 @@
Auto-generated from all feature plans. Last updated: 2025-11-28

## Active Technologies
- Python 3.9+ (per constitution, leveraging type hints) + Standard library (urllib, json, csv, os, re); optional: requests (002-jira-integration)
- CSV files for export (same as existing GitHub exports) (002-jira-integration)

- Python 3.9+ (as per constitution, leveraging type hints) + Standard library only (urllib, json, csv, os, re); optional: requests (001-modular-refactor)

@@ -34,6 +36,7 @@ python github_analyzer.py --days 7
Python 3.9+ (as per constitution, leveraging type hints): Follow standard conventions

## Recent Changes
- 002-jira-integration: Added Python 3.9+ (per constitution, leveraging type hints) + Standard library (urllib, json, csv, os, re); optional: requests

- 001-modular-refactor: Added Python 3.9+ (as per constitution, leveraging type hints) + Standard library only (urllib, json, csv, os, re); optional: requests

8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -112,8 +112,8 @@ pytest tests/ -v
# Run with coverage
pytest --cov=src/github_analyzer --cov-report=term-missing

# Check coverage meets threshold (95%)
pytest --cov=src/github_analyzer --cov-fail-under=95
# Check coverage meets threshold (90%)
pytest --cov=src/github_analyzer --cov-fail-under=90

# Run linter
ruff check src/github_analyzer/
@@ -267,7 +267,7 @@ class TestCommitAnalyzer:

### Test Requirements

- **Coverage**: Minimum 95% code coverage
- **Coverage**: Minimum 90% code coverage
- **Unit tests**: All new code must have tests
- **Mocking**: Mock external dependencies (GitHub API, file system)
- **Fixtures**: Use pytest fixtures for reusable test data
@@ -351,7 +351,7 @@ BREAKING CHANGE: The commits endpoint now returns a different structure.
- [ ] Tests pass locally (`pytest tests/ -v`)
- [ ] Linter passes (`ruff check src/github_analyzer/`)
- [ ] Type checker passes (`mypy src/github_analyzer/`)
- [ ] Coverage is ≥95%
- [ ] Coverage is ≥90%
- [ ] Documentation is updated (if applicable)
- [ ] Commit messages follow conventions

109 changes: 103 additions & 6 deletions README.md
@@ -1,25 +1,37 @@
# GitHub Analyzer
# DevAnalyzer (GitHub Analyzer)

[![Tests](https://github.com/Oltrematica/github_analyzer/actions/workflows/tests.yml/badge.svg)](https://github.com/Oltrematica/github_analyzer/actions/workflows/tests.yml)
[![codecov](https://codecov.io/gh/Oltrematica/github_analyzer/branch/main/graph/badge.svg)](https://codecov.io/gh/Oltrematica/github_analyzer)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A powerful Python command-line tool for analyzing GitHub repositories and extracting comprehensive metrics about commits, pull requests, issues, and contributor activity. Generate detailed CSV reports for productivity analysis and code quality assessment.
A powerful Python command-line tool for analyzing GitHub repositories and Jira projects, extracting comprehensive metrics about commits, pull requests, issues, and contributor activity. Generate detailed CSV reports for productivity analysis and code quality assessment.

![GitHub Analyzer Banner](screens/screen1.png)

## Features

### GitHub Analysis
- **Commit Analysis** - Track commits with detailed statistics including additions, deletions, merge detection, and revert identification
- **Pull Request Metrics** - Monitor PR workflow, merge times, review coverage, and approval rates
- **Issue Tracking** - Analyze issue resolution times, categorization (bugs vs enhancements), and closure rates
- **Contributor Insights** - Identify top contributors with activity metrics and productivity scoring
- **Multi-Repository Support** - Analyze multiple repositories in a single run with aggregated statistics
- **Quality Metrics** - Assess code quality through revert ratios, review coverage, and commit message analysis
- **Productivity Scoring** - Calculate composite productivity scores for contributors across repositories

### Jira Integration (NEW)
- **Jira Issue Extraction** - Extract issues and comments from Jira Cloud and Server/Data Center
- **Multi-Project Support** - Analyze multiple Jira projects with interactive project selection
- **Time-Based Filtering** - Filter issues by update date using JQL queries
- **Comment Tracking** - Export all issue comments with author and timestamp
- **ADF Support** - Automatically converts Atlassian Document Format to plain text
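
The ADF conversion mentioned above can be pictured as a recursive walk over the document tree. The following is a minimal sketch, not the tool's actual converter: the function name `adf_to_text` and the set of node types treated as blocks are assumptions made for illustration, and only a few common node types are handled.

```python
from typing import Any

# Node types that end with a line break in this sketch (assumed subset of ADF).
BLOCK_TYPES = {"paragraph", "heading", "codeBlock"}


def adf_to_text(node: Any) -> str:
    """Recursively flatten an ADF node (a dict) into plain text."""
    if not isinstance(node, dict):
        return ""
    node_type = node.get("type")
    if node_type == "text":
        # Leaf node: the visible characters live in the "text" field.
        return node.get("text", "")
    if node_type == "hardBreak":
        return "\n"
    # Container node: concatenate whatever the children produce.
    children = "".join(adf_to_text(child) for child in node.get("content") or [])
    if node_type in BLOCK_TYPES:
        return children + "\n"
    return children


doc = {
    "type": "doc",
    "version": 1,
    "content": [
        {
            "type": "paragraph",
            "content": [
                {"type": "text", "text": "Hello "},
                {"type": "text", "text": "world"},
            ],
        },
        {"type": "paragraph", "content": [{"type": "text", "text": "Second"}]},
    ],
}
plain = adf_to_text(doc)
```

Marks such as bold or links are dropped because only the `text` field of each leaf is read, which is usually what a CSV export wants.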

### Core Features
- **Multi-Source CLI** - Use `--sources` flag to select GitHub, Jira, or both
- **Auto-Detection** - Automatically detects available sources from environment credentials
- **Zero Dependencies** - Works with Python standard library only (optional `requests` for better performance)
- **Secure Token Handling** - Token loaded from environment variable, never exposed in logs or error messages
- **Secure Token Handling** - Tokens loaded from environment variables, never exposed in logs or error messages

## Requirements

@@ -123,14 +135,40 @@ The tool shows real-time progress with detailed information for each repository:

### Environment Variables

**GitHub Configuration:**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GITHUB_TOKEN` | **Yes** | - | GitHub Personal Access Token |
| `GITHUB_TOKEN` | **Yes*** | - | GitHub Personal Access Token |
| `GITHUB_ANALYZER_DAYS` | No | 30 | Number of days to analyze |
| `GITHUB_ANALYZER_OUTPUT_DIR` | No | `github_export` | Output directory for CSV files |
| `GITHUB_ANALYZER_REPOS_FILE` | No | `repos.txt` | Repository list file |
| `GITHUB_ANALYZER_VERBOSE` | No | `true` | Enable detailed logging |

**Jira Configuration:**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `JIRA_URL` | **Yes*** | - | Jira instance URL (e.g., `https://company.atlassian.net`) |
| `JIRA_EMAIL` | **Yes*** | - | Jira account email |
| `JIRA_API_TOKEN` | **Yes*** | - | Jira API token |

*Required only if using that source. Auto-detection skips sources without credentials.
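
As an illustration of the auto-detection rule in the footnote, the sketch below treats a source as available only when every one of its required variables is set and non-empty. The helper name `detect_sources` and the `REQUIRED_ENV` mapping are hypothetical; the real implementation may differ.

```python
import os
from typing import Mapping

# Required variables per source, taken from the tables above.
REQUIRED_ENV = {
    "github": ("GITHUB_TOKEN",),
    "jira": ("JIRA_URL", "JIRA_EMAIL", "JIRA_API_TOKEN"),
}


def detect_sources(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the sources whose required variables are all set and non-empty."""
    return [
        source
        for source, names in REQUIRED_ENV.items()
        if all(env.get(name, "").strip() for name in names)
    ]


# A partially configured Jira (URL only) is skipped rather than reported as an error.
only_github = detect_sources({"GITHUB_TOKEN": "ghp_example", "JIRA_URL": "https://company.atlassian.net"})
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.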

### How to Generate a Jira API Token

**For Jira Cloud (Atlassian Cloud):**
1. Go to https://id.atlassian.com/manage-profile/security/api-tokens
2. Click **"Create API token"**
3. Give it a descriptive name (e.g., "dev-analyzer")
4. Click **"Create"** and copy the token immediately (shown only once!)

**For Jira Server / Data Center:**
1. Go to **Profile** → **Personal Access Tokens**
2. Click **"Create token"**
3. Select the appropriate permissions and create the token
4. Copy the generated token
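
For orientation, the two token types are normally sent in different `Authorization` header forms: Jira Cloud API tokens use HTTP Basic authentication with the account email as the username, while Server / Data Center personal access tokens are sent as Bearer tokens. The sketch below only builds the headers; the function names are hypothetical, and how this tool chooses between the two schemes is not shown in this change.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> dict[str, str]:
    """Build a Basic auth header from an account email and API token (Jira Cloud style)."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(personal_access_token: str) -> dict[str, str]:
    """Build a Bearer auth header from a personal access token (Server / Data Center style)."""
    return {"Authorization": f"Bearer {personal_access_token}"}


cloud_headers = basic_auth_header("your.email@company.com", "your-api-token")
server_headers = bearer_auth_header("your-personal-access-token")
```

Either dictionary can be passed as the `headers` argument of `urllib.request.Request`. Avoid printing these headers, since Base64 is an encoding and does not hide the token.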

**Note:** CLI arguments override environment variables.
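
One common way to get this precedence is to leave the argparse default empty and fall back to the environment only when the flag was not given. The sketch below uses the documented `--days` flag and `GITHUB_ANALYZER_DAYS` variable; the helper `resolve_days` is hypothetical and is not the project's actual parser.

```python
import argparse
from typing import Mapping, Sequence


def resolve_days(argv: Sequence[str], env: Mapping[str, str]) -> int:
    """Resolve the analysis window: CLI flag first, then environment, then the default of 30."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from an explicit value.
    parser.add_argument("--days", type=int, default=None)
    args = parser.parse_args(argv)
    if args.days is not None:
        return args.days
    return int(env.get("GITHUB_ANALYZER_DAYS", "30"))


from_cli = resolve_days(["--days", "7"], {"GITHUB_ANALYZER_DAYS": "14"})
from_env = resolve_days([], {"GITHUB_ANALYZER_DAYS": "14"})
from_default = resolve_days([], {})
```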

### repos.txt Format
@@ -149,9 +187,28 @@ astral-sh/ruff
# Duplicates are automatically removed
```

### jira_projects.txt Format

```txt
# Add Jira project keys to analyze (one per line)
# Project keys are case-sensitive (usually uppercase)

PROJ
DEV
OPS

# Lines starting with # are comments
# Empty lines are ignored
# Duplicates are automatically removed
```

If this file is missing, the tool will prompt you interactively to select from available projects.
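
The format rules above (comments, blank lines, duplicates, case-sensitive keys) amount to a small parser. The following is a minimal sketch under those rules; `parse_project_keys` is a hypothetical name, not necessarily the function the tool uses.

```python
def parse_project_keys(text: str) -> list[str]:
    """Parse jira_projects.txt content into an ordered, de-duplicated list of keys."""
    keys: list[str] = []
    seen: set[str] = set()
    for line in text.splitlines():
        key = line.strip()
        # Skip empty lines and comment lines.
        if not key or key.startswith("#"):
            continue
        # Keys are compared as-is, so "PROJ" and "proj" stay distinct.
        if key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


sample = """\
# Add Jira project keys to analyze (one per line)

PROJ
DEV
OPS
DEV
"""
project_keys = parse_project_keys(sample)
```

Keeping the first-seen order makes the output files reproducible between runs.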

## Output Files

The analyzer generates 7 CSV files in the output directory:
The analyzer writes CSV files to the output directory. GitHub files are generated when GitHub is analyzed, and Jira files when Jira is analyzed:

**GitHub outputs (7 files):**

![Analysis Summary](screens/screen3.png)

@@ -165,6 +222,13 @@ The analyzer generates 7 CSV files in the output directory:
| `productivity_analysis.csv` | Per-contributor productivity metrics and scores |
| `contributors_summary.csv` | Contributor overview with commit and PR statistics |

**Jira outputs (2 files):**

| File | Description |
|------|-------------|
| `jira_issues_export.csv` | Jira issues with key, summary, status, type, priority, assignee, reporter, dates |
| `jira_comments_export.csv` | Jira issue comments with issue key, author, date, body |

### CSV Field Details

#### commits_export.csv
@@ -341,6 +405,39 @@ export GITHUB_TOKEN=ghp_your_token_here
- Verify repository names in `repos.txt` are correct
- Ensure the token has read access to the repositories

### "JIRA_URL environment variable not set"
```bash
export JIRA_URL="https://yourcompany.atlassian.net"
export JIRA_EMAIL="your.email@company.com"
export JIRA_API_TOKEN="your-api-token"
```

### "Jira authentication failed"
- Verify your email matches your Jira account exactly
- Check that the API token is valid and not expired
- For Jira Cloud, ensure you're using the correct email (not username)
- For Jira Server/Data Center, verify the token has appropriate permissions

### "Jira project not found: PROJ"
- Project keys are case-sensitive (usually uppercase)
- Verify you have access to the project with your account
- Check the project key in Jira (visible in issue keys like PROJ-123)

### "Jira rate limit exceeded"
- The tool automatically retries with exponential backoff
- If persistent, wait a few minutes and retry
- Reduce the number of projects in `jira_projects.txt`
- Use a shorter analysis period with `--days`
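
For readers unfamiliar with the retry behavior mentioned in the first bullet, exponential backoff doubles the wait after each rate-limited attempt. The sketch below is illustrative only: the `RateLimited` exception, the `retry_with_backoff` helper, and the default attempt count and delays are assumptions, not the tool's actual values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the server answers with HTTP 429."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, waiting base_delay * 2**attempt seconds after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed; record the delays instead of really sleeping.
attempts = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
```

If the server sends a `Retry-After` header, honoring it is generally preferable to a fixed schedule.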

### Jira skipped (no credentials)
- This is expected if you only have GitHub configured
- To use Jira, set all three required environment variables: `JIRA_URL`, `JIRA_EMAIL`, `JIRA_API_TOKEN`

### Empty Jira CSV files
- Check whether the projects have issues updated in the specified period
- Verify project keys in `jira_projects.txt` are correct
- Ensure your account has permission to view the projects
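
To check the first point by hand, it can help to run an equivalent JQL query in Jira's issue search. The sketch below builds such a query from project keys and a day count; `build_jql` is a hypothetical helper, and the exact JQL the tool sends may differ.

```python
from typing import Sequence


def build_jql(project_keys: Sequence[str], days: int) -> str:
    """Build a JQL query for issues updated in the last `days` days."""
    if not project_keys:
        raise ValueError("at least one project key is required")
    if days < 1:
        raise ValueError("days must be positive")
    quoted = ", ".join(f'"{key}"' for key in project_keys)
    # "-7d" is JQL's relative date syntax for "seven days ago".
    return f"project in ({quoted}) AND updated >= -{days}d ORDER BY updated DESC"


query = build_jql(["PROJ", "DEV"], 7)
```

If the query returns no issues in the Jira UI either, the empty CSV reflects the data rather than a tool problem.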

## Security

- **Token Security**: The GitHub token is loaded from the `GITHUB_TOKEN` environment variable and is never stored, logged, or exposed in error messages
@@ -373,7 +470,7 @@ pytest tests/ -v
ruff check src/github_analyzer/
```

We aim for **≥95% test coverage**. Open an issue for discussion before starting major changes.
We aim for **≥90% test coverage**. Open an issue for discussion before starting major changes.

## License

32 changes: 32 additions & 0 deletions dev_analyzer.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
"""DevAnalyzer - Multi-platform development data extraction tool.

This is the primary entry point for analyzing GitHub repositories and Jira projects.
Supports multiple data sources with auto-detection of available credentials.

Usage:
    python dev_analyzer.py --sources auto --days 7
    python dev_analyzer.py --sources github --days 14
    python dev_analyzer.py --sources jira --days 30
    python dev_analyzer.py --sources github,jira --output ./reports

Environment Variables:
    GitHub:
        GITHUB_TOKEN: GitHub Personal Access Token (required for GitHub)

    Jira:
        JIRA_URL: Jira instance URL (e.g., https://company.atlassian.net)
        JIRA_EMAIL: User email for authentication
        JIRA_API_TOKEN: Jira API token

For more information, run with --help.
"""

from __future__ import annotations

import sys

from src.github_analyzer.cli.main import main

if __name__ == "__main__":
    sys.exit(main())
6 changes: 5 additions & 1 deletion github_analyzer.py
@@ -2,7 +2,8 @@
"""GitHub Repository Analyzer - Backward Compatible Entry Point.

This script provides backward compatibility with the original
github_analyzer.py interface while using the new modular architecture.
github_analyzer.py interface. The recommended entry point is now
dev_analyzer.py which supports multiple data sources.

For the new modular API, use:
from src.github_analyzer.cli import main
@@ -14,6 +15,9 @@
Set GITHUB_TOKEN environment variable, then run:
$ python github_analyzer.py

For multi-source analysis, use dev_analyzer.py instead:
$ python dev_analyzer.py --sources github,jira --days 7

Output:
- commits_export.csv: All commits from all repositories
- pull_requests_export.csv: All PRs from all repositories
26 changes: 26 additions & 0 deletions jira_projects.txt.example
@@ -0,0 +1,26 @@
# Jira Projects Configuration
#
# List Jira project keys to analyze, one per line.
# Lines starting with '#' are comments and will be ignored.
# Empty lines are also ignored.
#
# Example project keys:
# PROJ
# DEV
# OPS
#
# To use this file:
# 1. Copy to jira_projects.txt: cp jira_projects.txt.example jira_projects.txt
# 2. Replace example keys with your actual Jira project keys
# 3. Set the required environment variables:
# - JIRA_URL: Your Jira instance URL (e.g., https://company.atlassian.net)
# - JIRA_EMAIL: Your Jira account email
# - JIRA_API_TOKEN: Your Jira API token
#
# Alternatively, run the analyzer without this file to interactively
# select projects from those available in your Jira instance:
# python dev_analyzer.py --sources jira
#
# The project keys below are examples - replace them with your own:
# PROJ
# DEV