Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ Auto-generated from all feature plans. Last updated: 2025-11-28
## Active Technologies
- Python 3.9+ (per constitution, leveraging type hints) + Standard library (urllib, json, csv, os, re); optional: requests (002-jira-integration)
- CSV files for export (same as existing GitHub exports) (002-jira-integration)
- Python 3.9+ (per constitution, leveraging type hints) + Standard library only (urllib, json, csv, os, re, datetime, statistics); optional: requests (already used in jira_client.py) (003-jira-quality-metrics)

- Python 3.9+ (as per constitution, leveraging type hints) + Standard library only (urllib, json, csv, os, re); optional: requests (001-modular-refactor)

Expand Down Expand Up @@ -36,6 +37,7 @@ python github_analyzer.py --days 7
Python 3.9+ (as per constitution, leveraging type hints): Follow standard conventions

## Recent Changes
- 003-jira-quality-metrics: Added Python 3.9+ (per constitution, leveraging type hints) + Standard library only (urllib, json, csv, os, re, datetime, statistics); optional: requests (already used in jira_client.py)
- 002-jira-integration: Added Python 3.9+ (per constitution, leveraging type hints) + Standard library (urllib, json, csv, os, re); optional: requests

- 001-modular-refactor: Added Python 3.9+ (as per constitution, leveraging type hints) + Standard library only (urllib, json, csv, os, re); optional: requests
Expand Down
86 changes: 81 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,11 +20,14 @@ A powerful Python command-line tool for analyzing GitHub repositories and Jira p
- **Quality Metrics** - Assess code quality through revert ratios, review coverage, and commit message analysis
- **Productivity Scoring** - Calculate composite productivity scores for contributors across repositories

### Jira Integration (NEW)
### Jira Integration
- **Jira Issue Extraction** - Extract issues and comments from Jira Cloud and Server/Data Center
- **Quality Metrics** - Calculate 10 quality metrics per issue including cycle time, description quality, collaboration scores
- **Aggregated Reports** - Generate project-level, person-level, and issue-type summaries
- **Multi-Project Support** - Analyze multiple Jira projects with interactive project selection
- **Time-Based Filtering** - Filter issues by update date using JQL queries
- **Comment Tracking** - Export all issue comments with author and timestamp
- **Reopen Tracking** - Detect reopened issues via changelog API (best-effort, graceful degradation)
- **ADF Support** - Automatically converts Atlassian Document Format to plain text

### Core Features
Expand Down Expand Up @@ -222,12 +225,15 @@ The analyzer generates CSV files in the output directory. GitHub outputs are alw
| `productivity_analysis.csv` | Per-contributor productivity metrics and scores |
| `contributors_summary.csv` | Contributor overview with commit and PR statistics |

**Jira outputs (2 files):**
**Jira outputs (5 files):**

| File | Description |
|------|-------------|
| `jira_issues_export.csv` | Jira issues with key, summary, status, type, priority, assignee, reporter, dates |
| `jira_issues_export.csv` | Jira issues with 12 base fields + 10 quality metrics per issue |
| `jira_comments_export.csv` | Jira issue comments with issue key, author, date, body |
| `jira_project_metrics.csv` | Aggregated metrics per project (cycle time, bug ratio, quality scores) |
| `jira_person_metrics.csv` | Per-assignee metrics (WIP count, resolved count, avg cycle time) |
| `jira_type_metrics.csv` | Per-issue-type metrics (counts, avg cycle time, bug resolution time) |

### CSV Field Details

Expand Down Expand Up @@ -267,9 +273,79 @@ first_activity, last_activity, active_days, consistency_pct,
productivity_score
```

## Quality Metrics Explained
#### jira_issues_export.csv
```
key, summary, description, status, issue_type, priority, assignee, reporter,
created, updated, resolution_date, project_key,
cycle_time_days, aging_days, comments_count, description_quality_score,
acceptance_criteria_present, comment_velocity_hours, silent_issue,
same_day_resolution, cross_team_score, reopen_count
```

#### jira_project_metrics.csv
```
project_key, total_issues, resolved_count, unresolved_count,
avg_cycle_time_days, median_cycle_time_days, bug_count, bug_ratio_percent,
same_day_resolution_rate_percent, avg_description_quality,
silent_issues_ratio_percent, avg_comments_per_issue,
avg_comment_velocity_hours, reopen_rate_percent
```

#### jira_person_metrics.csv
```
assignee_name, wip_count, resolved_count, total_assigned,
avg_cycle_time_days, bug_count_assigned
```

#### jira_type_metrics.csv
```
issue_type, count, resolved_count, avg_cycle_time_days,
bug_resolution_time_avg
```

The analyzer calculates several quality indicators:
## Jira Quality Metrics Explained

The analyzer calculates 10 quality metrics for each Jira issue:

| Metric | Description | Value |
|--------|-------------|-------|
| **cycle_time_days** | Days from created to resolution | Float (empty if open) |
| **aging_days** | Days since creation for open issues | Float (empty if resolved) |
| **comments_count** | Total number of comments | Integer |
| **description_quality_score** | Quality score based on length, AC, formatting | 0-100 |
| **acceptance_criteria_present** | AC patterns detected (Given/When/Then, checkboxes) | true/false |
| **comment_velocity_hours** | Hours from creation to first comment | Float (empty if silent) |
| **silent_issue** | No comments exist | true/false |
| **same_day_resolution** | Resolved on same day as created | true/false |
| **cross_team_score** | Collaboration score based on distinct commenters | 0-100 |
| **reopen_count** | Times reopened (Done→non-Done transitions) | Integer |

### Description Quality Score Calculation

The quality score (0-100) uses balanced weighting:

| Component | Weight | Criteria |
|-----------|--------|----------|
| **Length** | 40% | Linear scale: 100+ chars = full score |
| **Acceptance Criteria** | 40% | Detected patterns: Given/When/Then, AC:, checkboxes |
| **Formatting** | 20% | Headers (10%) + Lists (10%) detected |

### Cross-Team Collaboration Score

Based on distinct comment authors:

| Authors | Score |
|---------|-------|
| 0 | 0 |
| 1 | 25 |
| 2 | 50 |
| 3 | 75 |
| 4 | 90 |
| 5+ | 100 |

## GitHub Quality Metrics Explained

The analyzer calculates several quality indicators for GitHub repositories:

| Metric | Description | Ideal |
|--------|-------------|-------|
Expand Down
158 changes: 158 additions & 0 deletions specs/003-jira-quality-metrics/checklists/comprehensive.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
# Requirements Quality Checklist: Jira Quality Metrics Export

**Purpose**: Comprehensive requirements quality validation for PR review
**Created**: 2025-11-28
**Feature**: [spec.md](../spec.md)
**Depth**: Standard (PR Review)
**Audience**: Reviewer

---

## Algorithm/Calculation Requirements

- [x] CHK001 - Is the linear interpolation for length score (0-100 chars → 0-40 points) explicitly defined? [Clarity, Spec §FR-004]
- ✓ Defined in research.md §3: `length_score = min(40, int(length * 40 / 100))` and data-model.md §Configuration
- [x] CHK002 - Are the exact regex patterns for acceptance criteria detection documented? [Completeness, Spec §FR-005]
- ✓ Defined in research.md §2 and data-model.md §Configuration Constants: 5 AC_PATTERNS listed
- [x] CHK003 - Is the diminishing scale for cross_team_score (1=25, 2=50, 3=75, 4=90, 5+=100) applied to 0 authors? [Gap, Spec §FR-009]
- ✓ Defined in research.md §4: returns 0 for empty comments list
- [x] CHK004 - Is "same day" for same_day_resolution defined in terms of timezone handling? [Ambiguity, Spec §FR-008]
- ✓ Implicitly: comparison uses created and resolution_date which are ISO8601 with timezone (contracts/csv-schemas.md)
- [x] CHK005 - Is the precision for cycle_time_days calculation specified (integer days vs fractional)? [Clarity, Spec §FR-001]
- ✓ data-model.md: `cycle_time_days: float | None`; contracts: 2 decimal places
- [x] CHK006 - Are the formatting detection patterns (headers/lists) for quality score explicitly listed? [Completeness, Spec §FR-004]
- ✓ Defined in research.md §3: `r'^#+\s'` for headers, `r'^\s*[-*]\s'` for lists
- [x] CHK007 - Is the calculation formula for avg_comment_velocity_hours specified (excludes silent issues)? [Clarity, Spec §FR-014]
- ✓ data-model.md §ProjectMetrics: "Mean comment_velocity for non-silent issues"

---

## Data Model & Schema Requirements

- [x] CHK008 - Is the relationship between cycle_time_days and aging_days mutual exclusion clearly stated? [Consistency, Data Model]
- ✓ data-model.md §IssueMetrics: "cycle_time_days: Days from created to resolution (None if unresolved)" and "aging_days: Days from created to now (None if resolved)"
- [x] CHK009 - Are all 10 new CSV columns documented with exact data types and formats? [Completeness, Contracts §1]
- ✓ contracts/csv-schemas.md §1 Schema table: all 10 columns with Type, Description, Example
- [x] CHK010 - Is the handling of None/null values in CSV export explicitly defined (empty string vs "null")? [Clarity, Contracts §1 Notes]
- ✓ contracts/csv-schemas.md §1 Notes: "Empty values: empty string (not `null` or `N/A`)"
- [x] CHK011 - Are float precision requirements (2 decimal places) consistent across all metrics? [Consistency, Contracts]
- ✓ contracts/csv-schemas.md §Validation Rules: "Floats: 2 decimal places, no thousands separator"
- [x] CHK012 - Is the boolean format ("true"/"false" lowercase) requirement applied to all boolean fields? [Consistency, Contracts §1 Notes]
- ✓ contracts/csv-schemas.md §1 Notes + §Validation Rules: "Booleans: `true` or `false` (lowercase)"
- [x] CHK013 - Are PersonMetrics validation rules for empty assignee_name consistent with edge case handling? [Consistency, Data Model]
- ✓ data-model.md §PersonMetrics: "assignee_name: Must not be empty string; issues without assignee are excluded" + spec.md §Edge Cases
- [x] CHK014 - Is the constraint `wip_count + resolved_count = total_assigned` documented in both data model and spec? [Traceability, Data Model]
- ✓ data-model.md §PersonMetrics Validation Rules: "`wip_count` + `resolved_count` = `total_assigned`"

---

## Edge Case & Error Handling Requirements

- [x] CHK015 - Is negative cycle_time handling (warning + null) behavior testable with specific criteria? [Measurability, Spec §Edge Cases]
- ✓ spec.md §Edge Cases: "System logs a warning and sets cycle_time to null"; data-model.md §Validation: "negative values logged as warning and set to None"
- [x] CHK016 - Are division-by-zero scenarios explicitly enumerated for all ratio calculations? [Coverage, Spec §Edge Cases]
- ✓ spec.md §Edge Cases: "Return 0% or null with appropriate note"; data-model.md §ProjectMetrics: "Division by zero: Return 0.0 for ratios, None for averages"
- [x] CHK017 - Is the behavior for issues with resolution_date but no created date defined? [Gap]
- ✓ spec.md §Assumptions: "Creation and resolution dates are always present and valid for resolved issues" - invalid data is out of scope
- [x] CHK018 - Are requirements for handling malformed ADF descriptions specified? [Gap, Spec §Edge Cases]
- ✓ spec.md §Edge Cases: "How is description quality score calculated for ADF format descriptions? → Convert to plain text first, then analyze"
- [x] CHK019 - Is the fallback behavior when changelog API returns 403/404 documented? [Completeness, Spec §Assumptions]
- ✓ research.md §1: "If API returns 403 (permissions) or 404, reopen_count defaults to 0 without error"; tasks.md T040 tests this
- [x] CHK020 - Are requirements defined for issues with future dates (created > now)? [Gap, Edge Case]
- ✓ Handled by CHK015 - negative cycle_time results in warning + null; aging_days would be negative → same treatment
- [x] CHK021 - Is handling of issues re-assigned during analysis period specified for PersonMetrics? [Gap]
- ✓ Implicitly: PersonMetrics aggregates by current assignee field value at export time (snapshot-based, not historical)

---

## API Integration Requirements

- [x] CHK022 - Is the Jira changelog API endpoint path explicitly documented for v2 and v3? [Completeness, Gap]
- ✓ research.md §1: "GET /rest/api/{version}/issue/{issueKey}/changelog" - version parameterized
- [x] CHK023 - Are retry/timeout requirements for changelog API calls specified? [Gap]
- ✓ Inherits from constitution §API Client Standards: "configurable timeouts (default: 30s)", "exponential backoff for transient failures"
- [x] CHK024 - Is the "Done" status list (Done, Closed, Resolved, Complete, Completed) configurable or hardcoded? [Clarity, Data Model §Configuration]
- ✓ data-model.md §Configuration Constants: `DONE_STATUSES = {'Done', 'Closed', 'Resolved', 'Complete', 'Completed'}` - constant (config changeable)
- [x] CHK025 - Are Jira API version differences for changelog response format documented? [Gap]
- ✓ research.md §1: "Both Cloud (v3) and Server (v2) support this endpoint" - same structure
- [x] CHK026 - Is graceful degradation behavior when comments API fails specified? [Gap]
- ✓ spec.md §Assumptions: "Comments are already retrieved by the existing system (JiraClient.get_comments already implemented)" - existing error handling applies

---

## Acceptance Criteria Quality

- [x] CHK027 - Is "high" description_quality_score (>70) in US1 acceptance scenario objectively measurable? [Measurability, Spec §US1.4]
- ✓ spec.md §US1.4: "description_quality_score field is high (>70)" - explicit threshold
- [x] CHK028 - Is "high" cross_team_score in US5 acceptance scenario quantified with specific threshold? [Ambiguity, Spec §US5.1]
- ✓ spec.md §US5.1: "cross-team collaboration score is ≥75" - explicit threshold (was fixed in analysis phase)
- [x] CHK029 - Are success criteria SC-001 through SC-006 all independently testable? [Measurability, Spec §Success Criteria]
- ✓ All 6 SCs have measurable outcomes: time-based (SC-001), count-based (SC-002, SC-003), percentage (SC-004), performance (SC-005), boolean (SC-006)
- [x] CHK030 - Is SC-004 (80% identification rate) validation methodology defined? [Clarity, Spec §SC-004]
- ✓ spec.md §SC-004: "identifies at least 80% of issues with insufficient description (< 50 characters or no AC)" - criteria defined
- [x] CHK031 - Is SC-005 (200% performance threshold) baseline measurement specified? [Clarity, Spec §SC-005]
- ✓ spec.md §SC-005: "does not exceed 200% of current base export time" - baseline = current export without metrics

---

## Scenario Coverage

- [x] CHK032 - Are requirements defined for multi-project export scenarios? [Coverage, Gap]
- ✓ contracts/csv-schemas.md §2 Example: shows 2 projects (PROJ, DEV) in single file; ProjectMetrics aggregates by project_key
- [x] CHK033 - Are requirements for empty project (0 issues) export behavior specified? [Coverage, Edge Case]
- ✓ data-model.md §ProjectMetrics Validation: "Division by zero: Return 0.0 for ratios, None for averages" - handles empty projects
- [x] CHK034 - Are requirements for project with all unresolved issues specified for cycle_time aggregation? [Coverage, Edge Case]
- ✓ data-model.md §ProjectMetrics: "avg_cycle_time_days: float | None" + Validation: "None if no data points available"
- [x] CHK035 - Is the behavior for person with 0 resolved issues (avg_cycle_time = null) documented? [Coverage, Data Model]
- ✓ data-model.md §PersonMetrics: "avg_cycle_time_days: float | None" - None for 0 resolved issues
- [x] CHK036 - Are requirements defined for issue types not in standard set (Bug, Story, Task)? [Coverage, Spec §FR-019]
- ✓ spec.md §FR-019: "per issue_type" - aggregates all types; contracts/csv-schemas.md §4 Example shows Epic type

---

## Non-Functional Requirements

- [x] CHK037 - Is the performance target (≤200% of base export time) measurable with specific test methodology? [Measurability, Spec §SC-005]
- ✓ spec.md §SC-005: baseline is "current base export time"; test compares with/without metrics
- [x] CHK038 - Are memory constraints for large dataset processing specified? [Gap, Non-Functional]
- ✓ plan.md §Technical Context: "Handles typical Jira project sizes (hundreds to thousands of issues)" - streaming not required for expected scale
- [x] CHK039 - Is logging level/format for negative cycle_time warnings specified? [Gap, Spec §Edge Cases]
- ✓ constitution §V: "All errors MUST be logged with context (repo, operation, timestamp)" - follows standard logging
- [x] CHK040 - Are CSV file encoding requirements (UTF-8) explicitly documented? [Completeness, Contracts §Validation Rules]
- ✓ contracts/csv-schemas.md §Validation Rules: "Encoding: UTF-8"

---

## Dependencies & Assumptions

- [x] CHK041 - Is the assumption "JiraClient.get_comments already implemented" validated against current codebase? [Assumption, Spec §Assumptions]
- ✓ spec.md §Assumptions: documented as assumption; research.md §4 confirms "JiraClient.get_comments exists"
- [x] CHK042 - Is the assumption "creation and resolution dates always present" contradicted by edge case handling? [Conflict, Spec §Assumptions vs Edge Cases]
- ✓ No conflict: assumption is "for resolved issues"; edge case §CHK015 handles anomalies with warning+null
- [x] CHK043 - Are external dependencies (Jira API version, authentication) documented in plan.md? [Traceability]
- ✓ plan.md §Technical Context: "uses existing JiraClient auth"; research.md §1: "Both Cloud (v3) and Server (v2)"
- [x] CHK044 - Is backward compatibility requirement for existing CSV consumers explicitly stated? [Gap]
- ✓ research.md §6: "Adding columns at end preserves positional parsing"; plan.md §Constraints: "maintain backwards compatibility"

---

## Traceability & Consistency

- [x] CHK045 - Do all 23 functional requirements (FR-001 to FR-023) have corresponding acceptance scenarios? [Traceability]
- ✓ US1 covers FR-001 to FR-009 (issue metrics); US2 covers FR-010 to FR-014 (project); US3 covers FR-015 to FR-018 (person); US4 covers FR-019 to FR-021 (type); US5 covers FR-022 to FR-023 (reopen)
- [x] CHK046 - Is FR-023 (reopen_rate_percent) included in ProjectMetrics CSV schema? [Consistency, Contracts §2 vs Spec §FR-023]
- ✓ contracts/csv-schemas.md §2 Schema: "reopen_rate_percent | float | Reopen percentage | 5.00"
- [x] CHK047 - Are entity field names in data-model.md consistent with CSV column names in contracts? [Consistency]
- ✓ Verified: all field names match (e.g., cycle_time_days, description_quality_score, cross_team_score)
- [x] CHK048 - Is the "reopen_count" field present in both IssueMetrics and extended issue CSV schema? [Consistency]
- ✓ data-model.md §IssueMetrics: "reopen_count: int"; contracts/csv-schemas.md §1: "reopen_count | int | Times reopened"

---

## Notes

- Items marked [Gap] indicate potentially missing requirements
- Items marked [Ambiguity] indicate vague language needing quantification
- Items marked [Conflict] indicate potential contradictions between sections
- Items marked [Consistency] require cross-referencing multiple spec sections
- Focus: Algorithm accuracy, data integrity, API robustness, schema completeness
Loading