Skip to content

Security: Oltrematica/github_analyzer

Security

SECURITY.md

Security Analysis Report

Application: GitHub Analyzer / DevAnalyzer Version: 2.0 Analysis Date: 2025-11-29 Python Version: 3.9+


Executive Summary

This document provides a comprehensive security analysis of the GitHub Analyzer application. The application is a CLI tool that:

  • Fetches data from GitHub and Jira REST APIs
  • Processes repository, commit, PR, and issue information
  • Exports analysis results to CSV files

Overall Security Posture: EXCELLENT

The codebase demonstrates security-aware design with defense-in-depth measures:

  • Proper credential handling via environment variables
  • Token masking in all logs and error messages
  • Whitelist-based input validation
  • HTTPS enforcement for all API connections
  • Protection against common injection attacks (command, path traversal, CSV formula)
  • Secure file permissions on output files

Table of Contents

  1. Credential & Secret Management
  2. Input Validation & Sanitization
  3. Network & API Security
  4. File Operations Security
  5. Error Handling & Information Disclosure
  6. Dependency Security
  7. Security Controls Summary
  8. Security Checklist

1. Credential & Secret Management

1.1 GitHub Token Handling

Location: src/github_analyzer/config/settings.py

Implementation:

# Token loaded from environment variable only
token = os.environ.get("GITHUB_TOKEN", "").strip()

Security Controls:

Control Status Description
Environment Variable Only Token loaded exclusively from GITHUB_TOKEN
Never Logged Token value never appears in any log output
Never in Error Messages mask_token() replaces token with [MASKED]
No File Storage Token never written to disk or config files
Memory Only Token exists only in memory during execution

Token Masking (src/github_analyzer/core/exceptions.py:124-137):

def mask_token(value: str) -> str:
    """Mask a token value for safe logging."""
    return "[MASKED]"  # Never reveal any part of the token

Defense-in-Depth URL Masking (src/github_analyzer/core/security.py:55-59):

# Pattern to match potential tokens in URLs
_TOKEN_PATTERN = re.compile(
    r"(ghp_[a-zA-Z0-9]+|gho_[a-zA-Z0-9]+|github_pat_[a-zA-Z0-9_]+|"
    r"[a-f0-9]{40}|Bearer\s+[^\s]+)",
    re.IGNORECASE,
)

Security Grade: A+

1.2 Jira Credentials Handling

Location: src/github_analyzer/config/settings.py, src/github_analyzer/api/jira_client.py

Credentials Involved:

  • JIRA_URL - Jira instance URL
  • JIRA_EMAIL - User email for authentication
  • JIRA_API_TOKEN - API token

Security Controls:

Control Status Description
Environment Variables All credentials from environment
Token Masking Masked in __repr__ and __str__ methods
Safe Serialization to_dict() returns masked token
HTTPS Only HTTP URLs rejected for Jira
Base64 Auth Basic Auth only over HTTPS

Security Grade: A+

1.3 Token Format Validation

Location: src/github_analyzer/config/validation.py:30-36

Validated Patterns:

TOKEN_PATTERNS = [
    r"^ghp_[a-zA-Z0-9]{20,}$",       # Classic PAT
    r"^github_pat_[a-zA-Z0-9_]{20,}$", # Fine-grained PAT
    r"^gho_[a-zA-Z0-9]{20,}$",       # OAuth
    r"^ghs_[a-zA-Z0-9]{20,}$",       # App token
    r"^ghr_[a-zA-Z0-9]{36,}$",       # Refresh token
]

Purpose: Format validation ensures tokens match expected GitHub patterns, preventing:

  • Configuration errors (wrong variable set)
  • Accidental exposure of unrelated secrets
  • Malformed token usage

Security Grade: A


2. Input Validation & Sanitization

2.1 Repository Name Validation

Location: src/github_analyzer/config/validation.py

Multi-Layer Validation:

Layer 1: Dangerous Character Detection (line 46-47):

DANGEROUS_CHARS = set(";|&$`(){}[]<>\\'\"\n\r\t")

Layer 2: Whitelist Pattern (line 42-43):

REPO_COMPONENT_PATTERN = r"^[a-zA-Z0-9.][a-zA-Z0-9._-]{0,99}$"
REPO_FULL_PATTERN = r"^[a-zA-Z0-9.][a-zA-Z0-9._-]{0,99}/[a-zA-Z0-9.][a-zA-Z0-9._-]{0,99}$"

Layer 3: Path Traversal Protection (line 223-228):

if ".." in owner or ".." in name:
    raise ValidationError(
        "Invalid repository: path traversal attempt detected",
        details="Repository names cannot contain '..'",
    )

Security Controls:

Control Status Description
Whitelist Approach Uses allowlist, not blocklist
Shell Metacharacter Rejection Explicit blocking of dangerous chars
Path Traversal Prevention .. sequences rejected
URL Normalization GitHub URLs validated and normalized
Length Limits Max 100 chars per component

Security Grade: A+

2.2 Jira Project Key Validation

Location: src/github_analyzer/config/validation.py:346

JIRA_PROJECT_KEY_PATTERN = r"^[A-Z][A-Z0-9_]*$"

Validation: Only uppercase letters, digits, and underscores starting with a letter.

Security Grade: A

2.3 URL Validation

Location: src/github_analyzer/config/validation.py:349-392

Jira URL Validation:

if parsed.scheme != "https":
    return False  # FR-019: HTTPS mandatory

Controls:

  • HTTPS Only: HTTP URLs rejected
  • Host Validation: Must have valid hostname with at least one dot
  • Dangerous Character Check: Applied to full URL

GitHub URL Normalization:

  • Validates against github.com or www.github.com
  • Extracts owner/repo from path
  • Handles .git suffix removal
  • Strips trailing slashes

Security Grade: A

2.4 ISO 8601 Date Validation

Location: src/github_analyzer/config/validation.py:425-481

Implementation: Validates date format and range (1900-2100) to prevent:

  • Injection via malformed dates
  • Integer overflow attacks
  • Timezone manipulation

Security Grade: A


3. Network & API Security

3.1 HTTPS Enforcement

API Enforcement Method
GitHub ✅ Hardcoded Base URL: https://api.github.com
Jira ✅ Validated validate_jira_url() rejects http://

Security Grade: A+

3.2 Authentication Headers

GitHub (src/github_analyzer/api/client.py:86-96):

def _get_headers(self) -> dict[str, str]:
    return {
        "Authorization": f"token {self._config.github_token}",
        "Accept": "application/vnd.github.v3+json",
        "User-Agent": "GitHub-Analyzer/2.0",
    }

Jira (src/github_analyzer/api/jira_client.py:149-164):

def _get_headers(self) -> dict[str, str]:
    credentials = f"{self.config.jira_email}:{self.config.jira_api_token}"
    encoded = base64.b64encode(credentials.encode()).decode()
    return {
        "Authorization": f"Basic {encoded}",
        ...
    }

Security Controls:

  • Credentials only in Authorization header
  • Not logged or exposed in errors
  • Standard authentication schemes

Security Grade: A

3.3 Rate Limiting Handling

Implementation:

  • Tracks X-RateLimit-Remaining and X-RateLimit-Reset
  • Raises dedicated RateLimitError / JiraRateLimitError
  • Displays wait time without exposing internal details

Security Grade: A

3.4 Retry Logic with Exponential Backoff

GitHub (src/github_analyzer/api/client.py):

# Only retry on 5xx errors
if e.status_code and 500 <= e.status_code < 600:
    wait_time = (2**attempt) * 0.5  # 0.5s, 1s, 2s
    time.sleep(wait_time)

Jira (src/github_analyzer/api/jira_client.py):

  • Max retries: 5
  • Initial delay: 1s
  • Max delay: 60s
  • Respects Retry-After header

Security Benefit: Protects against transient failures without overwhelming servers (prevents unintentional DoS).

Security Grade: A

3.5 Timeout Configuration

Both clients implement configurable timeouts (default 30s, max 300s):

timeout=self._config.timeout  # Used in all requests

Timeout Warning (src/github_analyzer/core/security.py:337-373):

def validate_timeout(timeout: int, logger=None, threshold=None):
    """Warn if timeout exceeds recommended threshold (default 60s)."""
    if timeout > threshold and logger:
        logger.warning(f"[SECURITY] Timeout of {timeout}s exceeds recommended threshold")

Security Grade: A

3.6 Content-Type Validation

Location: src/github_analyzer/core/security.py:233-285

def validate_content_type(headers, expected="application/json", logger=None):
    """Validate response Content-Type header."""
    if expected not in content_type:
        logger.warning(f"[SECURITY] Unexpected Content-Type: {content_type}")
        return False
    return True

Security Benefit: Detects content-type mismatch attacks and API response tampering.

Security Grade: A


4. File Operations Security

4.1 Output Path Validation

Location: src/github_analyzer/core/security.py:62-99

def validate_output_path(path: str | Path, base_dir: Path | None = None) -> Path:
    """Validate output path is within safe boundary."""
    resolved_base = base_dir.resolve()
    resolved_path = (resolved_base / Path(path)).resolve()

    # Check if path is within safe boundary (Python 3.9+)
    if not resolved_path.is_relative_to(resolved_base):
        raise ValidationError(f"Output path must be within {resolved_base}")

    return resolved_path

Security Controls:

Control Status Description
Symlink Resolution resolve() follows symlinks
Path Traversal Prevention is_relative_to() check
Base Directory Enforcement Paths must be within allowed directory

Security Grade: A

4.2 CSV Formula Injection Protection

Location: src/github_analyzer/core/security.py:102-154

# Formula injection triggers (=, +, -, @, TAB, CR)
FORMULA_TRIGGERS: frozenset[str] = frozenset("=+-@\t\r")

def escape_csv_formula(value: Any) -> str:
    """Escape cell value to prevent CSV formula injection."""
    str_value = str(value) if value is not None else ""
    if str_value and str_value[0] in FORMULA_TRIGGERS:
        return f"'{str_value}"  # Prefix with single quote
    return str_value

Example:

>>> escape_csv_formula("=SUM(A1:A10)")
"'=SUM(A1:A10)"
>>> escape_csv_formula("Normal text")
"Normal text"

OWASP Reference: CSV Injection

Security Grade: A

4.3 Secure File Permissions

Location: src/github_analyzer/core/security.py:207-230

# Default secure file permissions (owner read/write only)
DEFAULT_SECURE_MODE: int = 0o600

def set_secure_permissions(filepath: Path, mode: int = DEFAULT_SECURE_MODE) -> bool:
    """Set secure permissions on a file (Unix only)."""
    if platform.system() == "Windows":
        return True  # Different ACL model
    try:
        filepath.chmod(mode)
        return True
    except OSError:
        return False  # Graceful degradation

Security Grade: A

4.4 File Permission Checking

Location: src/github_analyzer/core/security.py:157-204

def check_file_permissions(filepath: Path, logger=None) -> bool:
    """Check if file has secure permissions (Unix only)."""
    is_world_readable = bool(mode & stat.S_IROTH)
    is_group_readable = bool(mode & stat.S_IRGRP)

    if is_world_readable or is_group_readable:
        logger.warning(f"[SECURITY] File '{filepath}' has permissive permissions")
        return False
    return True

Security Grade: A


5. Error Handling & Information Disclosure

5.1 Exception Hierarchy

Location: src/github_analyzer/core/exceptions.py

GitHubAnalyzerError (base)
├── ConfigurationError (exit code 1)
├── ValidationError (exit code 1)
└── APIError (exit code 2)
    └── RateLimitError (exit code 2)

JiraAPIError
├── JiraAuthenticationError (401)
├── JiraPermissionError (403)
├── JiraNotFoundError (404)
└── JiraRateLimitError (429)

Security Benefit: Well-structured hierarchy allows catching specific errors without exposing internal details.

Security Grade: A

5.2 Error Message Content

Security Controls:

Control Status Description
No Token in Errors Token values never appear
Response Truncation API responses truncated to 200 chars
Generic Auth Errors No credential hints in auth failures
No Stack Traces Internal traces not exposed to users

Example (client.py):

raise APIError(
    f"GitHub API error: HTTP {response.status_code}",
    details=response.text[:200] if response.text else None,  # Truncated!
    status_code=response.status_code,
)

Security Grade: A

5.3 Audit Logging

Location: src/github_analyzer/core/security.py:303-334

def log_api_request(method, url, status_code, logger, response_time_ms=None):
    """Log API request details (for verbose mode)."""
    # Mask any tokens that might appear in the URL (defense-in-depth)
    safe_url = _mask_url_tokens(url)
    logger.info(f"[API] {method} {safe_url} -> {status_code}")

Security Controls:

  • No secrets in logs
  • URLs sanitized before logging
  • Operation-only data logged
  • Standard Python logging module

Security Grade: A


6. Dependency Security

6.1 External Dependencies

Required:

  • Python 3.9+ (standard library only)

Optional:

  • requests - HTTP client (falls back to urllib if not available)

Security Analysis:

Aspect Assessment
Dependency Count Minimal - Near-zero attack surface
Graceful Fallback ✅ stdlib fallback when requests unavailable
Known Vulnerabilities None (stdlib only)
Supply Chain Risk Low

Security Grade: A

6.2 Development Dependencies

Listed in requirements-dev.txt:

  • pytest
  • pytest-cov
  • ruff
  • mypy

Note: These are development-only and not required for production use.


7. Security Controls Summary

7.1 OWASP Top 10 Coverage

OWASP Category Status Implementation
A01 Broken Access Control Token-based auth, HTTPS enforcement
A02 Cryptographic Failures No custom crypto, uses standard auth
A03 Injection Input validation, parameterized queries
A04 Insecure Design Defense-in-depth architecture
A05 Security Misconfiguration Secure defaults, timeout warnings
A06 Vulnerable Components Minimal dependencies
A07 Auth Failures Token masking, no credential storage
A08 Data Integrity CSV formula injection protection
A09 Logging Failures Secure logging, no secrets in logs
A10 SSRF URL validation, host restrictions

7.2 Attack Surface Analysis

Attack Vector Mitigation
Command Injection Shell metacharacter rejection
Path Traversal .. detection, is_relative_to() check
CSV Formula Injection Single-quote prefix for trigger chars
Credential Exposure Environment variables, masking
Man-in-the-Middle HTTPS enforcement
DoS via Timeouts Configurable timeouts with warnings
Information Disclosure Response truncation, no stack traces

8. Security Checklist

Authentication & Authorization

  • Credentials loaded from environment variables
  • Tokens never logged or exposed in errors
  • Token format validation before use
  • Secure token masking in all representations
  • Basic Auth only over HTTPS

Input Validation

  • Whitelist validation patterns
  • Dangerous character rejection
  • Path traversal prevention
  • URL scheme validation (HTTPS enforced)
  • Maximum length enforcement

Network Security

  • HTTPS enforced for all API calls
  • Timeout configuration with warnings
  • Rate limit handling
  • Retry with exponential backoff
  • Content-Type validation
  • Proper error handling for network failures

Output Security

  • CSV formula injection protection
  • Path validation for output files
  • Secure file permissions (0o600)
  • Symlink resolution before writes

Data Protection

  • No sensitive data in logs
  • Error messages sanitized
  • CSV properly escaped via csv module
  • UTF-8 encoding enforced

Code Quality

  • Type hints throughout
  • Structured exception handling
  • Resource cleanup with context managers
  • No eval/exec usage
  • No shell command injection vectors

Appendix A: Files Analyzed

File Lines Security Relevance
src/github_analyzer/core/security.py 374 Critical - Security utilities
src/github_analyzer/config/settings.py 400+ Critical - Credential handling
src/github_analyzer/config/validation.py 526 Critical - Input validation
src/github_analyzer/api/client.py 550+ High - GitHub API communication
src/github_analyzer/api/jira_client.py 650+ High - Jira API communication
src/github_analyzer/core/exceptions.py 241 Medium - Error handling
src/github_analyzer/exporters/csv_exporter.py 400+ Medium - File operations
src/github_analyzer/exporters/jira_exporter.py 200+ Medium - File operations

Appendix B: Environment Variables

Variable Purpose Security Notes
GITHUB_TOKEN GitHub API authentication Never logged, masked
JIRA_URL Jira instance URL HTTPS enforced
JIRA_EMAIL Jira auth email Logged in error context only
JIRA_API_TOKEN Jira API token Never logged, masked
GITHUB_ANALYZER_TIMEOUT_WARN_THRESHOLD Timeout warning threshold Optional, defaults to 60s

Appendix C: Security Headers

Header Usage
Authorization Token/Basic Auth transmission
X-RateLimit-Remaining Rate limit tracking
X-RateLimit-Reset Rate limit reset timestamp
Retry-After Retry timing (429 responses)
Content-Type Response format verification

Revision History

Date Version Changes
2025-11-29 1.0 Initial security analysis
2025-11-29 1.1 Added CSV formula injection, path validation, file permissions

This security analysis was performed based on static code review. Dynamic testing (penetration testing, fuzzing) is recommended for production deployments.

There aren’t any published security advisories