Skip to content

truconsent/truscanner

Repository files navigation

truScanner from truConsent

PyPI version License

Open-Source Static Analysis for Privacy Data Flows

truScanner is a static code analysis tool designed to discover and analyze personal data elements in your source code. It helps developers and security teams identify privacy-related data flows and generate comprehensive reports.

πŸ“¦ PyPI Project β€’ 🌐 App Dashboard

πŸš€ Features

  • Comprehensive Detection: Identifies 300+ personal data elements (PII, financial data, device identifiers, etc.)
  • Full Catalog Coverage: Loads and scans against all configured data elements from data_elements/ (not a truncated subset)
  • Interactive Menu: Arrow-key navigable menu for selecting output formats
  • Real-time Progress: Visual progress indicator during scanning
  • Multiple Report Formats: Generate reports in TXT, Markdown, or JSON format
  • AI-Powered Enhancement: Optional integration with Ollama or OpenAI for deeper context
  • Backend Integration: Optional upload to backend API for centralized storage
  • Auto-incrementing Reports: Automatically manages report file naming to prevent overwrites

truScanner CLI

TruScanner Terminal Demo

πŸ“¦ Installation

Prerequisites

  • Python 3.9 or higher
  • ollama (optional, for local AI scanning)

Quick Install

Using pip:

pip install truscanner

Using uv:

uv pip install truscanner

Verify installation:

truscanner --help

πŸ› οΈ Usage

Basic Usage

Scan a directory with the interactive menu:

truscanner scan <directory_path>

Example

truscanner scan ./src
truscanner scan ./my-project
truscanner scan C:\Users\username\projects\my-app

Python API Usage

Use truScanner directly from Python:

import truscanner

# Local path
check = truscanner("/path/to/project")

# file:// URL also works
check = truscanner("file:///Users/username/project")

# Optional explicit call style
check = truscanner.scan("/path/to/project", with_ai=False)

# API metadata: total configured catalog size
print(check["configured_data_elements"])

Minimal script style:

import truscanner
scan = truscanner("folder_path")

Runnable root example:

python3 simple_truscanner_usage.py ./src

Quick smoke check script:

uv run python scripts/check_truscanner_api.py ./src

Installed-package verification (CLI + Python API):

python3 verify_truscanner_install.py

Interactive Workflow

  1. Select Output Format:

    • Use arrow keys (↑↓) to navigate
    • Press Enter to select
    • Options: txt, md, json, or All (generates all three formats)
  2. Scanning Progress:

    • Real-time progress bar shows file count and percentage
    • Prints configured definition count at start (example: Loaded data element definitions: 380)
    • Example: Scanning: 50/200 (25%) [β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘] filename.js
  3. AI Enhanced Scan (Optional):

    • After the initial scan, you'll be prompted: Do you want to use Ollama/AI for enhanced PII detection (find what regex missed)? (Y, N):
    • This uses local LLMs (via Ollama) or OpenAI to find complex PII.
    • Live scanning timer: AI Scanning: filename.js... (5.2s taken)
  4. Report Generation:

    • Reports are saved in reports/{directory_name}/ folder
    • Files are named: truscan_report.txt, truscan_report.md, truscan_report.json
    • Subsequent scans auto-increment: truscan_report1.txt, truscan_report2.txt, etc.
    • AI findings are saved with _llm suffix.
  5. Backend Upload (Optional):

    • After reports are saved, you'll be prompted: Do you want to upload the scan report for the above purpose? (Y, N):
    • Enter Y to upload scan results to backend API
    • View your uploaded scans and analytics at app.truconsent.io

Command Options

truscanner scan <directory> [OPTIONS]

Options:
  --with-ai          Enable AI/LLM scanner directly
  --ai-mode          AI scan mode: fast, balanced, or full (default: balanced)
  --personal-only    Only report personal identifiable information (PII)
  --help             Show help message

AI Speed vs Coverage Modes

Use --ai-mode to control AI scan behavior:

  • fast: Small prompts, fastest runtime, may skip very large low-signal files
  • balanced (default): Good speed while keeping broad file coverage
  • full: Largest context and highest coverage, slowest runtime

Examples:

truscanner scan ./src --ai-mode fast
truscanner scan ./src --ai-mode balanced
truscanner scan ./src --ai-mode full

πŸ“Š Report Output

Report Location

Reports are saved in: reports/{sanitized_directory_name}/

Report Formats

  • TXT Report (truscan_report.txt): Plain text format, easy to read
  • Markdown Report (truscan_report.md): Formatted markdown with headers and code blocks
  • JSON Report (truscan_report.json): Structured JSON data for programmatic access

Report Contents

Each report includes:

  • Scan Report ID: Unique 32-bit hash identifier
  • Summary: Configured data elements, distinct detected elements, total findings, and time taken
  • Findings by File: Detailed list of data elements found in each file
  • Summary by Category: Aggregated statistics by data category

JSON reports also include:

  • configured_data_elements
  • distinct_detected_elements

Report ID

Each scan generates a unique Scan Report ID (32-bit MD5 hash) that:

  • Appears in the terminal after scanning
  • Is included at the top of all generated report files
  • Can be used to track and reference specific scans

πŸ”§ Configuration

The truscanner package is pre-configured with the live backend URL for seamless scan uploads. No additional configuration is required.

πŸ“ Project Structure

truscanner/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ main.py              # CLI entry point
β”‚   β”œβ”€β”€ regex_scanner.py     # Core scanning engine
β”‚   β”œβ”€β”€ ai_scanner.py        # AI/LLM scanning engine
β”‚   β”œβ”€β”€ report_utils.py      # Report utilities
β”‚   └── utils.py             # Utilities
β”œβ”€β”€ data_elements/           # Data element definitions
β”œβ”€β”€ reports/                 # Generated reports
β”œβ”€β”€ pyproject.toml           # Project configuration
└── README.md

πŸ“ Change Policy

For this repository, every code or behavior change must include a matching README update in the same change.

This includes:

  • CLI flags, prompts, defaults, scan behavior, output format changes
  • Python API changes (import truscanner, return schema, parameters)
  • Dependency/runtime requirements
  • Report format/location updates

🀝 Support

For issues, questions, or contributions, please contact: hello@truconsent.io

MIT License - see LICENSE file for details

Contributors 4

  •  
  •  
  •  
  •  

Languages