Document Anonymizer Tool

A document processing tool that anonymizes personal data in text (TXT), DOCX, and PDF files entirely on your local machine.

Features

Multiple File Format Support: Process TXT, DOCX, and PDF files
Local Processing: All processing happens on your local machine
Detailed Logging: Generates JSON logs of all anonymization actions
Customizable Rules: Enable/disable specific detection rules via configuration
Comprehensive Data Detection:
- Credit card numbers
- Phone numbers (US format)
- Email addresses
- Social Security Numbers
- Salary amounts
- Street addresses
- ZIP codes
- IP addresses
- Person names (context-aware heuristic detection)
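Most of the categories above are usually detected with pattern matching. The sketch below shows the general idea for a few of them. The patterns and rule names (email, ssn, us_phone, ipv4, credit_card) are assumptions for illustration, not the tool's actual rules. It also pairs the card pattern with a Luhn checksum. This is a common way to cut false positives, but the README does not say whether the tool uses it.

```python
import re

# Hypothetical rule table: rule name -> compiled pattern.
# These patterns are illustrative assumptions, not the tool's actual rules.
RULES = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_text) pairs found in `text`."""
    found = []
    for name, pattern in RULES.items():
        for m in pattern.finditer(text):
            # Drop digit runs that fail the Luhn check to reduce false positives.
            if name == "credit_card" and not luhn_valid(m.group()):
                continue
            found.append((name, m.group()))
    return found
```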

Installation

Automated Installation (Windows)

Navigate to the project directory
Double-click on install_anonymizer.bat to run the installer

The script will:

Check for Python 3.11.0 and install it if needed
Upgrade pip, setuptools, and wheel to required versions
Install the anonymizer package and its dependencies

Automated Installation (Linux)

Open a terminal and navigate to the project directory
Run the installation script:

chmod +x install_anonymizer.sh
./install_anonymizer.sh

The script will:

Check for Python 3.11.0. If Python is not installed, you must install it manually, because installation methods differ between Linux distributions
Upgrade pip, setuptools, and wheel to required versions
Install the anonymizer package and its dependencies

Usage

Anonymize a file:
- anonymizer path/to/your/file.docx
This creates an anonymized version at path/to/your/file_anonymized.docx.

Advanced Options

Specify a custom output file:
- anonymizer input.pdf -o output.pdf
Use a custom configuration file:
- anonymizer document.txt -c custom_config.json
List all available anonymization rules:
- anonymizer --list-rules
Create a default configuration file to customize:
- anonymizer --create-config name.json
Enable verbose logging (future-proofing):
- anonymizer file.docx -v
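Here is a rough sketch of how these flags might be wired up with Python's argparse. Only -o, -c, --list-rules, --create-config and -v come from this README. build_parser and the dest names are hypothetical, and the real parser in anonymizer/main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(prog="anonymizer")
    # The input file is optional so that --list-rules / --create-config work alone.
    parser.add_argument("input", nargs="?", help="TXT, DOCX, or PDF file to anonymize")
    parser.add_argument("-o", dest="output", help="custom output file")
    parser.add_argument("-c", dest="config", help="custom JSON configuration file")
    parser.add_argument("--list-rules", action="store_true", help="list available rules")
    parser.add_argument("--create-config", metavar="NAME", help="write a default config file")
    parser.add_argument("-v", dest="verbose", action="store_true", help="verbose logging")
    return parser
```

Making the positional argument optional (nargs="?") lets the rule-listing and config-creation commands run without an input file. A real entry point would then need to check that input is present for the anonymization path.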

Configuration

You can create a custom configuration file to enable or disable specific anonymization rules.
Generate a default config file, then edit it:
- anonymizer --create-config my_config.json
Pass the edited file with -c:
- anonymizer document.txt -c my_config.json

Output

For each processed file, the tool generates:

An anonymized version of the original file (with an _anonymized suffix by default)
A detailed JSON log file showing what was anonymized and where
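One way to produce both outputs at once is to collect every match against the original text, replace the matches left to right, and record where each replacement occurred. In this sketch, the [LABEL] placeholders, the two patterns, and the log fields (file, replacements, rule, start, end) are assumptions. The tool's actual log format is not described in this README.

```python
import json
import re

# Illustrative patterns; the tool's real rules and placeholder text may differ.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize(text: str) -> tuple[str, list[dict]]:
    """Replace every match with a [LABEL] placeholder and log where it was."""
    # Collect matches against the original text so the logged offsets refer to it.
    matches = sorted(
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    )
    pieces, log, pos = [], [], 0
    for start, end, label in matches:
        if start < pos:  # overlaps a span that was already replaced; skip it
            continue
        pieces.append(text[pos:start])
        pieces.append(f"[{label}]")
        log.append({"rule": label, "start": start, "end": end})
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces), log


def log_to_json(source: str, log: list[dict]) -> str:
    """Serialize the replacement log; the field names are assumptions."""
    return json.dumps({"file": source, "replacements": log}, indent=2)
```

This sketch logs offsets rather than the matched values, so the log itself does not re-expose the removed data. The README does not say whether the real log works this way.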

Supported File Types

TXT: Plain text files
DOCX: Microsoft Word documents
PDF: Portable Document Format files

Limitations

This is an MVP (Minimum Viable Product) and may not handle complex document layouts correctly
PDF processing preserves text content but may not maintain exact visual formatting
Name detection uses heuristics and may produce false positives and false negatives
Currently optimized for US-format phone numbers and addresses
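To show why heuristic name detection produces both kinds of error, here is a deliberately simple context rule. A capitalized word, or a pair of capitalized words, counts as a name only when it follows an honorific or a "Name:" label. This is an assumption for illustration, not the tool's actual heuristic.

```python
import re

# Illustrative heuristic only: a name needs a preceding honorific or "Name:" label.
NAME_CONTEXT = re.compile(
    r"(?:\b(?:Mr|Mrs|Ms|Dr)\.?\s+|\bName:\s*)([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)"
)


def find_names(text: str) -> list[str]:
    """Return capitalized word pairs that follow a name-context cue."""
    return [m.group(1) for m in NAME_CONTEXT.finditer(text)]
```

This rule misses "Jane Smith signed." because there is no context cue, and it flags "Name: Project Alpha" even though that is not a person. These are the false negatives and false positives described above.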

Requirements

Python 3.11.0
4GB+ RAM recommended for processing large PDF files
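You can confirm the interpreter version from Python itself. This sketch assumes 3.11.0 is a minimum rather than an exact pin, which the README does not state. meets_requirement is a hypothetical helper.

```python
import sys

# Assumption: 3.11.0 is treated as a minimum, not an exact version pin.
MIN_VERSION = (3, 11, 0)


def meets_requirement(version: tuple = tuple(sys.version_info[:3])) -> bool:
    """Return True if `version` (default: the running interpreter) is new enough."""
    return tuple(version[:3]) >= MIN_VERSION
```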

Troubleshooting

If you encounter issues:

Ensure you have a stable internet connection for the initial installation
Check that your system meets the Python version requirement (3.11.0)
For PDF processing issues, ensure you have adequate system memory
Use the -v flag for verbose logging to identify problems
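A -v flag is commonly mapped onto the levels of Python's standard logging module. The sketch below assumes that approach. The logger name "anonymizer" and configure_logging are hypothetical, and the tool's actual verbose output is not documented here.

```python
import logging


def configure_logging(verbose: bool) -> logging.Logger:
    """Return a logger at DEBUG level when verbose, INFO otherwise (sketch)."""
    logger = logging.getLogger("anonymizer")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    # Attach a single console handler the first time only.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```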

Development

The project uses setuptools for packaging. The main entry point is in anonymizer/main.py.

To modify and reinstall:

Make your changes, then run one of:
- install_anonymizer.bat (Windows) or install_anonymizer.sh (Linux)
- pip install -e .

Repository: todor02/Anonymizer
