A document processing tool that anonymizes personal data in text, DOCX, and PDF files, running entirely on your local machine.
- Multiple File Format Support: Process TXT, DOCX, and PDF files
- Local Processing: All processing happens on your local machine
- Detailed Logging: Generates JSON logs of all anonymization actions
- Customizable Rules: Enable/disable specific detection rules via configuration
- Comprehensive Data Detection:
  - Credit card numbers
  - Phone numbers (US format)
  - Email addresses
  - Social Security Numbers
  - Salary amounts
  - Street addresses
  - ZIP codes
  - IP addresses
  - Person names (context-aware heuristic detection)
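As a rough illustration of how pattern-based detection of this kind can work, here is a minimal sketch. The patterns, the `PATTERNS` table, the `anonymize` function, and the placeholder format are all hypothetical and are not taken from the anonymizer package; the tool's actual rules may be stricter or structured differently.

```python
import re

# Hypothetical patterns for illustration only; the real rules may differ.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone_us": re.compile(
        r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
    ),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def anonymize(text):
    """Replace each match with a placeholder and record which rule fired."""
    log = []
    for rule, pattern in PATTERNS.items():
        def _substitute(match, rule=rule):
            # Record the rule name and the matched text for the action log.
            log.append({"rule": rule, "original": match.group(0)})
            return f"[{rule.upper()}]"

        text = pattern.sub(_substitute, text)
    return text, log


if __name__ == "__main__":
    sample = "Mail jane@example.com or call 555-123-4567. SSN 123-45-6789."
    cleaned, actions = anonymize(sample)
    print(cleaned)
    print(actions)
```

Rules are applied one after another, so an earlier replacement can no longer be matched by a later rule. Name detection is deliberately left out here, since it relies on context heuristics rather than a single pattern.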
- Navigate to the project directory
- Double-click install_anonymizer.bat to run the installer
The script will:
- Check for Python 3.11.0 and install it if needed
- Upgrade pip, setuptools, and wheel to the required versions
- Install the anonymizer package and its dependencies
- Open a terminal and navigate to the project directory
- Run the installation script:
chmod +x install_anonymizer.sh
./install_anonymizer.sh
The script will:
- Check for Python 3.11.0. If Python is not installed, you must install it manually, because the procedure differs between system distributions.
- Upgrade pip, setuptools, and wheel to the required versions
- Install the anonymizer package and its dependencies
- Anonymize a file. This creates an anonymized version at path/to/your/file_anonymized.docx:
anonymizer path/to/your/file.docx
- Specify a custom output file:
anonymizer input.pdf -o output.pdf
- Use a custom configuration file:
anonymizer document.txt -c custom_config.json
- List all available anonymization rules:
anonymizer --list-rules
- Create a default configuration file to customize:
anonymizer --create-config name.json
- Enable verbose logging (future-proofing):
anonymizer file.docx -v
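For processing several files at once, the commands above can be wrapped in a small script. The following is a sketch only: `build_command` and `anonymize_folder` are hypothetical helper names, and it assumes that the `-o`, `-c`, and `-v` flags can be combined in a single call and that the `anonymizer` command is on your PATH.

```python
import subprocess
from pathlib import Path

SUPPORTED = {".txt", ".docx", ".pdf"}


def build_command(input_path, output=None, config=None, verbose=False):
    """Build the argument list for one anonymizer call (flags as listed above)."""
    cmd = ["anonymizer", str(input_path)]
    if output is not None:
        cmd += ["-o", str(output)]
    if config is not None:
        cmd += ["-c", str(config)]
    if verbose:
        cmd.append("-v")
    return cmd


def anonymize_folder(folder, config=None):
    """Run the CLI on every supported file in a folder (non-recursive)."""
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in SUPPORTED:
            continue
        if path.stem.endswith("_anonymized"):
            continue  # skip results of an earlier run
        subprocess.run(build_command(path, config=config), check=True)


if __name__ == "__main__":
    print(build_command("input.pdf", output="output.pdf"))
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.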
You can create a custom configuration file to enable/disable specific anonymization rules:
- Create a default config file:
anonymizer --create-config my_config.json
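Once the default file exists, rules can be switched on or off by editing the JSON by hand or with a short script. The sketch below assumes a layout in which a top-level "rules" object maps rule names to true/false; that layout, the rule name `zip_code`, and the `set_rules` helper are assumptions for illustration, so check the generated file and the output of anonymizer --list-rules for the real keys.

```python
import json
from pathlib import Path


def set_rules(config_path, changes):
    """Enable or disable rules in a config file.

    ASSUMPTION: the config looks like {"rules": {"<rule name>": true/false}}.
    Adjust the key names to match the file the tool actually generates.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    rules = config["rules"]
    for name, enabled in changes.items():
        if name not in rules:
            # Fail loudly instead of silently adding an unknown rule.
            raise KeyError(f"unknown rule: {name}")
        rules[name] = bool(enabled)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


if __name__ == "__main__":
    # Hypothetical rule name; replace with one reported by --list-rules.
    set_rules("my_config.json", {"zip_code": False})
```

The edited file can then be passed to the tool with the -c option.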
For each processed file, the tool generates:
- An anonymized version of the original file (with _anonymized suffix by default)
- A detailed JSON log file showing what was anonymized and where
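The default naming can be reproduced in a script, for example to locate the result after a run. This is a sketch based only on the file_anonymized.docx example above; `default_output_path` is a hypothetical helper, not a function exported by the package, and the name of the JSON log file is not covered because it is not documented here.

```python
from pathlib import Path


def default_output_path(input_path, suffix="_anonymized"):
    """Insert the suffix between the file stem and its extension."""
    path = Path(input_path)
    return path.with_name(f"{path.stem}{suffix}{path.suffix}")


if __name__ == "__main__":
    print(default_output_path("path/to/your/file.docx"))
```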
- TXT: Plain text files
- DOCX: Microsoft Word documents
- PDF: Portable Document Format files
- This is an MVP (Minimum Viable Product) and may have limitations with complex document layouts
- PDF processing preserves text content but may not maintain exact visual formatting
- Name detection uses heuristics and may have false positives/negatives
- Currently optimized for US format phone numbers and addresses
- Python 3.11.0
- 4GB+ RAM recommended for processing large PDF files
If you encounter issues:
- Ensure you have a stable internet connection for the initial installation
- Check that your system meets the Python version requirement (3.11.0)
- For PDF processing issues, ensure you have adequate system memory
- Use the -v flag for verbose logging to identify problems
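To confirm the Python version requirement quickly, a check along these lines can help. It is a sketch that compares only the major and minor version, on the assumption that any 3.11.x release is acceptable where 3.11.0 is required; `python_version_ok` is a hypothetical helper name.

```python
import sys

REQUIRED = (3, 11)  # major, minor; patch level is ignored (assumption)


def python_version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's major.minor matches the requirement."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_version_ok() else "does not match 3.11"
    print(f"Python {current}: {status}")
```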
The project uses setuptools for packaging. The main entry point is in anonymizer/main.py.
To modify and reinstall:
- Make your changes
- Run install_anonymizer.(bat|sh) or pip install -e .