A PowerShell script that tokenizes sensitive data in CSV files by replacing the values in selected columns with unique tokens, anonymizing the data while preserving the ability to de-tokenize it when needed.
- Interactive Column Selection: Navigate through CSV columns with an intuitive menu system
- Batch Processing: Support for command-line arguments for automated workflows
- Reversible Tokenization: Generate token keys for complete data recovery
- Flexible Delimiters: Support for custom CSV delimiters
- Search Functionality: Find columns by name using pattern matching
- Data Integrity: Maintains CSV structure while protecting sensitive information
The script operates in three stages:
- Tokenization: Replaces sensitive data in selected columns with unique tokens (format: `T[HEX]`)
- Key Generation: Creates a JSON key file with Base64-encoded mappings for secure storage (a sketch of this mapping appears after this list)
- De-tokenization: Restores original data using the generated key file
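The script's internals aren't reproduced here, but a minimal PowerShell sketch of how such a token/key mapping could be built may help. The key shape shown (JSON properties mapping each token to the Base64 of its original value) is an assumption for illustration, not the script's confirmed format:

```powershell
# Illustrative sketch only; the script's actual internals may differ.
# Each unique value gets a hexadecimal token, and the key file stores the
# Base64-encoded original under that token.
$keys = [ordered]@{}   # token -> Base64(original value)
$seen = @{}            # original value -> token (keeps tokens unique per value)
$counter = 0
foreach ($value in @('John Doe', 'john@email.com', '555-1234')) {
    if (-not $seen.ContainsKey($value)) {
        $token = 'T[{0:X}]' -f $counter    # T[0], T[1], ..., T[A], T[B], ...
        $counter++
        $seen[$value] = $token
        $bytes = [Text.Encoding]::UTF8.GetBytes($value)
        $keys[$token] = [Convert]::ToBase64String($bytes)
    }
}
$keys | ConvertTo-Json   # roughly the shape a token-key file could take
```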
Run the script without arguments for guided setup:
```powershell
./CSVTokenizer.ps1
```

The interactive mode will prompt you to:
- Choose between tokenize or de-tokenize mode
- Specify the input file path
- Set the CSV delimiter (defaults to comma)
- Select columns using the navigation interface
For command-line (batch) use, pass the delimiter and columns as arguments:

```powershell
./CSVTokenizer.ps1 -d ',' -c '"Name","Address","Phone Number"'
```

Parameters:
- `-c`: Comma-separated list of column names to tokenize (in CSV format)
- `-d`: CSV delimiter character (optional, defaults to comma)
To de-tokenize a previously tokenized file, run:

```powershell
./CSVTokenizer.ps1 -t "path/to/tokenized-file.csv" "path/to/token-keys.json"
```

Parameters:
- `-t`: Indicates de-tokenization mode
- First argument: Path to the tokenized CSV file
- Second argument: Path to the token keys JSON file
When using interactive mode, use these commands to navigate and select columns:
| Command | Action |
|---|---|
| `+` | Go to next column |
| `-` | Go to previous column |
| `#` | Jump to specific column by index |
| `?` | Search for column by name |
| `x` | Toggle column selection |
| `S` | Show selection summary |
| `.` | Continue with tokenization |
| `H` | Show help menu |
| `QUIT` | Exit script |
The script produces three kinds of output files:

**Tokenized file**
- Location: Same directory as input file
- Naming: `Tokenized-[original-filename].csv`
- Content: Original CSV with selected columns replaced by tokens

**Token key file**
- Location: Same directory as input file
- Naming: `[original-filename-without-extension]-TokenKeys.JSON`
- Content: Base64-encoded mapping of original values to tokens

**De-tokenized file**
- Location: Same directory as tokenized file
- Naming: `DeTokenized-[tokenized-filename].csv`
- Content: Fully restored original data
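For instance, tokenizing a file named customers.csv and later de-tokenizing the result would, per the naming rules above, leave files along these lines (the last name assumes the `Tokenized-` prefix is kept in the de-tokenized filename):

```
customers.csv                          # original input
Tokenized-customers.csv                # tokenized output
customers-TokenKeys.JSON               # token key file
DeTokenized-Tokenized-customers.csv    # restored output
```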
```powershell
# Interactive mode
./CSVTokenizer.ps1

# Command line mode
./CSVTokenizer.ps1 customers.csv -d ',' -c '"Customer Name","Email","Phone"'
```

Input CSV:

```
Customer Name,Email,Phone,Order ID
John Doe,john@email.com,555-1234,12345
Jane Smith,jane@email.com,555-5678,12346
```

Output (Tokenized):

```
Customer Name,Email,Phone,Order ID
T[0],T[1],T[2],12345
T[3],T[4],T[5],12346
```
```powershell
./CSVTokenizer.ps1 -t "Tokenized-customers.csv" "customers-TokenKeys.JSON"
```

This will restore the original sensitive data from the tokenized file.
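For intuition, here is a hypothetical sketch of the reverse lookup, assuming a key file shaped like the one sketched earlier (token properties holding Base64-encoded originals); the script's actual format and logic may differ:

```powershell
# Hypothetical sketch only: invert a key file whose properties map tokens to
# Base64-encoded original values.
$keys = Get-Content 'customers-TokenKeys.JSON' -Raw | ConvertFrom-Json
$lookup = @{}
foreach ($entry in $keys.PSObject.Properties) {
    $bytes = [Convert]::FromBase64String($entry.Value)
    $lookup[$entry.Name] = [Text.Encoding]::UTF8.GetString($bytes)
}
$lookup['T[2]']   # would yield the original value behind token T[2]
```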
- Key File Protection: Store token key files securely and separately from tokenized data
- Base64 Encoding: Original values are Base64-encoded in key files for additional obfuscation; note that Base64 is reversible encoding, not encryption (see the demonstration after this list)
- Unique Tokens: Each unique value gets a distinct hexadecimal token
- Reversibility: Only users with both the tokenized file AND key file can restore data
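The key-file protection point matters because Base64 is reversible by design: anyone holding the key file can decode the original values without any secret. A quick round-trip demonstration (the phone number is just an example value):

```powershell
# Base64 round-trip: encoding is reversible, so it hides values from casual
# inspection but provides no cryptographic protection.
$encoded = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes('555-1234'))
$encoded                                                                # NTU1LTEyMzQ=
[Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($encoded))  # 555-1234
```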
- PowerShell 5.0 or higher
- Windows operating system (uses `Microsoft.VisualBasic.FileIO.TextFieldParser` for CSV parsing; see the sketch after this list)
- Read/write permissions for input and output directories
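For context, `TextFieldParser` is the .NET class commonly used to parse CSV while respecting quoted fields. A minimal sketch of typical usage (the file name is illustrative, not part of the script):

```powershell
# Typical TextFieldParser usage: handles quoted fields and embedded delimiters
# that naive string splitting would break on.
Add-Type -AssemblyName Microsoft.VisualBasic
$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser('customers.csv')
$parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
$parser.SetDelimiters(',')
$parser.HasFieldsEnclosedInQuotes = $true
while (-not $parser.EndOfData) {
    $fields = $parser.ReadFields()   # string[] holding one record's fields
    # ...process $fields...
}
$parser.Close()
```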
The script includes robust error handling for:
- Invalid file paths
- Malformed CSV files
- Out-of-range column selections
- Invalid search queries
- Missing token key files
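As a flavor of what such handling involves, here is a hypothetical sketch of pre-flight checks (not the script's actual code; `$InputFile` is a placeholder variable):

```powershell
# Hypothetical defensive checks of the kind listed above.
if (-not (Test-Path -LiteralPath $InputFile)) {
    throw "File not found: $InputFile"
}
try {
    $header = Get-Content -LiteralPath $InputFile -TotalCount 1
    if ([string]::IsNullOrWhiteSpace($header)) { throw 'Empty or malformed CSV header' }
} catch {
    throw "Could not read CSV: $($_.Exception.Message)"
}
```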
- File Not Found Errors: Ensure file paths are correct and files exist
- Permission Errors: Run PowerShell as administrator if needed
- Malformed CSV: Verify CSV format and delimiter settings
- Token Key Mismatch: Ensure you’re using the correct key file for de-tokenization
This project is open source. Feel free to modify and distribute according to your needs.
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.