
Xaoc-Industries/CSVTokenizer

CSV Tokenizer

A PowerShell script designed to tokenize sensitive data in CSV files by replacing sensitive column values with unique tokens, providing a secure way to pseudonymize data while retaining the ability to de-tokenize it when needed.

Features

  • Interactive Column Selection: Navigate through CSV columns with an intuitive menu system
  • Batch Processing: Command-line arguments support automated workflows
  • Reversible Tokenization: Generate token keys for complete data recovery
  • Flexible Delimiters: Support for custom CSV delimiters
  • Search Functionality: Find columns by name using pattern matching
  • Data Integrity: Maintains CSV structure while protecting sensitive information

How It Works

  1. Tokenization: Replaces sensitive data in selected columns with unique tokens (format: T[HEX])
  2. Key Generation: Creates a JSON key file with Base64-encoded mappings between original values and tokens
  3. De-tokenization: Restores original data using the generated key file
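A minimal PowerShell sketch of steps 1 and 2, under assumptions that are not confirmed from the script: tokens use uppercase hex, one counter is shared by all selected columns, repeated values reuse their token, text is UTF-8 before Base64 encoding, and the key file is a flat JSON object mapping each token to its Base64-encoded original. New-TokenMap and ConvertTo-TokenKeyJson are hypothetical names.

```powershell
# Sketch only: the token format, counter and key-file layout are assumptions.

function New-TokenMap {
    param([string[]]$Values)
    # Dictionary[string,string] compares keys case-sensitively, so "Doe" and
    # "doe" receive different tokens (a PowerShell @{} hashtable would merge them).
    $map = [System.Collections.Generic.Dictionary[string,string]]::new()
    foreach ($v in $Values) {
        if (-not $map.ContainsKey($v)) {
            # Token number = count of distinct values seen so far, in hex.
            $map[$v] = 'T[{0:X}]' -f $map.Count
        }
    }
    # Leading comma keeps PowerShell from unrolling the collection on output.
    return ,$map
}

function ConvertTo-TokenKeyJson {
    param([System.Collections.Generic.Dictionary[string,string]]$Map)
    # Assumed layout: { "T[0]": "<Base64 of original>", ... }
    $entries = [ordered]@{}
    foreach ($pair in $Map.GetEnumerator()) {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($pair.Key)
        $entries[$pair.Value] = [Convert]::ToBase64String($bytes)
    }
    return ConvertTo-Json -InputObject $entries
}

$map = New-TokenMap -Values 'John Doe', 'john@email.com', 'John Doe'
$map['John Doe']    # T[0]
ConvertTo-TokenKeyJson -Map $map
```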

Usage

Interactive Mode

Run the script without arguments for guided setup:

./CSVTokenizer.ps1

The interactive mode will prompt you to:

  1. Choose between tokenize or de-tokenize mode
  2. Specify the input file path
  3. Set the CSV delimiter (defaults to comma)
  4. Select columns using the navigation interface

Command Line Arguments

Tokenization Mode

./CSVTokenizer.ps1 -d ',' -c '"Name","Address","Phone Number"'

Parameters:

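  • -c: Comma-separated list of column names to tokenize, written as a CSV record: each name in double quotes and the whole list in single quotes (e.g. '"Name","Phone Number"')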
  • -d: CSV delimiter character (optional, defaults to comma)
  • Input file: Example 1 passes the CSV path as the first positional argument
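The -c value is itself a one-line CSV record, so a CSV parser splits it more reliably than splitting on commas, because a quoted column name may itself contain a comma. This sketch uses Microsoft.VisualBasic.FileIO.TextFieldParser, the parser named under Requirements; Get-ColumnList is a hypothetical helper, and whether the script parses -c exactly this way is an assumption.

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

function Get-ColumnList {
    param([string]$ColumnArg)
    $reader = [System.IO.StringReader]::new($ColumnArg)
    $parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
    try {
        $parser.SetDelimiters([string[]]@(','))   # names in -c are comma-separated
        $parser.HasFieldsEnclosedInQuotes = $true  # "Last, First" stays one field
        # ReadFields() returns string[]; the leading comma stops PowerShell unrolling it.
        return ,($parser.ReadFields())
    }
    finally {
        $parser.Close()
    }
}

Get-ColumnList '"Name","Address","Phone Number"'
```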

De-tokenization Mode

./CSVTokenizer.ps1 -t "path/to/tokenized-file.csv" "path/to/token-keys.json"

Parameters:

  • -t: Indicates de-tokenization mode
  • First argument: Path to the tokenized CSV file
  • Second argument: Path to the token keys JSON file

Interactive Navigation Controls

When using interactive mode, use these commands to navigate and select columns:

Command   Action
+         Go to next column
-         Go to previous column
#         Jump to specific column by index
?         Search for column by name
x         Toggle column selection
S         Show selection summary
.         Continue with tokenization
H         Show help menu
QUIT      Exit script
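A hypothetical sketch of how one navigation command could update the selection state: a 0-based current column index plus a set of selected column names. Only +, -, #, ? and x are modelled. Step-Navigation is an illustrative name, and the details are assumptions: # takes the 1-based position shown to the user, ? uses PowerShell -like wildcards, and the index stops at either end rather than wrapping.

```powershell
function Step-Navigation {
    param(
        [string[]]$Columns,
        [int]$Index,
        [System.Collections.Generic.HashSet[string]]$Selected,
        [string]$Command,
        [string]$Argument
    )
    switch ($Command) {
        '+' { if ($Index -lt ($Columns.Count - 1)) { $Index++ } }
        '-' { if ($Index -gt 0) { $Index-- } }
        '#' {
            # Assumed 1-based input; out-of-range values leave the index unchanged.
            $n = 0
            if ([int]::TryParse($Argument, [ref]$n) -and $n -ge 1 -and $n -le $Columns.Count) {
                $Index = $n - 1
            } else {
                Write-Warning "Column index out of range: $Argument"
            }
        }
        '?' {
            # First column whose name matches the wildcard pattern.
            $hit = -1
            for ($i = 0; $i -lt $Columns.Count; $i++) {
                if ($Columns[$i] -like $Argument) { $hit = $i; break }
            }
            if ($hit -ge 0) { $Index = $hit } else { Write-Warning "No column matches '$Argument'" }
        }
        'x' {
            # Remove() returns $false when the column was not selected yet.
            $name = $Columns[$Index]
            if (-not $Selected.Remove($name)) { [void]$Selected.Add($name) }
        }
    }
    return $Index
}
```

Each call returns the new index, while the selection set is updated in place, so an input loop only needs to keep those two values between prompts.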

Output Files

Tokenized CSV

  • Location: Same directory as input file
  • Naming: Tokenized-[original-filename-without-extension].csv (e.g. Tokenized-customers.csv)
  • Content: Original CSV with selected columns replaced by tokens

Token Keys File

  • Location: Same directory as input file
  • Naming: [original-filename-without-extension]-TokenKeys.JSON
  • Content: Base64-encoded mapping of original values to tokens

De-tokenized CSV

  • Location: Same directory as tokenized file
  • Naming: DeTokenized-[tokenized-filename].csv
  • Content: Fully restored original data
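A small sketch of deriving the tokenized CSV and key file paths from the input path. Get-TokenizerOutputPath is a hypothetical helper, and building the tokenized name from the input name without its extension is an assumption, chosen because it reproduces the file names used in Example 2 for a .csv input.

```powershell
function Get-TokenizerOutputPath {
    param([string]$InputPath)
    # Outputs go next to the input file; '' for a bare file name.
    $dir  = [System.IO.Path]::GetDirectoryName($InputPath)
    $stem = [System.IO.Path]::GetFileNameWithoutExtension($InputPath)
    [pscustomobject]@{
        TokenizedCsv = [System.IO.Path]::Combine($dir, "Tokenized-${stem}.csv")
        TokenKeys    = [System.IO.Path]::Combine($dir, "${stem}-TokenKeys.JSON")
    }
}

Get-TokenizerOutputPath -InputPath 'customers.csv'
```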

Examples

Example 1: Tokenizing Customer Data

# Interactive mode
./CSVTokenizer.ps1

# Command line mode
./CSVTokenizer.ps1 customers.csv -d ',' -c '"Customer Name","Email","Phone"'

Input CSV:

Customer Name,Email,Phone,Order ID
John Doe,john@email.com,555-1234,12345
Jane Smith,jane@email.com,555-5678,12346

Output (Tokenized):

Customer Name,Email,Phone,Order ID
T[0],T[1],T[2],12345
T[3],T[4],T[5],12346
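The output above can be reproduced with PowerShell's built-in CSV cmdlets. This sketch assumes tokens are numbered row by row, left to right across the selected columns, with one counter shared by all of them and repeated values reusing their token; that ordering matches the example output but is not confirmed from the script itself.

```powershell
# Example 1 input, inline so the sketch runs on its own.
$csvText = @'
Customer Name,Email,Phone,Order ID
John Doe,john@email.com,555-1234,12345
Jane Smith,jane@email.com,555-5678,12346
'@
$selected = 'Customer Name', 'Email', 'Phone'

# Case-sensitive map: original value -> token.
$tokens = [System.Collections.Generic.Dictionary[string,string]]::new()

$rows = $csvText | ConvertFrom-Csv
foreach ($row in $rows) {                # rows top to bottom
    foreach ($col in $selected) {        # selected columns left to right
        $value = $row.$col
        if (-not $tokens.ContainsKey($value)) {
            # Next token = number of distinct values seen so far, in hex.
            $tokens[$value] = 'T[{0:X}]' -f $tokens.Count
        }
        $row.$col = $tokens[$value]
    }
}

# ConvertTo-Csv quotes every field by default; the values match the output above.
$rows | ConvertTo-Csv -NoTypeInformation
```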

Example 2: De-tokenizing Data

./CSVTokenizer.ps1 -t "Tokenized-customers.csv" "customers-TokenKeys.JSON"

This will restore the original sensitive data from the tokenized file.
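A sketch of the reverse lookup, assuming the key file is a flat JSON object that maps each token to the Base64 encoding of its UTF-8 original value (the real file layout may differ). Get-TokenLookup and Restore-TokenizedValue are hypothetical names. An unknown token raises an error, which is one way to surface the key mismatch described under Troubleshooting.

```powershell
function Get-TokenLookup {
    param([string]$KeyJson)
    # token -> original value, decoded from Base64 (UTF-8 assumed)
    $lookup = [System.Collections.Generic.Dictionary[string,string]]::new()
    $keys = ConvertFrom-Json -InputObject $KeyJson
    foreach ($prop in $keys.PSObject.Properties) {
        $bytes = [Convert]::FromBase64String([string]$prop.Value)
        $lookup[$prop.Name] = [System.Text.Encoding]::UTF8.GetString($bytes)
    }
    # Leading comma keeps PowerShell from unrolling the collection on output.
    return ,$lookup
}

function Restore-TokenizedValue {
    param(
        [System.Collections.Generic.Dictionary[string,string]]$Lookup,
        [string]$Value
    )
    # Cells that do not look like T[HEX] (e.g. Order ID) pass through unchanged.
    if ($Value -cmatch '^T\[[0-9A-Fa-f]+\]$') {
        if (-not $Lookup.ContainsKey($Value)) {
            throw "Token $Value is not in the key file; is this the right key file?"
        }
        return $Lookup[$Value]
    }
    return $Value
}

$keyJson = '{"T[0]":"Sm9obiBEb2U=","T[1]":"am9obkBlbWFpbC5jb20="}'
$lookup = Get-TokenLookup -KeyJson $keyJson
Restore-TokenizedValue -Lookup $lookup -Value 'T[0]'    # John Doe
Restore-TokenizedValue -Lookup $lookup -Value '12345'   # 12345
```

Because this sketch translates any cell shaped like T[HEX], an original value that happens to look like a token in an untokenized column would also be translated; restricting de-tokenization to the columns that were tokenized avoids that.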

Security Considerations

  • Key File Protection: Store token key files securely and separately from tokenized data
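  • Base64 Encoding: Original values are Base64-encoded in key files. This is obfuscation only, not encryption: anyone with the key file can decode the values, so protect it as carefully as the original data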
  • Unique Tokens: Each unique value gets a distinct hexadecimal token
  • Reversibility: Only users with both the tokenized file AND key file can restore data
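Base64 needs no secret to reverse, which is why the key file deserves the same protection as the original data. For example, the Base64 string below (the UTF-8 encoding of "John Doe") decodes with a single built-in .NET call.

```powershell
# No key or password is involved: Base64 is reversed with one .NET call.
$encoded = 'Sm9obiBEb2U='
$bytes   = [Convert]::FromBase64String($encoded)
$decoded = [System.Text.Encoding]::UTF8.GetString($bytes)
$decoded   # John Doe
```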

Requirements

  • PowerShell 5.0 or higher
  • Windows operating system (uses Microsoft.VisualBasic.FileIO.TextFieldParser)
  • Read/write permissions for input and output directories

Error Handling

The script includes error handling for:

  • Invalid file paths
  • Malformed CSV files
  • Out-of-range column selections
  • Invalid search queries
  • Missing token key files
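A hypothetical pre-flight check covering two of the cases above: a missing input file and a requested column that is not in the header. Assert-TokenizerInput and its messages are illustrative, not taken from the script. Note that -notcontains compares case-insensitively, so "email" would match a header named "Email".

```powershell
function Assert-TokenizerInput {
    param(
        [string]$Path,
        [string[]]$Header,
        [string[]]$RequestedColumns
    )
    # Invalid file path: stop before trying to parse anything.
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "Input file not found: $Path"
    }
    # Requested columns that do not appear in the header row.
    $missing = @($RequestedColumns | Where-Object { $Header -notcontains $_ })
    if ($missing.Count -gt 0) {
        throw "Unknown column(s): $($missing -join ', ')"
    }
}
```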

Troubleshooting

  • File Not Found Errors: Ensure file paths are correct and the files exist.
  • Permission Errors: Run PowerShell as administrator if needed.
  • Malformed CSV: Verify the CSV format and delimiter settings.
  • Token Key Mismatch: Ensure you’re using the correct key file for de-tokenization.

License

This project is open source. Feel free to modify and distribute according to your needs.

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

About

Tool to tokenize columns in a CSV file in a reversible manner.
