chconv is a command-line tool that automatically detects the character encoding of files and converts them to a specified target encoding (UTF-8 by default).
- Automatic detection of file encoding formats (supports multiple encodings)
- Batch conversion of single files or entire directories
- Recursive processing of subdirectories
- Filtering by file extension
- Support for multiple target encoding formats (via libiconv)
- Dry-run mode to preview operations before execution
- Cross-platform support (Windows/Linux/macOS)
- C++20
- Dependencies:
chconv [options] -i <input file/directory> -o <output file/directory>

| Option | Short | Description |
|---|---|---|
| --input | -i | Input file or directory (required) |
| --output | -o | Output file or directory (required) |
| --to | -t | Target encoding format (default: UTF-8) |
| --verbose | -v | Show detailed output |
| --recursive | -r | Recursively process directories |
| --dry-run | -d | Show operations to be performed without actually converting |
| --suffix | -s | Specify file suffix to process (supports regular expressions, multiple patterns separated by ';') |
| --exclude | -e | Exclude files, suffixes or directories from processing using regular expressions (separated by ';') |
- Convert a single file to UTF-8:

      chconv -i input.txt -o output.txt

- Convert files in an entire directory to GBK encoding:

      chconv -r -t GBK -i ./source_dir -o ./target_dir

- Convert only .log files:

      chconv -r -s .log -i ./logs -o ./converted_logs

- Dry-run mode (preview operations without actual conversion):

      chconv -r -d -i ./source -o ./target

- Convert files while excluding specific files, suffixes or directories using regular expressions:

      chconv -r --exclude ".*\.log$|.*\.tmp$|node_modules|\.git" -i ./source_dir -o ./target_dir
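
To make the dry-run idea concrete, here is a minimal Python sketch of how a list of planned operations could be built without writing anything. It is not chconv's implementation: the function name `plan_conversions` is hypothetical, and the sketch assumes that the output directory mirrors the relative layout of the input directory, which this README does not state.

```python
from pathlib import Path


def plan_conversions(src: Path, dst: Path, recursive: bool = True):
    """Return (input, output) path pairs; nothing is read, created or written."""
    if src.is_file():
        # Single-file mode: the output path is used as given.
        return [(src, dst)]
    # Directory mode: visit every level when recursive, else only the top level.
    walker = src.rglob("*") if recursive else src.glob("*")
    # Assumption: each output path mirrors the input's path relative to src.
    return [(p, dst / p.relative_to(src)) for p in sorted(walker) if p.is_file()]
```

Printing each pair returned by `plan_conversions(Path("./source"), Path("./target"))` gives a preview comparable in spirit to what `-d` is described as doing, and leaves the target directory untouched.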
Both --exclude and --suffix options support regular expressions for more flexible file and directory matching.
For the --exclude option:
- `.*\.log$` - Matches all files with .log extension
- `node_modules` - Matches any path containing "node_modules"
- `\.git` - Matches any path containing ".git"
- `.*\.(tmp|temp)$` - Matches files with .tmp or .temp extensions
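
The exclude patterns listed above can be tried out with a short Python sketch. It assumes "search" semantics (a pattern may match anywhere in the path), which is what the "any path containing" wording suggests, and it uses Python's `re` module rather than whatever regex engine chconv uses internally. The helper name `is_excluded` is hypothetical.

```python
import re


def is_excluded(path: str, exclude: str) -> bool:
    """True if any ';'-separated regex in `exclude` matches somewhere in `path`."""
    patterns = [p for p in exclude.split(";") if p]
    # Assumption: unanchored search, so "node_modules" matches at any depth.
    return any(re.search(p, path) for p in patterns)


exclude = r".*\.log$|.*\.tmp$|node_modules|\.git"
for path in ["src/app.log", "node_modules/x/index.js", ".git/config", "src/main.cpp"]:
    print(path, "->", "skip" if is_excluded(path, exclude) else "convert")
```

Under these assumed semantics, an unanchored pattern such as `\.git` also matches a file named `.gitignore`, so anchoring the pattern is worth considering when only the directory should be excluded.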
For the --suffix option:
- `\.log$` - Matches all files with .log extension
- `\.log$|\.txt$` - Matches files with .log or .txt extensions
- `log$` - Matches all files whose names end with "log" (including .log files)
- `\.log$;\.tmp$` - Matches files with either .log or .tmp extension (using ';' separator)
When using regular expressions with the --suffix option, patterns are matched against both the full extension (including the dot) and the extension without the dot.
Multiple patterns can be combined using the | (OR) operator within a single expression or separated by ; as individual expressions.
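
The suffix rules described above can be sketched in Python as follows. This is an illustration, not chconv's code: `suffix_matches` is a hypothetical name, Python's `re` stands in for the tool's regex engine, and the sketch follows the extension-based description (each pattern is tried against the extension with and without its leading dot).

```python
import re
from pathlib import PurePath


def suffix_matches(filename: str, suffix_option: str) -> bool:
    """True if any ';'-separated pattern matches the file's extension."""
    ext = PurePath(filename).suffix          # ".log", or "" if there is none
    candidates = [ext, ext.lstrip(".")]      # with and without the leading dot
    patterns = [p for p in suffix_option.split(";") if p]
    return any(re.search(p, c) for p in patterns for c in candidates)


for name in ["app.log", "notes.txt", "db.catalog"]:
    print(name, suffix_matches(name, r"\.log$"), suffix_matches(name, r"log$"))
```

The `db.catalog` case shows why `log$` is broader than `\.log$`: the first matches any extension ending in "log", while the second requires the extension to be exactly `.log`.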
chconv supports all encoding formats supported by libiconv, including but not limited to:
- UTF-8, UTF-16, UTF-32
- ASCII
- ISO-8859 series
- Windows series (CP1252, CP936, etc.)
- Chinese encodings (GB2312, GBK, GB18030, Big5)
- Japanese encodings (Shift_JIS, EUC-JP)
- Korean encodings (EUC-KR)
For a complete list, please refer to the libiconv documentation.
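
As a rough illustration of what a conversion between two of these encodings involves, the following Python sketch decodes bytes from a source encoding and re-encodes them in the target. It uses Python's built-in codecs rather than libiconv, so codec names follow Python's spelling (for example `shift_jis`, `euc_jp`), and the function name `convert_bytes` is hypothetical.

```python
def convert_bytes(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Decode with the (already detected) source encoding, re-encode in the target."""
    # Raises UnicodeDecodeError / UnicodeEncodeError if the data does not fit
    # the source encoding or cannot be represented in the target encoding.
    return data.decode(source_encoding).encode(target_encoding)


gbk_bytes = "你好".encode("gbk")
utf8_bytes = convert_bytes(gbk_bytes, "gbk")
print(gbk_bytes.hex(), "->", utf8_bytes.hex())
```

The same text occupies different byte sequences in each encoding, which is why the source encoding has to be detected correctly before any conversion can succeed.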