Meha555/chconv

chconv - Character Encoding Conversion Tool

License: MIT

chconv is a command-line tool that automatically detects the character encoding of files and converts them to a specified target encoding (UTF-8 by default).

Features

  • Automatic detection of file encodings (via uchardet)
  • Conversion of single files or batch conversion of entire directories
  • Recursive processing of subdirectories
  • Filtering by file extension
  • Support for multiple target encoding formats (via libiconv)
  • Dry-run mode to preview operations before execution
  • Cross-platform support (Windows/Linux/macOS)

Build Requirements

  • C++20
  • Dependencies:
    • uchardet - for character encoding detection
    • libiconv - for character encoding conversion

Usage

Basic Syntax

chconv [options] -i <input file/directory> -o <output file/directory>

Common Options

Option | Short | Description
--- | --- | ---
--input | -i | Input file or directory (required)
--output | -o | Output file or directory (required)
--to | -t | Target encoding format (default: UTF-8)
--verbose | -v | Show detailed output
--recursive | -r | Recursively process directories
--dry-run | -d | Show operations to be performed without actually converting
--suffix | -s | Specify file suffix to process (supports regular expressions, multiple patterns separated by ';')
--exclude | -e | Exclude files, suffixes or directories from processing using regular expressions (separated by ';')

Examples

  1. Convert a single file to UTF-8:

    chconv -i input.txt -o output.txt

  2. Convert files in an entire directory to GBK encoding:

    chconv -r -t GBK -i ./source_dir -o ./target_dir

  3. Convert only .log files:

    chconv -r -s .log -i ./logs -o ./converted_logs

  4. Dry-run mode (preview operations without actual conversion):

    chconv -r -d -i ./source -o ./target

  5. Convert files while excluding specific files, suffixes or directories using regular expressions:

    chconv -r --exclude ".*\.log$|.*\.tmp$|node_modules|\.git" -i ./source_dir -o ./target_dir
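
For the recursive and dry-run examples, it can help to picture how input paths might map to output paths. The following is a minimal C++ sketch, not chconv's actual code. It assumes the output directory mirrors the relative layout of the input directory, which this README does not state explicitly, and `plan_conversions` is a hypothetical helper name. It only computes source and destination pairs and touches no files, which is the spirit of a dry run.

```cpp
#include <filesystem>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not part of chconv): pairs each source file with the
// path it would get under output_root, ASSUMING the directory layout is mirrored.
std::vector<std::pair<fs::path, fs::path>> plan_conversions(
    const fs::path& input_root,
    const fs::path& output_root,
    const std::vector<fs::path>& source_files) {
    std::vector<std::pair<fs::path, fs::path>> plan;
    for (const auto& src : source_files) {
        // Purely lexical: strips the input_root prefix without touching the disk.
        const fs::path relative = src.lexically_relative(input_root);
        plan.emplace_back(src, output_root / relative);
    }
    return plan;
}
```

Under that assumption, `./source/sub/b.log` with `-i ./source -o ./target` would be planned as `./target/sub/b.log`.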

Regular Expression Usage

Both the --exclude and --suffix options accept regular expressions, which allows more flexible file and directory matching.

For the --exclude option:

  • .*\.log$ - Matches all files with .log extension
  • node_modules - Matches any path containing "node_modules"
  • \.git - Matches any path containing ".git"
  • .*\.(tmp|temp)$ - Matches files with .tmp or .temp extensions
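
The "matches any path containing" behavior described above corresponds to regex search rather than full-string match. Here is a minimal sketch using `std::regex`, assuming ECMAScript syntax and search semantics. The README does not say which regex engine or flags chconv uses, and `is_excluded` is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of chconv): returns true if any exclude
// pattern matches somewhere in the path.
bool is_excluded(const std::string& path,
                 const std::vector<std::string>& patterns) {
    for (const auto& pattern : patterns) {
        // regex_search looks for a match anywhere in the path, so an
        // unanchored pattern such as "node_modules" acts as "contains".
        if (std::regex_search(path, std::regex(pattern))) {
            return true;
        }
    }
    return false;
}
```

Under search semantics, an unanchored pattern such as `\.git` would also match `.gitignore` and `.github`, so anchoring may be worth considering when only the `.git` directory should be skipped.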

For the --suffix option:

  • \.log$ - Matches all files with .log extension
  • \.log$|\.txt$ - Matches files with .log or .txt extensions
  • log$ - Matches all files whose names end with "log" (including .log files)
  • \.log$;\.tmp$ - Matches files with either .log or .tmp extension (using ';' separator)

When using regular expressions with the --suffix option, patterns are matched against both the full extension (including the dot) and the extension without the dot.

Multiple patterns can be combined using the | (OR) operator within a single expression or separated by ; as individual expressions.
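
The suffix rules above can be sketched in C++ as follows. This is an illustration under stated assumptions, not chconv's implementation: it splits the option value on `;`, then tries each pattern against the extension both with and without the leading dot, using `std::regex_search` with ECMAScript syntax. The names `split_patterns` and `suffix_matches` are hypothetical.

```cpp
#include <filesystem>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "a;b;c" into {"a", "b", "c"}, skipping empty pieces.
std::vector<std::string> split_patterns(const std::string& spec) {
    std::vector<std::string> patterns;
    std::stringstream stream(spec);
    std::string piece;
    while (std::getline(stream, piece, ';')) {
        if (!piece.empty()) patterns.push_back(piece);
    }
    return patterns;
}

// Hypothetical helper: true if any pattern matches the file's extension,
// tried both with the dot (".log") and without it ("log").
bool suffix_matches(const std::filesystem::path& file, const std::string& spec) {
    const std::string with_dot = file.extension().string();
    if (with_dot.empty()) return false;  // ASSUMPTION: files without an extension are skipped
    const std::string without_dot = with_dot.substr(1);
    for (const auto& pattern : split_patterns(spec)) {
        const std::regex re(pattern);
        if (std::regex_search(with_dot, re) || std::regex_search(without_dot, re)) {
            return true;
        }
    }
    return false;
}
```

In this sketch, an unescaped value such as `.log` also selects `.log` files, because `.` in a regular expression matches any character, including a literal dot.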

Supported Encoding Formats

chconv supports all encoding formats supported by libiconv, including but not limited to:

  • UTF-8, UTF-16, UTF-32
  • ASCII
  • ISO-8859 series
  • Windows series (CP1252, CP936, etc.)
  • Chinese encodings (GB2312, GBK, GB18030, Big5)
  • Japanese encodings (Shift_JIS, EUC-JP)
  • Korean encodings (EUC-KR)

For a complete list, please refer to the libiconv documentation.
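
For readers curious how a conversion between two of these encodings looks at the library level, here is a minimal sketch built on the standard `iconv_open`, `iconv` and `iconv_close` calls. It is not taken from chconv's source. The function name `convert_encoding`, the buffer size and the error handling are assumptions made for illustration.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of chconv): converts `input` from encoding
// `from` to encoding `to`, e.g. convert_encoding(bytes, "GBK", "UTF-8").
std::string convert_encoding(const std::string& input, const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the argument order: target first
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("conversion not supported by iconv");
    }
    std::string output;
    char buffer[4096];
    char* in = const_cast<char*>(input.data());  // iconv does not modify the input bytes
    std::size_t in_left = input.size();
    while (in_left > 0) {
        char* out = buffer;
        std::size_t out_left = sizeof(buffer);
        const std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        const int err = errno;  // save before any other call can overwrite it
        output.append(buffer, sizeof(buffer) - out_left);
        // E2BIG only means the chunk buffer filled up; loop again to continue.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input sequence");
        }
    }
    // Flush any pending shift state for stateful target encodings.
    char* out = buffer;
    std::size_t out_left = sizeof(buffer);
    iconv(cd, nullptr, nullptr, &out, &out_left);
    output.append(buffer, sizeof(buffer) - out_left);
    iconv_close(cd);
    return output;
}
```

Which encoding names are accepted depends on the iconv implementation in use, and linking may require `-liconv` on platforms where iconv is not part of the C library.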
