Custom implementation of the Unix wc (word count) utility, written in Python.
Project
- Description: Custom Python reimplementation of the wc utility. Counts lines, words, bytes and multibyte characters from files or standard input.
- Author: Alexandru Prodan
Features
- Lines: -l counts newline characters.
- Words: -w counts words (split on whitespace after decoding).
- Bytes: -c counts raw bytes.
- Characters: -m counts characters after decoding according to the locale.
- Default behavior: when no flags are given, the program prints lines, words and bytes (in that order).
Requirements
- Python ^3.10, i.e. 3.10 or newer within 3.x (see pyproject.toml).
Installation
- Using Poetry:
    poetry install
- Or with pip (editable install for development):
    pip install -e .
Usage
- Run against a file:
    python pywc/main.py [options] [path]
- Read from standard input (no path):
    cat data/test.txt | python pywc/main.py -w
- Options:
  - -l: number of lines
  - -w: number of words
  - -c: number of bytes
  - -m: number of characters (multibyte-aware, uses locale decoding)
- Output format:
  - When a file path is provided, the program prints the counts separated by tabs and appends the file path as the last column.
  - When reading from stdin (no path), the path column is omitted.
Example (default behavior, no flags), where \t stands for a tab character:
    python pywc/main.py data/test.txt
Output format:
    LINES\tWORDS\tBYTES\t/path/to/data/test.txt
Example (specific flags):
    python pywc/main.py -l -w data/test.txt
Output format:
    LINES\tWORDS\t/path/to/data/test.txt
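The option handling and output rules above can be sketched with argparse. This is a minimal illustration, not the code in pywc/main.py: the helper names (build_parser, selected_fields, format_row, main) are hypothetical, the column order when several flags are combined (lines, words, bytes, chars, the order the options are listed) is an assumption, and resolving the path to an absolute one is inferred from the /path/to/... example.

```python
import argparse
import locale
import os
import sys
from typing import Dict, List, Optional

# Assumed column order when several flags are given (the order the options are listed).
FIELD_ORDER = ("lines", "words", "bytes", "chars")
DEFAULT_FIELDS = ["lines", "words", "bytes"]  # no flags: lines, words, bytes


def build_parser() -> argparse.ArgumentParser:
    """Declare the -l/-w/-c/-m flags and an optional path argument."""
    parser = argparse.ArgumentParser(prog="pywc", description="Count lines, words, bytes and characters.")
    parser.add_argument("-l", dest="lines", action="store_true", help="number of lines")
    parser.add_argument("-w", dest="words", action="store_true", help="number of words")
    parser.add_argument("-c", dest="bytes", action="store_true", help="number of bytes")
    parser.add_argument("-m", dest="chars", action="store_true", help="number of characters")
    parser.add_argument("path", nargs="?", help="file to read; standard input if omitted")
    return parser


def selected_fields(args: argparse.Namespace) -> List[str]:
    """Return the requested columns in a fixed order, falling back to the default."""
    chosen = [name for name in FIELD_ORDER if getattr(args, name)]
    return chosen or list(DEFAULT_FIELDS)


def count_all(data: bytes, encoding: str) -> Dict[str, int]:
    """Counting rules as described under Features (hypothetical helper)."""
    text = data.decode(encoding, errors="replace")
    return {"lines": data.count(b"\n"), "words": len(text.split()),
            "bytes": len(data), "chars": len(text)}


def format_row(counts: Dict[str, int], fields: List[str], path: Optional[str] = None) -> str:
    """Join counts with tabs; append the path (made absolute, per the example) if given."""
    columns = [str(counts[name]) for name in fields]
    if path is not None:
        columns.append(os.path.abspath(path))
    return "\t".join(columns)


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.path:
        with open(args.path, "rb") as fh:   # read raw bytes so -c is exact
            data = fh.read()
    else:
        data = sys.stdin.buffer.read()      # no path: read bytes from stdin
    counts = count_all(data, locale.getpreferredencoding(False))
    print(format_row(counts, selected_fields(args), args.path))
    return 0
```

Calling main() from an `if __name__ == "__main__": sys.exit(main())` guard turns this into a script usable like the commands above, and passing an explicit list, e.g. main(["-l", "-w", "data/test.txt"]), is convenient in tests. Because argparse accepts combined short flags, -lw behaves like -l -w.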
Running Tests
- Run the test suite with pytest:
    pytest -q
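As a hedged illustration (not the repository's actual tests), a test module along these lines would cover the edge cases that distinguish wc's counts. count_all is a hypothetical stand-in for whatever function pywc/main.py exposes, defined inline so the file runs on its own without importing the package.

```python
from typing import Dict


# Hypothetical function under test; real tests would import it from pywc.
def count_all(data: bytes, encoding: str = "utf-8") -> Dict[str, int]:
    text = data.decode(encoding, errors="replace")
    return {"lines": data.count(b"\n"), "words": len(text.split()),
            "bytes": len(data), "chars": len(text)}


def test_empty_input():
    assert count_all(b"") == {"lines": 0, "words": 0, "bytes": 0, "chars": 0}


def test_no_trailing_newline():
    # Only newline characters are counted, so an unterminated last line adds nothing.
    assert count_all(b"one\ntwo")["lines"] == 1


def test_words_split_on_any_whitespace():
    # Tabs and runs of spaces both separate words.
    assert count_all(b"a\tb  c\n")["words"] == 3


def test_multibyte_chars_differ_from_bytes():
    data = "naïve".encode("utf-8")  # "ï" is two bytes in UTF-8
    assert count_all(data)["bytes"] == 6
    assert count_all(data)["chars"] == 5
```

Saved as, say, tests/test_counts_sketch.py (hypothetical name), it is picked up by pytest -q, since pytest collects test_*.py files and test_* functions by default.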
Repository layout
- pywc/: package source (the main implementation is pywc/main.py).
- tests/: unit tests (uses pytest).
- data/: example and test data files.
Contributing
- Contributions are welcome: please open an issue or pull request, follow the standard GitHub workflow, and keep changes small and focused.
License
- See the LICENSE file in the repository root.