PlagCheck is a high-performance, parallelized command-line tool written in Rust for
computing file similarity using line-based diff analysis.
It is designed to efficiently compare a reference file against multiple target files,
making it suitable for plagiarism detection, code similarity analysis, and large-scale
text comparison workflows.
The tool focuses on correctness, performance, and deterministic results.
- Parallel file comparison using Rayon
- Accurate similarity scoring based on the Myers diff algorithm
- Line-level comparison with whitespace normalization
- Sorted similarity results for easy interpretation
- Reports total processing time
- Compares one reference file against all files in a directory
Requirements:
- Rust (version 1.70 or later recommended)
git clone https://github.com/mukesh1352/PlagCheck.git
cd PlagCheck
cargo build --release

Run the tool in release mode:
cargo run --release

The program compares a reference file against all files in the specified target directory and prints similarity percentages sorted in descending order.
Comparing reference.txt against ./submissions
submission1.txt -> 92.4%
submission2.txt -> 67.1%
submission3.txt -> 41.8%
Total processing time: 38 ms
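
The ordering and formatting of the report above can be sketched in a few lines of Rust. This is only an illustration, not PlagCheck's actual code: the function names `sort_results` and `format_result` are hypothetical, and the tie-break on file name is an assumption added so that equal scores always print in the same order.

```rust
/// Sort (file name, similarity %) pairs by score, highest first.
/// Ties fall back to the file name so the order stays deterministic
/// (the tie-break rule is an assumption, not taken from PlagCheck).
fn sort_results(results: &mut [(String, f64)]) {
    results.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
}

/// Render one result in the "name -> 92.4%" form used by the sample output.
fn format_result(name: &str, score: f64) -> String {
    format!("{name} -> {score:.1}%")
}
```

Calling `sort_results` on the collected scores and then printing each pair with `format_result` would give lines in the same shape as the sample output. `f64::total_cmp` is used because `f64` does not implement `Ord`, so a plain `sort` is not available for floating-point scores.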
- Reads the reference file line by line
- Scans the target directory for candidate files
- Uses parallel processing to compare files concurrently
- Normalizes lines by trimming whitespace and ignoring empty lines
- Computes differences using the Myers diff algorithm
- Calculates similarity percentages
- Displays results sorted by similarity score
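
The steps above can be sketched with the standard library alone. The block below is a rough illustration under stated assumptions, not PlagCheck's implementation: it swaps in a plain dynamic-programming longest common subsequence (LCS) for the Myers diff (both find the same set of matching lines, Myers just does it faster on similar inputs), it uses `std::thread::scope` in place of Rayon, and the scoring formula `2 * matches / (len_a + len_b)` is an assumed definition of similarity. All function names are hypothetical.

```rust
use std::thread;

/// Trim each line and drop the empty ones (the normalization step).
fn normalize(text: &str) -> Vec<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}

/// Length of the longest common subsequence of two line sequences.
/// Plain O(n * m) dynamic programming, standing in for a Myers diff.
fn lcs_len(a: &[&str], b: &[&str]) -> usize {
    let mut prev = vec![0usize; b.len() + 1];
    for x in a {
        let mut curr = vec![0usize; b.len() + 1];
        for (j, y) in b.iter().enumerate() {
            curr[j + 1] = if x == y {
                prev[j] + 1
            } else {
                prev[j + 1].max(curr[j])
            };
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Similarity as a percentage: 2 * matching lines / total lines * 100.
/// The formula is an assumption, not necessarily PlagCheck's scoring.
fn similarity(reference: &str, target: &str) -> f64 {
    let a = normalize(reference);
    let b = normalize(target);
    let total = a.len() + b.len();
    if total == 0 {
        return 100.0; // two empty inputs are treated as identical
    }
    200.0 * lcs_len(&a, &b) as f64 / total as f64
}

/// Score every (name, contents) target against the reference,
/// using one scoped thread per target. Results keep the input order.
fn compare_all(reference: &str, targets: &[(&str, &str)]) -> Vec<(String, f64)> {
    thread::scope(|scope| {
        let handles: Vec<_> = targets
            .iter()
            .map(|&(name, text)| {
                scope.spawn(move || (name.to_string(), similarity(reference, text)))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("comparison thread panicked"))
            .collect::<Vec<_>>()
    })
}
```

For example, `compare_all("a\nb", &[("same", " a \n\nb")])` scores the target at 100% because the padding and the blank line disappear during normalization. One thread per file is fine for a sketch but would not scale to thousands of targets; a work-stealing pool such as Rayon's `par_iter` is the better fit there.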
similar = "2.1" # Myers diff algorithm implementation
rayon = "1.7" # Data-parallel processing
blake = "2.0.2" # Fast hashing utilities (internal use)

src/
├── main.rs # CLI entry point
├── input.rs # File input and directory scanning
├── content_checker.rs # Core comparison logic
├── utils.rs # Helper utilities
tests/ # Unit tests
Cargo.toml # Project configuration
- Plagiarism detection for text or code files
- Code similarity analysis
- Batch comparison of large text datasets
- Performance-sensitive diff computation
MIT License. See the LICENSE file for details.