A fast and parallel file comparison tool in Rust using the Myers diff algorithm for accurate line-based similarity analysis.

mukesh1352/PlagCheck

PlagCheck – Rust File Similarity Checker

PlagCheck is a high-performance, parallelized command-line tool written in Rust for computing file similarity using line-based diff analysis.
It is designed to efficiently compare a reference file against multiple target files, making it suitable for plagiarism detection, code similarity analysis, and large-scale text comparison workflows.

The tool focuses on correctness, performance, and deterministic results.


Features

  • Parallel file comparison using Rayon
  • Accurate similarity scoring based on the Myers diff algorithm
  • Line-level comparison with whitespace normalization
  • Sorted similarity results for easy interpretation
  • Reports total processing time
  • Compares one reference file against all files in a directory
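As a rough illustration of the whitespace normalization mentioned above, the sketch below trims each line and drops lines that are empty after trimming. It uses only the standard library; the function name `normalize_lines` is hypothetical and not taken from the PlagCheck source, and the real implementation may normalize differently.

```rust
/// Trim leading and trailing whitespace from every line and drop lines
/// that are empty after trimming.
/// Hypothetical helper: the name and exact rules are assumptions.
fn normalize_lines(text: &str) -> Vec<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}
```

With this kind of normalization, two files that differ only in indentation or blank lines produce identical line sequences, so they would compare as fully similar.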

Installation

Prerequisites

  • Rust (version 1.70 or later recommended)

Build from Source

git clone https://github.com/mukesh1352/PlagCheck.git
cd PlagCheck
cargo build --release

Usage

Run the tool in release mode:

cargo run --release

The program compares a reference file against all files in the specified target directory and prints similarity percentages sorted in descending order.


Example Output

Comparing reference.txt against ./submissions

submission1.txt  ->  92.4%
submission2.txt  ->  67.1%
submission3.txt  ->  41.8%

Total processing time: 38 ms
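One way to produce output ordered like the example above is to sort `(file name, percentage)` pairs in descending order and format each with one decimal place. This is a minimal sketch using only the standard library. The names `sort_descending` and `format_result` are hypothetical, and the tie-break on file name is an assumption added here to keep the order deterministic; the README does not say how PlagCheck orders equal scores.

```rust
/// Sort (file name, similarity percent) pairs from most to least similar.
/// Ties are broken by file name so the order is reproducible
/// (an assumption, not documented PlagCheck behavior).
fn sort_descending(results: &mut [(String, f64)]) {
    results.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
}

/// Format one result line in the style of the example output.
fn format_result(name: &str, percent: f64) -> String {
    format!("{name}  ->  {percent:.1}%")
}
```

`f64::total_cmp` is used instead of `partial_cmp` so the comparison never fails, even if a score were NaN.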

How It Works

  1. Reads the reference file line by line
  2. Scans the target directory for candidate files
  3. Uses parallel processing to compare files concurrently
  4. Normalizes lines by trimming whitespace and ignoring empty lines
  5. Computes differences using the Myers diff algorithm
  6. Calculates similarity percentages
  7. Displays results sorted by similarity score
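The comparison and scoring steps above can be sketched with the standard library alone. This is not PlagCheck's code, and it swaps in two stand-ins so that it has no dependencies: a plain dynamic-programming longest common subsequence (LCS) instead of the Myers algorithm from `similar`, and scoped threads instead of Rayon. The scoring formula, 2 × matched lines ÷ total lines, is an assumption, since the README does not state how the percentage is derived. All function names are hypothetical, and the inputs are expected to be already-normalized line slices.

```rust
use std::thread;

/// Length of the longest common subsequence of two line slices.
/// Plain O(n*m) dynamic programming, standing in for a Myers diff.
fn lcs_len(a: &[&str], b: &[&str]) -> usize {
    let mut prev = vec![0usize; b.len() + 1];
    for &x in a {
        let mut cur = vec![0usize; b.len() + 1];
        for (j, &y) in b.iter().enumerate() {
            cur[j + 1] = if x == y {
                prev[j] + 1
            } else {
                cur[j].max(prev[j + 1])
            };
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity as a percentage: 2 * matched / (len_a + len_b) * 100.
/// Assumed formula; two empty inputs are treated as identical.
fn similarity_percent(a: &[&str], b: &[&str]) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 100.0;
    }
    200.0 * lcs_len(a, b) as f64 / (a.len() + b.len()) as f64
}

/// Score every target against the reference, one scoped thread per target,
/// then sort from most to least similar.
/// (Rayon would schedule the work on a thread pool instead.)
fn compare_all(reference: &[&str], targets: &[(&str, Vec<&str>)]) -> Vec<(String, f64)> {
    let mut results: Vec<(String, f64)> = thread::scope(|s| {
        let handles: Vec<_> = targets
            .iter()
            .map(|(name, lines)| {
                s.spawn(move || (name.to_string(), similarity_percent(reference, lines)))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    results.sort_by(|x, y| y.1.total_cmp(&x.1));
    results
}
```

For example, a three-line reference `["a", "b", "c"]` compared with `["a", "c"]` shares two lines, giving 2 × 2 ÷ 5 = 80%. Spawning one thread per file keeps the sketch short but does not scale to large directories, which is the kind of workload a pool such as Rayon's is meant for.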

Dependencies

similar = "2.1"   # Myers diff algorithm implementation
rayon  = "1.7"    # Data-parallel processing
blake  = "2.0.2"  # Fast hashing utilities (internal use)

Project Structure

src/
├── main.rs              # CLI entry point
├── input.rs             # File input and directory scanning
├── content_checker.rs   # Core comparison logic
└── utils.rs             # Helper utilities
tests/                   # Tests (Cargo builds files here as integration tests)
Cargo.toml               # Project configuration

Use Cases

  • Plagiarism detection for text or code files
  • Code similarity analysis
  • Batch comparison of large text datasets
  • Performance-sensitive diff computation

License

MIT License. See the LICENSE file for details.

Built with Rust
