A simple Rust program for calculating the Word Error Rate. I wrote it as part of getting to know Rust.
I also wanted to see how much faster Rust can be compared to interpreted languages such as Python. The file
`python-equivalent/wer.py` contains the exact same algorithm written in Python.
Word Error Rate (WER) is a common way to evaluate the performance of Speech-To-Text (automatic speech recognition, ASR) systems. It counts how many words need to be inserted, deleted or substituted to turn the predicted text (the output of the ASR system) into the ground truth (the manually transcribed text), divided by the number of words in the ground truth. In my implementation, I compute the WER of each sentence separately and return the average over all sentences.
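The per-sentence computation can be sketched as a word-level Levenshtein distance, with WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions and N is the number of ground-truth words. This is a minimal sketch, not the code in this repository: `word_edit_distance` and `sentence_wer` are hypothetical names, and scoring an empty ground-truth line as 0.0 or 1.0 is my own assumption.

```rust
/// Word-level Levenshtein distance: the minimum number of substitutions,
/// deletions and insertions needed to turn `hyp` into `reference`.
fn word_edit_distance(reference: &[&str], hyp: &[&str]) -> usize {
    // `prev[j]` = distance between the reference words seen so far and the
    // first j hypothesis words. Only one DP row is kept at a time.
    let mut prev: Vec<usize> = (0..=hyp.len()).collect();
    for (i, r) in reference.iter().enumerate() {
        // curr[0]: the first i + 1 reference words vs. an empty hypothesis.
        let mut curr = vec![i + 1; hyp.len() + 1];
        for (j, h) in hyp.iter().enumerate() {
            let cost = if r == h { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // match or substitution
                .min(prev[j + 1] + 1) // deletion of a reference word
                .min(curr[j] + 1); // insertion of a hypothesis word
        }
        prev = curr;
    }
    prev[hyp.len()]
}

/// WER of one sentence pair: edit distance / number of reference words.
fn sentence_wer(reference: &str, hypothesis: &str) -> f64 {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();
    if r.is_empty() {
        // Assumption: an empty reference scores 0.0 if the hypothesis is
        // also empty and 1.0 otherwise (avoids dividing by zero).
        return if h.is_empty() { 0.0 } else { 1.0 };
    }
    word_edit_distance(&r, &h) as f64 / r.len() as f64
}
```

For example, `sentence_wer("the cat sat on the mat", "the cat sat on mat")` gives 1/6: one deletion over six reference words. Keeping only two DP rows needs memory proportional to the hypothesis length instead of a full matrix.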
The project depends on two crates:
- `clap = "2.33.3"` for command-line parsing.
- `cute = "0.3.0"` for easier loops.
- Build the project by running `cargo build` inside the directory (or `cargo build --release` for an optimized build; in that case also pass `--release` to `cargo run`, otherwise Cargo compiles the debug profile first).
- If you used the `--release` flag, run the program with `cargo run --release $FILE1 $FILE2`, where `$FILE1` is the file with the predicted transcriptions and `$FILE2` is the file with the true transcriptions. The format of these files is explained below.
- You can also run the compiled binary directly: `./target/release/rust-wer $FILE1 $FILE2`.
- To measure the execution time, I used the `time` command, for example `time ./target/release/rust-wer $FILE1 $FILE2`.
- The resulting WER is printed to the console.
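The commands above expect two positional arguments: the predicted file first, the true file second. The project parses them with `clap`; the sketch below uses only `std::env::args` instead, just to show the expected argument order without a dependency. `parse_paths` and `paths_from_env` are hypothetical helpers, not the project's code.

```rust
use std::env;

/// Hypothetical helper: extract (predicted, truth) paths from an argument
/// list whose first element is the program name, as `env::args` yields it.
fn parse_paths(args: &[String]) -> Result<(String, String), String> {
    match args {
        [_, predicted, truth] => Ok((predicted.clone(), truth.clone())),
        _ => {
            let prog = args.first().map(String::as_str).unwrap_or("rust-wer");
            Err(format!("usage: {} <PREDICTED_FILE> <TRUE_FILE>", prog))
        }
    }
}

/// Reads the real command-line arguments and parses them.
fn paths_from_env() -> Result<(String, String), String> {
    let args: Vec<String> = env::args().collect();
    parse_paths(&args)
}
```

Swapping the two paths would not fail loudly, since both files have the same format, but it would normalize by the wrong word counts, so the order matters.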
The input files must be plain text files containing one sentence per line. The two files must be line-aligned: line *n* of the predicted file corresponds to line *n* of the true file. So, if the prediction for a certain sentence is empty, you still need to leave an empty line in its place. See the `data` directory for examples.
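Because the files are line-aligned, they can simply be zipped line by line and the per-sentence scores averaged. A minimal sketch, assuming the per-sentence scorer (for example a word-level WER) is passed in as a function; `average_over_lines` and `average_from_files` are hypothetical names, and rejecting files with different line counts or no lines at all is my assumption about how such input should be handled.

```rust
use std::fs;
use std::io;

/// Averages `score(reference, hypothesis)` over two line-aligned texts.
fn average_over_lines<F>(predicted: &str, truth: &str, score: F) -> Result<f64, String>
where
    F: Fn(&str, &str) -> f64,
{
    let pred: Vec<&str> = predicted.lines().collect();
    let refs: Vec<&str> = truth.lines().collect();
    // Assumption: a line-count mismatch is an error, not silently truncated.
    if pred.len() != refs.len() {
        return Err(format!(
            "line count mismatch: {} predicted vs {} true",
            pred.len(),
            refs.len()
        ));
    }
    if refs.is_empty() {
        return Err("input files contain no lines".to_string());
    }
    // An empty predicted line is still paired with its reference line.
    let total: f64 = refs.iter().zip(pred.iter()).map(|(r, p)| score(*r, *p)).sum();
    Ok(total / refs.len() as f64)
}

/// Reads both files and averages the per-sentence scores.
fn average_from_files<F>(pred_path: &str, truth_path: &str, score: F) -> io::Result<f64>
where
    F: Fn(&str, &str) -> f64,
{
    let predicted = fs::read_to_string(pred_path)?;
    let truth = fs::read_to_string(truth_path)?;
    average_over_lines(&predicted, &truth, score)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

With a standard WER scorer, an empty predicted line means every reference word was deleted, so that sentence contributes a WER of 1.0 to the average.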
Also, the texts must already be preprocessed, i.e. punctuation removed and everything converted to lower case.
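A minimal sketch of that preprocessing step in Rust, assuming "punctuation" means ASCII punctuation only; `normalize` is a hypothetical helper, not part of this project. Note that it turns `it's` into `its` and leaves non-ASCII punctuation untouched.

```rust
/// Hypothetical normaliser: drops ASCII punctuation, lower-cases the text
/// and collapses runs of whitespace into single spaces.
fn normalize(line: &str) -> String {
    line.chars()
        .filter(|c| !c.is_ascii_punctuation())
        .flat_map(char::to_lowercase)
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}
```

Applying the same function to both the predicted and the true text keeps the two sides consistent, which matters more than the exact normalization rules chosen.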
To time the Rust program, run `time ./target/release/rust-wer ./data/mytranscripts.txt ./data/truth.txt`. For me, this took 0.003 seconds.
To time the Python script, run `time python main.py ./data/mytranscripts.txt ./data/truth.txt`. This took 0.149 seconds, so even on such a small example the Python version was roughly 50 times slower. Note that `time` measures the whole process, including the start-up of the Python interpreter.
I also wanted to see how Julia performs on the same task, so I created the `juliawer` project. Running
`juliawer.jl` with `julia` takes more than a second, which is much slower than both runs above, and I am
not sure why.
I also tried using `PackageCompiler` to compile the Julia code ahead of time, but the result did not change.
Possible future improvements:
- Change the input file format (JSON?).
- Maybe handle punctuation removal and lower-casing in the program itself.