Rust-WER

A simple Rust program for calculating the Word Error Rate. This is part of my learning process for getting to know Rust. I also wanted to see how much faster Rust can be compared to interpreted languages such as Python. The file python-equivalent/wer.py has the exact same algorithm written in Python.

Word Error Rate (WER) is a way to evaluate the performance of Speech-To-Text systems. It counts how many words have to be inserted, deleted or substituted to turn the predicted text (the output of the ASR system) into the ground truth (the manually transcribed text). My implementation returns the average of the WERs computed separately for each sentence.
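For illustration, here is a minimal sketch of that computation. It assumes the usual definition WER = (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words and N is the number of words in the ground truth. The function names (edit_distance, sentence_wer, average_wer) are hypothetical and are not taken from src; the handling of an empty ground-truth sentence is also an assumption.

```rust
/// Word-level Levenshtein distance: the minimum number of insertions,
/// deletions and substitutions needed to turn `hyp` into `reference`.
fn edit_distance(reference: &[&str], hyp: &[&str]) -> usize {
    // `prev[j]` is the distance between the reference words seen so far
    // and the first `j` hypothesis words.
    let mut prev: Vec<usize> = (0..=hyp.len()).collect();
    for (i, r) in reference.iter().enumerate() {
        let mut curr = vec![i + 1; hyp.len() + 1];
        for (j, h) in hyp.iter().enumerate() {
            let substitution = prev[j] + if r == h { 0 } else { 1 };
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        prev = curr;
    }
    prev[hyp.len()]
}

/// WER of one sentence: edits divided by the number of reference words.
/// Assumption: an empty reference scores 0.0 if the hypothesis is also
/// empty and 1.0 otherwise, to avoid dividing by zero.
fn sentence_wer(reference: &str, hyp: &str) -> f64 {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hyp.split_whitespace().collect();
    if r.is_empty() {
        return if h.is_empty() { 0.0 } else { 1.0 };
    }
    edit_distance(&r, &h) as f64 / r.len() as f64
}

/// Average of the per-sentence WERs over (reference, hypothesis) pairs.
fn average_wer(pairs: &[(&str, &str)]) -> f64 {
    if pairs.is_empty() {
        return 0.0;
    }
    let total: f64 = pairs.iter().map(|(r, h)| sentence_wer(r, h)).sum();
    total / pairs.len() as f64
}
```

Note that averaging per-sentence WERs weights every sentence equally, regardless of its length. This can give a different number than dividing the total edits by the total reference word count over the whole file.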

Dependencies

clap = "2.33.3" for the command line parsing.
cute = "0.3.0" for easier loops.

Usage

Build the project by running cargo build inside the directory (or cargo build --release to produce an optimized binary, so that cargo run --release ... does not have to recompile the code).
If you used the --release flag, run the program with cargo run --release $FILE1 $FILE2, where $FILE1 is the file of the predicted transcriptions and $FILE2 is the file containing the true transcriptions. The format of these files is explained below.
You can also run the binary directly with ./target/release/rust-wer $FILE1 $FILE2.
To measure the execution time, I used the time command. For example, time ./target/release/rust-wer $FILE1 $FILE2.
The resulting WER is printed to the console.

File Format

The input files must be plain text files that contain one sentence per line. Both files must have the same number of lines, and line i of the predicted text must correspond to line i of the true text. So, if the predicted text is empty for a certain sentence, you need to leave an empty line there. See the data directory for examples.
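As a rough sketch of this alignment rule, the hypothetical helper below (pair_lines is not a function from src) pairs the two texts line by line and rejects inputs whose line counts differ. It takes the file contents as strings, which could be obtained with std::fs::read_to_string.

```rust
/// Pairs line i of the predictions with line i of the ground truth.
/// Returns an error message if the two inputs have different line counts.
fn pair_lines<'a>(
    predicted: &'a str,
    truth: &'a str,
) -> Result<Vec<(&'a str, &'a str)>, String> {
    let p: Vec<&str> = predicted.lines().collect();
    let t: Vec<&str> = truth.lines().collect();
    if p.len() != t.len() {
        return Err(format!(
            "line count mismatch: {} predicted vs {} true",
            p.len(),
            t.len()
        ));
    }
    // An empty prediction stays in place as an empty string,
    // so the remaining lines keep their alignment.
    Ok(p.into_iter().zip(t).collect())
}
```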

Also, the texts must already be preprocessed, i.e. with punctuation removed and all text converted to lower case.
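If your transcripts are not yet in that form, a preprocessing step along these lines could be applied to each line first. This is only a sketch: normalize is a hypothetical name, and dropping every non-alphanumeric character (including apostrophes, so "it's" becomes "its") is an assumption that may not match your transcription conventions.

```rust
/// Lowercases the text, drops every character that is neither
/// alphanumeric nor whitespace, and collapses runs of whitespace.
fn normalize(line: &str) -> String {
    line.to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<&str>>()
        .join(" ")
}
```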

Timing the test data

To time the Rust program, run the following:

time ./target/release/rust-wer ./data/mytranscripts.txt ./data/truth.txt

For me, this took 0.003 seconds.

To time the Python script, run the following:

time python main.py ./data/mytranscripts.txt ./data/truth.txt

This took 0.149 seconds. So, even with such a small example, the Rust program ran roughly 50 times faster.

Using Julia

I also wanted to see how Julia performs on the same task, so I created the juliawer project. Running juliawer.jl with julia takes more than a second, which is far slower than both of the runs above, and I am not sure why.

I also tried compiling the Julia code with PackageCompiler, but the result did not change.

TODO:

Change the input file format (JSON?)
Maybe handle punctuation/lowercasing
