diff --git a/.gitignore b/.gitignore index f012143a..ed91e6de 100644 --- a/.gitignore +++ b/.gitignore @@ -34,3 +34,15 @@ Cargo.lock levenshtein.mod */racket/compiled/ *.zo +*/clojure-native-image/run +*/java-native-image/jvm.run +*/java-native-image/run +loops/*/code +loops/*/run +fibonacci/*/code +fibonacci/*/run +levenshtein/*/code +levenshtein/*/run +hello-world/*/code +hello-world/*/run +.nrepl-port diff --git a/README.md b/README.md index 13f7795d..8f069d8f 100644 --- a/README.md +++ b/README.md @@ -1,76 +1,207 @@ # Languages -A repo for collaboratively building small benchmarks to compare languages. -If you have a suggestion for improvement: PR! -If you want to add a language: PR! +Having fun together, learning about programming languages, compilers, interpreters, and toolchains, by way of microbenchmarks. -## Running +> [!NOTE] +> We are in the process of replacing our previous benchmark runner with one that relies on in-process measurements, *removing the influence of start/setup times from the results*. Please help with this transition by adding the necessary tooling to languages that lack it. See below, under [The Runner](#the-runner). -To run one of the benchmarks: +We're learning together here. -1. `cd` into desired benchmark directory (EG `$ cd loops`) -2. Compile by running `$ ../compile.sh` -3. Run via `$ ../run.sh`. - You should see output something like: - - ``` - $ ../run.sh +* If you have a suggestion for improvement, want to add a language, a benchmark, fix a bug or typo: Issues and PRs. Which one to use depends, but generally it is first an Issue, then a PR: use the Issue to formulate the problem statement, and PRs to address the problem. Use your judgement and we will succeed. +* Have a question? -> Issue. + +## Running the benchmarks + +To run benchmarks you need toolchains to run (and often to compile) the programs for the languages you want to benchmark. 
The scripts are written so that benchmarks are compiled and run for any language for which you have a working toolchain. + +The steps are performed in a per-benchmark fashion by doing `cd` to the benchmark directory and then: + +1. Compile the programs that need compiling: + + ``` + $ ../compile.sh + ``` +1. Run, providing your GitHub user handle, e.g.: + + ``` + $ ../run.sh -u PEZ + ``` + + (This is what we refer to as [The Runner](#the-runner)) +1. Clean build files: + + ``` + $ ../clean.sh + ``` + +## The Runner + +The general strategy for the runner to benchmark only the targeted function is that it is the programs being benchmarked that do the benchmarking in-process. They only measure around the single piece of work that the benchmark is about. So for **fibonacci** only the call to the function calculating `fibonacci(n)` should be measured. For the **levenshtein** benchmark, a function that collects all pairing distances is measured. This is because we use the sum of the distances for [correctness check](#correctness-check). + +Each program (language) will be allowed the same amount of time to complete the benchmark work (as many times as it can). + +Because of the above, each language will have to have some minimal utility/tooling for running the function-under-benchmark as many times as a timeout allows, plus reporting the measurements and the result. Here are three implementations that we can regard as reference: + +* [benchmark.clj](lib/clojure/src/languages/benchmark.clj) +* [benchmark.java](lib/java/languages/Benchmark.java) +* [benchmark.c](lib/c/benchmark.c) (This one may need some scrutiny from C experts before we fully label it as *reference*.) + +You'll see that the `benchmark/run` function takes two arguments: + +1. `f`: A function (a thunk) +1. 
`run-ms`: A total time in milliseconds within which the function should be run as many times as possible + +To make the overhead of running and measuring as small as possible, the runner takes a delta time for each time it calls `f`. It is when the sum of these deltas, `total-elapsed-time`, is over the `run-ms` time that we stop calling `f`. So, for a `run-ms` of `1000` the total runtime will always be longer than a second, because we will almost always “overshoot” with the last run, and because the overhead of running and keeping tally, even if tiny, will always be _something_. + +The `benchmark/run` function is responsible for reporting back the result/answer to the task being benchmarked, as well as some stats, like mean run time, standard deviation, min and max times, and how many runs were completed. + +### Running a benchmark + +The new run script is named [run.sh](run.sh). Let's say we run it in the **levenshtein** directory: + +```sh +../run.sh -u PEZ +``` - Benchmarking Zig - Benchmark 1: ./zig/code 40 - Time (mean ± σ): 513.9 ms ± 2.9 ms [User: 504.5 ms, System: 2.6 ms] - Range (min … max): 510.6 ms … 516.2 ms 3 runs +The default run time is `10000` ms. `-u` sets the user name (preferably your GitHub handle). 
The output was this: +```csv +benchmark,timestamp,commit_sha,is_checked,user,model,ram,os,arch,language,run_ms,mean_ms,std-dev-ms,min_ms,max_ms,runs +levenshtein,2025-01-18T23:32:41Z,8e63938,true,PEZ,Apple M4 Max,64GB,darwin24,arm64,Babashka,10000,23376.012916,0.0,23376.012916,23376.012916,1 +levenshtein,2025-01-18T23:32:41Z,8e63938,true,PEZ,Apple M4 Max,64GB,darwin24,arm64,C,10000,31.874277,0.448673,31.286000,35.599000,314 +levenshtein,2025-01-18T23:32:41Z,8e63938,true,PEZ,Apple M4 Max,64GB,darwin24,arm64,Clojure,10000,57.27048066857143,2.210445845051782,55.554958,75.566792,175 +levenshtein,2025-01-18T23:32:41Z,8e63938,true,PEZ,Apple M4 Max,64GB,darwin24,arm64,Clojure Native,10000,59.95592388622754,0.8493245545620596,58.963833,62.897834,167 +levenshtein,2025-01-18T23:32:41Z,8e63938,true,PEZ,Apple M4 Max,64GB,darwin24,arm64,Java,10000,55.194704,1.624322,52.463125,63.390833,182 +levenshtein,2025-01-18T23:32:41Z,8e63938,true,PEZ,Apple M4 Max,64GB,darwin24,arm64,Java Native,10000,60.704966,6.579482,51.807750,96.343541,165 +``` - Benchmarking C - Benchmark 1: ./c/code 40 - Time (mean ± σ): 514.0 ms ± 1.1 ms [User: 505.6 ms, System: 2.8 ms] - Range (min … max): 513.2 ms … 515.2 ms 3 runs +It's a CSV file you can open in something Excel-ish, or consume with your favorite programming language. +![Example Result CSV in Numbers.app](docs/example-results-csv.png) - Benchmarking Rust - Benchmark 1: ./rust/target/release/code 40 - Time (mean ± σ): 514.1 ms ± 2.0 ms [User: 504.6 ms, System: 3.1 ms] - Range (min … max): 512.4 ms … 516.3 ms 3 runs +As you can see, it has some meta data about the run, in addition to the benchmark results. **Clojure** ran the benchmark 175 times, with a mean time of **57.3 ms**. Which shows the point with the new runner, considering that Clojure takes **300 ms** (on the same machine) to start. - ... - ``` +See [run.sh](run.sh) for some more command line options it accepts. 
Let's note one of them: `-l` which takes a string of comma separated language names, and only those languages will be run. Good for when contributing a new language or updates to a language. E.g: -4. For good measure, execute `$ ../clean.sh` when finished. +``` +~/Projects/languages/levenshtein ❯ ../run.sh -u PEZ -l Clojure +Running levenshtein benchmark... +Results will be written to: /tmp/languages-benchmark/levenshtein_PEZ_10000_5bb1995_only_langs.csv -Hyperfine is used to warm, execute, and time the runs of the programs. +Checking levenshtein Clojure +Check passed +Benchmarking levenshtein Clojure +java -cp clojure/classes:src:/Users/pez/.m2/repository/org/clojure/clojure/1.12.0/clojure-1.12.0.jar:/Users/pez/.m2/repository/org/clojure/core.specs.alpha/0.4.74/core.specs.alpha-0.4.74.jar:/Users/pez/.m2/repository/org/clojure/spec.alpha/0.5.238/spec.alpha-0.5.238.jar run 10000 levenshtein-words.txt +levenshtein,5bb1995,true,PEZ,Apple M4 Max,darwin24,arm64,Clojure,10000,56.84122918181818,0.8759056030546785,55.214541,59.573,176 -## Adding +Done running levenshtein benchmark +Results were written to: /tmp/languages-benchmark/levenshtein_PEZ_10000_5bb1995_only_langs.csv +``` -To add a language: +### Compiling a benchmark -1. Select the benchmark directory you want to add to (EG `$ cd loops`) -2. Create a new subdirectory for the language (EG `$ mkdir rust`) -3. Implement the code in the appropriately named file (EG: `code.rs`) -4. If the language is compiled, add appropriate command to `../compile.sh` and `../clean.sh` -5. Add appropriate line to `../run.sh` +This works as before, but since the new programs are named `run` instead of `code`, we need a new script. 
Meet: [compile.sh](compile.sh) + +```sh +../compile.sh +``` + +### Adding a language + +To add (or port) a language for a benchmark to the new runner you'll need to add: + +1. A benchmarking utility in `lib/` +1. Code in `<benchmark>/<language>/run.<ext>` (plus whatever extra project files) + - If you are porting from the legacy runner, copy the corresponding `code.<ext>` and start from there. See about [benchmark changes](#changes-to-the-benchmarks-compared-to-legacy-runner) below. +1. An entry in `compile.sh` (copy from `compile-legacy.sh` if you are porting) +1. An entry in `run.sh` (copy from `compile-legacy.sh` if you are porting) +1. Maybe some code in `clean.sh` (All temporary/build files should be cleaned.) +1. Maybe some entries in `.gitignore` (All build files, and temporary toolchain files should be added here.) + +The `main` function of the program provided should take three arguments: + +1. The run time in milliseconds +1. The warmup time in milliseconds +1. The input to the function + - There is only one input argument, unlike before. How this input argument should be interpreted depends on the benchmark. For **levenshtein** it is a file path, to the file containing the words to use for the test. + +As noted before, the program should run the function-under-benchmark as many times as it can, following the example of the reference implementations mentioned above. The program is allowed to run warmup runs before the actual benchmark run, e.g. so that a JIT compiler will have had some chance to optimize. It should then pass the warmup time to its benchmark runner. 
+ +The program should output a csv row with: + +```csv +mean_ms,std-dev-ms,min_ms,max_ms,times,result +``` + +Before a PR with a new or ported language contribution will be merged, you should provide output (text) from a benchmark run. To facilitate this, both `compile.sh` and `run.sh` take a `-l` argument whose value is a comma-separated list of language names. E.g.: + +```sh +$ ../compile.sh -l 'C,Clojure' +$ ../run.sh -u PEZ -l 'C,Clojure' +``` + +Please provide output from all benchmark contributions you have added/touched. + +### Changes to the benchmarks compared to legacy runner + +When adapting a language implementation of some benchmark, consider these differences: + +* **fibonacci**: + * The program should return the result of `fib(n)`. This is to keep the benchmark focused on one thing. + * Early exits for `n < 2` are now allowed, again to keep the benchmark focused. + * The input is now `37`, to allow slower languages to complete more runs. +* **loops**: The inner loop is now 10k, again to allow slower languages to complete more runs. +* **levenshtein**: + 1. Smaller input (slower languages...) + 1. The input is provided via a file (pointed at by the input argument) + 1. We only calculate each word pairing distance once (A is as far from B as B is from A) + 1. There is a single result, the sum of the distances. +* **hello-world**: No changes. 
+ * It needs to accept and ignore the two arguments (There is no benchmarking code in there, because it will be benchmarked out-of-process, using **hyperfine**) + +Let's look at the `-main` function for the Clojure **levenshtein** contribution: + +```clojure +(defn -main [& args] + (let [run-ms (parse-long (first args)) + warmup-ms (parse-long (second args)) + input-path (nth args 2) + strings (-> (slurp input-path) + (string/split-lines)) + _warmup (benchmark/run #(levenshtein-distances strings) warmup-ms) + results (benchmark/run #(levenshtein-distances strings) run-ms)] + (-> results + (update :result (partial reduce +)) + benchmark/format-results + println))) +``` + +The `benchmark/run` function returns a map with the measurements and the result keyed on `:result`. *This result is a sequence of all the distances.* Outside the benchmarked function we sum the distances, and then format the output with this sum. It's done this way to minimize the impact that the benchmarking needs has on the benchmarked work. (See [levenshtein/jvm/run.java](levenshtein/jvm/run.java) or [levenshtein/c/run.c](levenshtein/c/run.c) if the Lisp is tricky to read for you.) + +## Available Benchmarks + +#### [hello-world](./hello-world/README.md) + +#### [loops](./loops/README.md) + +#### [fibonacci](./fibonacci/README.md) + +#### [levenshtein](./levenshtein/README.md) + +## Corresponding visuals + +Here's a visualization of a run using the languages ported to the in-process runner as of January 23, 2024 + +- https://pez.github.io/languages-visualizations/#https://gist.github.com/PEZ/411e2da1af3bbe21c4ad1d626451ec1d +- The https://pez.github.io/languages-visualizations/ page will soon be defaulting to the in-process runs + +### Legacy visuals Several visuals have been published based on the work here. 
-More will likely be added in the future, as this repository improves: - https://benjdd.com/languages - https://benjdd.com/languages2 - https://benjdd.com/languages3 -- https://pez.github.io/languages-visualizations/ - - check https://github.com/PEZ/languages-visualizations/tags for tags, which correspond to a snapshot of some particular benchmark run: e.g: - - https://pez.github.io/languages-visualizations/v2024.12.31/ \ No newline at end of file + +- https://pez.github.io/languages-visualizations/v2025.01.21/ diff --git a/clean.sh b/clean.sh index f38feab3..f1fae31c 100755 --- a/clean.sh +++ b/clean.sh @@ -1,11 +1,9 @@ rm c3/code -rm c/code +rm c/{code,run} rm cpp/code rm go/code -rm jvm/*.class -rm java-native-image/code -rm java-native-image/jvm.code -rm java-native-image/default.iprof +rm -rf jvm/{*.class,*.iprof} +rm -rf java-native-image/{jvm.run,run,code,jvm.code,*.iprof} rm scala/code scala/code-native rm -r rust/target rm -rf kotlin/code.jar @@ -40,7 +38,7 @@ rm hare/code rm v/code rm emojicode/code emojicode/code.o rm -f chez/code.so -rm -rf clojure/classes clojure/.cpcache -rm -rf clojure-native-image/classes clojure-native-image/.cpcache clojure-native-image/code +rm -rf clojure/{classes,.cpcache,*.class} +rm -rf clojure-native-image/{classes,code,run,*.iprof} rm cobol/main rm emacs-lisp/code.eln emacs-lisp/code.elc diff --git a/compile-legacy.sh b/compile-legacy.sh new file mode 100755 index 00000000..ce5c0ab8 --- /dev/null +++ b/compile-legacy.sh @@ -0,0 +1,66 @@ +function compile { + if [ -d ${1} ]; then + echo "" + echo "Compiling $1" + ${2} 2>/dev/null + result=$? 
+ if [ $result -ne 0 ]; then + echo "Failed to compile ${1} with command: ${2}" + fi + fi +} + +compile 'c3' 'c3c compile c3/code.c3 -o c3/code' +compile 'c' 'gcc -O3 c/code.c -o c/code' +compile 'cpp' 'g++ -std=c++23 -march=native -O3 -Ofast -o cpp/code cpp/code.cpp' +#compile 'go' 'go build -ldflags "-s -w" -o go/code go/code.go' +go build -ldflags "-s -w" -o go/code go/code.go +hare build -R -o hare/code hare/code.ha +compile 'jvm' 'javac jvm/code.java' +compile 'js' 'bun build --bytecode --compile js/code.js --outfile js/bun' +# The compile function can't cope with the java-native-image compile +(cd java-native-image && native-image -cp .. -O3 --pgo-instrument -march=native jvm.code && ./jvm.code $(cat input.txt) && native-image -cp .. -O3 --pgo -march=native jvm.code -o code) +compile 'rust' 'cargo build --manifest-path rust/Cargo.toml --release' +compile 'kotlin' 'kotlinc -include-runtime kotlin/code.kt -d kotlin/code.jar' +compile 'kotlin' 'kotlinc-native kotlin/code.kt -o kotlin/code -opt' +compile 'dart' 'dart compile exe dart/code.dart -o dart/code --target-os=macos' +compile 'inko' '(cd inko && inko build --opt=aggressive code.inko -o code)' +compile 'nim' 'nim c -d:danger --opt:speed -d:passC -x:off -a:off nim/code.nim' +compile 'nim' 'nim -d:release --threads:off --stackTrace:off --lineTrace:off --opt:speed -x:off -o:nim/code c nim/code.nim' +compile 'sbcl' 'sbcl --noinform --non-interactive --load "common-lisp/code.lisp" --build' +compile 'fpc' 'fpc -O3 fpc/code.pas' +compile 'modula2' 'gm2 -O3 modula2/code.mod -o modula2/code' +compile 'crystal' 'crystal build -o crystal/code --release crystal/code.cr' +compile 'scala' 'scala-cli --power package --assembly scala/code.scala -f -o scala/code' +compile 'scala' 'scala-cli --power package --native scala/code.scala -f -o scala/code-native --native-mode release-full' +compile 'scala' 'scala-cli --power package --js scala/codeJS.scala -f -o scala/code.js --js-module-kind commonjs --js-mode fullLinkJS' 
+compile 'scala' 'bun build --bytecode --compile scala/code.js --outfile scala/bun' +compile 'ldc2' 'ldc2 -O3 -release -boundscheck=off -mcpu=native flto=thin d/code.d' +compile 'odin' 'odin build odin/code.odin -o:speed -file -out:odin/code' +compile 'objc' 'clang -O3 -framework Foundation objc/code.m -o objc/code' +compile 'fortran' 'gfortran -O3 fortran/code.f90 -o fortran/code' +compile 'zig' 'zig build-exe -O ReleaseFast -femit-bin=zig/code zig/code.zig' +compile 'lua' 'luajit -b lua/code.lua lua/code' +compile 'swift' 'swiftc -O -parse-as-library -Xcc -funroll-loops -Xcc -march=native -Xcc -ftree-vectorize -Xcc -ffast-math swift/code.swift -o swift/code' +compile 'csharp' 'dotnet publish csharp -o csharp/code' +compile 'csharp' 'dotnet publish csharp -o csharp/code-aot /p:PublishAot=true /p:OptimizationPreference=Speed' +compile 'fsharp' 'dotnet publish fsharp -o fsharp/code' +compile 'fsharp' 'dotnet publish fsharp -o fsharp/code-aot /p:PublishAot=true /p:OptimizationPreference=Speed' +compile 'haskell' 'ghc -O2 -fllvm haskell/code.hs -o haskell/code || { echo "ghc: cannot compile with llvm backend; fallback to use default backend"; ghc -O2 haskell/code.hs -o haskell/code; }' +compile 'v' 'v -prod -cc clang -cflags -march=native -d no_backtrace -o v/code v/code.v' +compile 'emojicode' 'emojicodec emojicode/code.emojic' +compile 'chez' "echo '(compile-program \"chez/code.ss\")' | chez --optimize-level 3 -q" +#compile 'clojure' "(cd clojure && mkdir -p classes && clojure -Sdeps '{:paths [\".\"]}' -M -e \"(compile 'code)\")" +(cd clojure && mkdir -p classes && clojure -Sdeps '{:paths ["."]}' -M -e "(compile 'code)") +#compile 'clojure-native-image' "(cd clojure-native-image && clojure -M:native-image)" +#Using `compile` for clojure-native-image silently fails +(cd clojure-native-image && clojure -M:native-image --pgo-instrument -march=native && ./code $(cat input.txt) && clojure -M:native-image --pgo -march=native) +compile 'cobol' 'cobc -I 
/opt/homebrew/include/ -O -O2 -O3 -Os -x -o cobol/main cobol/main.cbl' +compile 'lean4' 'lake build --dir lean4 ' +# compile 'java' 'haxe --class-path haxe -main Code --jvm haxe/code.jar # was getting errors running `haxelib install hxjava`' +# compile 'ada' 'gnatmake -O3 -gnat2022 -gnatp -flto ada/code.adb -D ada -o ada/code' +#Using `compile` for Emacs Lisp silently fails +(cd emacs-lisp && emacs -Q --batch --eval '(byte-compile-file "code.el")') +(cd emacs-lisp && emacs -Q --batch --eval '(native-compile "code.el" (expand-file-name "code.eln"))') +(cd racket && raco make code.rkt && raco demod -o code.zo code.rkt && raco exe -o code code.zo) +pip3.12 install numba --break-system-packages diff --git a/compile.sh b/compile.sh index ce5c0ab8..63711608 100755 --- a/compile.sh +++ b/compile.sh @@ -1,66 +1,62 @@ +#!/bin/bash + +benchmark=$(basename "${PWD}") + +# Defaults +only_langs=false + +while getopts "cst:u:l:h" opt; do + case $opt in + l) only_langs="${OPTARG}" ;; # Languages to benchmark, comma separated + *) ;; + esac +done +shift $((OPTIND-1)) + +if [ -n "${only_langs}" ] && [ "${only_langs}" != "false" ]; then + IFS=',' read -r -a only_langs <<< "${only_langs}" +fi + function compile { - if [ -d ${1} ]; then + local language_name=${1} + local directory=${2} + local compile_cmd=${3} + + if [ "$only_langs" != false ]; then + local should_run=false + for lang in "${only_langs[@]}"; do + if [ "$lang" = "$language_name" ]; then + should_run=true + break + fi + done + if [ "$should_run" = false ]; then + return + fi + fi + + if [ -d ${directory} ]; then echo "" - echo "Compiling $1" - ${2} 2>/dev/null + echo "Compiling ${language_name}" + eval "${compile_cmd}" result=$? 
if [ $result -ne 0 ]; then - echo "Failed to compile ${1} with command: ${2}" + echo "Failed to compile ${language_name} with command: ${compile_cmd}" fi fi } -compile 'c3' 'c3c compile c3/code.c3 -o c3/code' -compile 'c' 'gcc -O3 c/code.c -o c/code' -compile 'cpp' 'g++ -std=c++23 -march=native -O3 -Ofast -o cpp/code cpp/code.cpp' -#compile 'go' 'go build -ldflags "-s -w" -o go/code go/code.go' -go build -ldflags "-s -w" -o go/code go/code.go -hare build -R -o hare/code hare/code.ha -compile 'jvm' 'javac jvm/code.java' -compile 'js' 'bun build --bytecode --compile js/code.js --outfile js/bun' -# The compile function can't cope with the java-native-image compile -(cd java-native-image && native-image -cp .. -O3 --pgo-instrument -march=native jvm.code && ./jvm.code $(cat input.txt) && native-image -cp .. -O3 --pgo -march=native jvm.code -o code) -compile 'rust' 'cargo build --manifest-path rust/Cargo.toml --release' -compile 'kotlin' 'kotlinc -include-runtime kotlin/code.kt -d kotlin/code.jar' -compile 'kotlin' 'kotlinc-native kotlin/code.kt -o kotlin/code -opt' -compile 'dart' 'dart compile exe dart/code.dart -o dart/code --target-os=macos' -compile 'inko' '(cd inko && inko build --opt=aggressive code.inko -o code)' -compile 'nim' 'nim c -d:danger --opt:speed -d:passC -x:off -a:off nim/code.nim' -compile 'nim' 'nim -d:release --threads:off --stackTrace:off --lineTrace:off --opt:speed -x:off -o:nim/code c nim/code.nim' -compile 'sbcl' 'sbcl --noinform --non-interactive --load "common-lisp/code.lisp" --build' -compile 'fpc' 'fpc -O3 fpc/code.pas' -compile 'modula2' 'gm2 -O3 modula2/code.mod -o modula2/code' -compile 'crystal' 'crystal build -o crystal/code --release crystal/code.cr' -compile 'scala' 'scala-cli --power package --assembly scala/code.scala -f -o scala/code' -compile 'scala' 'scala-cli --power package --native scala/code.scala -f -o scala/code-native --native-mode release-full' -compile 'scala' 'scala-cli --power package --js scala/codeJS.scala -f -o 
scala/code.js --js-module-kind commonjs --js-mode fullLinkJS' -compile 'scala' 'bun build --bytecode --compile scala/code.js --outfile scala/bun' -compile 'ldc2' 'ldc2 -O3 -release -boundscheck=off -mcpu=native flto=thin d/code.d' -compile 'odin' 'odin build odin/code.odin -o:speed -file -out:odin/code' -compile 'objc' 'clang -O3 -framework Foundation objc/code.m -o objc/code' -compile 'fortran' 'gfortran -O3 fortran/code.f90 -o fortran/code' -compile 'zig' 'zig build-exe -O ReleaseFast -femit-bin=zig/code zig/code.zig' -compile 'lua' 'luajit -b lua/code.lua lua/code' -compile 'swift' 'swiftc -O -parse-as-library -Xcc -funroll-loops -Xcc -march=native -Xcc -ftree-vectorize -Xcc -ffast-math swift/code.swift -o swift/code' -compile 'csharp' 'dotnet publish csharp -o csharp/code' -compile 'csharp' 'dotnet publish csharp -o csharp/code-aot /p:PublishAot=true /p:OptimizationPreference=Speed' -compile 'fsharp' 'dotnet publish fsharp -o fsharp/code' -compile 'fsharp' 'dotnet publish fsharp -o fsharp/code-aot /p:PublishAot=true /p:OptimizationPreference=Speed' -compile 'haskell' 'ghc -O2 -fllvm haskell/code.hs -o haskell/code || { echo "ghc: cannot compile with llvm backend; fallback to use default backend"; ghc -O2 haskell/code.hs -o haskell/code; }' -compile 'v' 'v -prod -cc clang -cflags -march=native -d no_backtrace -o v/code v/code.v' -compile 'emojicode' 'emojicodec emojicode/code.emojic' -compile 'chez' "echo '(compile-program \"chez/code.ss\")' | chez --optimize-level 3 -q" -#compile 'clojure' "(cd clojure && mkdir -p classes && clojure -Sdeps '{:paths [\".\"]}' -M -e \"(compile 'code)\")" -(cd clojure && mkdir -p classes && clojure -Sdeps '{:paths ["."]}' -M -e "(compile 'code)") -#compile 'clojure-native-image' "(cd clojure-native-image && clojure -M:native-image)" -#Using `compile` for clojure-native-image silently fails -(cd clojure-native-image && clojure -M:native-image --pgo-instrument -march=native && ./code $(cat input.txt) && clojure -M:native-image --pgo 
-march=native) -compile 'cobol' 'cobc -I /opt/homebrew/include/ -O -O2 -O3 -Os -x -o cobol/main cobol/main.cbl' -compile 'lean4' 'lake build --dir lean4 ' -# compile 'java' 'haxe --class-path haxe -main Code --jvm haxe/code.jar # was getting errors running `haxelib install hxjava`' -# compile 'ada' 'gnatmake -O3 -gnat2022 -gnatp -flto ada/code.adb -D ada -o ada/code' -#Using `compile` for Emacs Lisp silently fails -(cd emacs-lisp && emacs -Q --batch --eval '(byte-compile-file "code.el")') -(cd emacs-lisp && emacs -Q --batch --eval '(native-compile "code.el" (expand-file-name "code.eln"))') -(cd racket && raco make code.rkt && raco demod -o code.zo code.rkt && raco exe -o code code.zo) -pip3.12 install numba --break-system-packages +echo "Starting compiles for ${benchmark}" + +# Please keep in language name alphabetic order +# run "Language name" "File that should exist" "Command line" +####### BEGIN The languages +compile 'C' 'c' 'gcc -O3 -I../lib/c -c ../lib/c/benchmark.c -o c/benchmark.o && gcc -O3 -I../lib/c c/benchmark.o c/run.c -o c/run -lm' +compile 'Clojure' 'clojure' '(cd clojure && mkdir -p classes && clojure -M -e "(compile (quote run))")' +compile 'Clojure Native' 'clojure-native-image' "(cd clojure-native-image ; clojure -M:native-image-run --pgo-instrument -march=native) ; ./clojure-native-image/run -XX:ProfilesDumpFile=clojure-native-image/run.iprof 10000 2000 $(./check-output.sh -i) && (cd clojure-native-image ; clojure -M:native-image-run --pgo=run.iprof -march=native)" +compile 'Java' 'jvm' 'javac -cp ../lib/java jvm/run.java' +compile 'Java Native' 'java-native-image' "(cd java-native-image ; native-image -cp ..:../../lib/java --no-fallback -O3 --pgo-instrument -march=native jvm.run) && ./java-native-image/jvm.run -XX:ProfilesDumpFile=java-native-image/run.iprof 10000 2000 $(./check-output.sh -i) && (cd java-native-image ; native-image -cp ..:../../lib/java -O3 --pgo=run.iprof -march=native jvm.run -o run)" +####### END The languages + +echo +echo 
"Done with compiles for ${benchmark}" \ No newline at end of file diff --git a/docs/example-results-csv.png b/docs/example-results-csv.png new file mode 100644 index 00000000..c3bc73c3 Binary files /dev/null and b/docs/example-results-csv.png differ diff --git a/fibonacci/README.md b/fibonacci/README.md index d858971e..f63c42fe 100644 --- a/fibonacci/README.md +++ b/fibonacci/README.md @@ -1,30 +1,14 @@ # Fibonacci -This program computes the sum of the first N fibonacci numbers. -Each fibonacci number is computed using a naive recursive solution. -Submissions using faster tail-recursion or iterative solutions will not not be accepted. -Emphasizes function call overhead, stack pushing / popping, and recursion. - -Below is the reference C program. -All languages must do the equivalent amount of work and meet these requirements: - -```C -#include "stdio.h" -#include "stdlib.h" -#include "stdint.h" - // ALL IMPLEMENTAITONS MUST... -int32_t fibonacci(int32_t n) { // Have a function that recursively compute a fibonacci number with this naive algorithm - if (n == 0) return 0; // Base case for input 0 - if (n == 1) return 1; // Base case for input 1 - return fibonacci(n-1) + fibonacci(n-2); // Must make two recursive calls for each non-base invocation -} // No result caching, conversion to tail recursion, or iterative solutions. - -int main (int argc, char** argv) { - int32_t u = atoi(argv[1]); // Get exactly one numberic value from the command line - int32_t r = 0; // Create variable to store sum - for (int32_t i = 1; i < u; i++) { // Loop 1...u times - r += fibonacci(i); // Sum all fibonacci numbers 1...u - } - printf("%d\n", r); // Print out the single, numeric sum -} -``` +This program should benchmark a function computing `fibonacci(n)` using naïve recursion. +* The code is supposed to have early return for `n < 2` (the base cases). +* For the non-base cases the code should do two recursive calls. 
+* The code should be free of any hints to the compiler to memoize, use tail recursion, + iterative methods, or any avoidance of the naïve recursion. + +If some compiler finds ways to avoid recursive calls without any hints, then that is a result. We are in some sense testing compilers here, after all. + +Reference implementations: +* Clojure: [run.clj](clojure/run.clj) +* Java: [run.java](jvm/run.java) +* C: [run.c](c/run.c) diff --git a/fibonacci/bb/bb.edn b/fibonacci/bb/bb.edn new file mode 100644 index 00000000..93473d35 --- /dev/null +++ b/fibonacci/bb/bb.edn @@ -0,0 +1 @@ +{:deps {languages/tooling {:local/root "../../lib/clojure"}}} \ No newline at end of file diff --git a/fibonacci/bb/run.clj b/fibonacci/bb/run.clj new file mode 100644 index 00000000..2d72c3bb --- /dev/null +++ b/fibonacci/bb/run.clj @@ -0,0 +1,14 @@ +(require '[languages.benchmark :as benchmark]) + +(defn- fibonacci [n] + (if (< n 2) + n + (+ (fibonacci (- n 1)) + (fibonacci (- n 2))))) + +(let [run-ms (parse-long (first *command-line-args*)) + ; skip warmup arg, because we skip warmups + u (parse-long (nth *command-line-args* 2))] + (-> (benchmark/run #(fibonacci u) run-ms) + benchmark/format-results + println)) diff --git a/fibonacci/c/run.c b/fibonacci/c/run.c new file mode 100644 index 00000000..d77be1d5 --- /dev/null +++ b/fibonacci/c/run.c @@ -0,0 +1,33 @@ +/** + * @file + * @brief This file uses Google style formatting. 
+ */ + +#include "benchmark.h" +#include "stdint.h" +#include "stdio.h" +#include "stdlib.h" + +int32_t fibonacci(int32_t n) { + if (n < 2) return n; + return fibonacci(n - 1) + fibonacci(n - 2); +} + +// The work function that benchmark will time +static benchmark_result_t work(void* data) { + int* n = (int*)data; + int r = fibonacci(*n); + benchmark_result_t result = {.value.number = r}; + return result; +} + +int main(int argc, char** argv) { + int run_ms = atoi(argv[1]); + int warmup_ms = atoi(argv[2]); + int u = atoi(argv[3]); + benchmark_run(work, &u, warmup_ms); + benchmark_stats_t stats = benchmark_run(work, &u, run_ms); + char buffer[1024]; + benchmark_format_results(stats, buffer, sizeof(buffer)); + printf("%s\n", buffer); +} diff --git a/fibonacci/check-output.sh b/fibonacci/check-output.sh new file mode 100755 index 00000000..4b7fb71f --- /dev/null +++ b/fibonacci/check-output.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +input=37 +expected_result="24157817" +echo_input=false + +while getopts "i" opt; do + case $opt in + i) echo_input=true ;; + *) ;; + esac +done + +if [ "$echo_input" = true ]; then + echo "$input" + exit 0 +fi + +result=$(echo "$1" | sed 's/\x1b\[[0-9;]*m//g' | awk -F ',' '{print $6}') + +if [ "${result}" == "${expected_result}" ]; then + echo "Check passed" + exit 0 +else + echo "Incorrect result:" + echo "${result}" + exit 1 +fi \ No newline at end of file diff --git a/fibonacci/clojure-native-image/deps.edn b/fibonacci/clojure-native-image/deps.edn index ba3b7c62..e0ba71cf 100644 --- a/fibonacci/clojure-native-image/deps.edn +++ b/fibonacci/clojure-native-image/deps.edn @@ -1,13 +1,19 @@ {:paths ["."] - :deps {code/clojure {:local/root "../clojure"}} + :deps {code/clojure {:local/root "../clojure"} + clj.native-image/clj.native-image + {:git/url "https://github.com/taylorwood/clj.native-image.git" + :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}} :aliases {:native-image {:main-opts ["-m" "clj.native-image" "code" "-O3" 
"--initialize-at-build-time" "-H:+UnlockExperimentalVMOptions" "-H:Name=code"] - :jvm-opts ["-Dclojure.compiler.direct-linking=true"] - :extra-deps - {clj.native-image/clj.native-image - {:git/url "https://github.com/taylorwood/clj.native-image.git" - :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}}}}} \ No newline at end of file + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]} + :native-image-run + {:main-opts ["-m" "clj.native-image" "run" + "-O3" + "--initialize-at-build-time" + "-H:+UnlockExperimentalVMOptions" + "-H:Name=run"] + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]}}} \ No newline at end of file diff --git a/fibonacci/clojure/deps.edn b/fibonacci/clojure/deps.edn index b0f81dbf..6458af77 100644 --- a/fibonacci/clojure/deps.edn +++ b/fibonacci/clojure/deps.edn @@ -1 +1,2 @@ -{:paths ["."]} \ No newline at end of file +{:paths ["."] + :deps {languages/tooling {:local/root "../../lib/clojure"}}} \ No newline at end of file diff --git a/fibonacci/clojure/run.clj b/fibonacci/clojure/run.clj new file mode 100644 index 00000000..9038be56 --- /dev/null +++ b/fibonacci/clojure/run.clj @@ -0,0 +1,32 @@ +(ns run + (:require [languages.benchmark :as benchmark]) + (:gen-class)) + +(set! 
*unchecked-math* :warn-on-boxed) + +(definterface IFib + (^long fib [^long n])) + +(deftype Fibonacci [] + IFib + (fib [_ n] + (if (< n 2) + (long n) + (long (+ (.fib _ (- n 1)) + (.fib _ (- n 2))))))) + +(def ^:private ^Fibonacci fibonacci (Fibonacci.)) + +(defn -main [& args] + (let [run-ms (parse-long (first args)) + warmup-ms (parse-long (second args)) + n (parse-long (nth args 2)) + _warmup (benchmark/run #(.fib fibonacci n) warmup-ms)] + (-> (benchmark/run #(.fib fibonacci n) run-ms) + benchmark/format-results + println))) + +(comment + (-main "10000" "36") + :rcf) + diff --git a/fibonacci/jvm/run.java b/fibonacci/jvm/run.java new file mode 100644 index 00000000..e44ca579 --- /dev/null +++ b/fibonacci/jvm/run.java @@ -0,0 +1,22 @@ +package jvm; + +import languages.Benchmark; + +public class run { + + private static int fibonacci(int n) { + if (n < 2) { + return n; + } + return fibonacci(n - 1) + fibonacci(n - 2); + } + + public static void main(String[] args) { + var runMs = Integer.parseInt(args[0]); + var warmupMS = Integer.parseInt(args[1]); + var n = Integer.parseInt(args[2]); + Benchmark.run(() -> fibonacci(n), warmupMS); + var results = Benchmark.run(() -> fibonacci(n), runMs); + System.out.println(Benchmark.formatResults(results)); + } +} diff --git a/hello-world/bb/run.clj b/hello-world/bb/run.clj new file mode 100644 index 00000000..2cb98fb9 --- /dev/null +++ b/hello-world/bb/run.clj @@ -0,0 +1 @@ +(println "Hello, World!") \ No newline at end of file diff --git a/hello-world/c/run.c b/hello-world/c/run.c new file mode 100644 index 00000000..e0e8a488 --- /dev/null +++ b/hello-world/c/run.c @@ -0,0 +1,6 @@ +#include <stdio.h> + +int main() { + printf("Hello, World!\n"); + return 0; +} \ No newline at end of file diff --git a/hello-world/check-output.sh b/hello-world/check-output.sh new file mode 100755 index 00000000..7d10d7c8 --- /dev/null +++ b/hello-world/check-output.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +input="" +expected_result="hello, world!"
+echo_input=false + +while getopts "i" opt; do + case $opt in + i) echo_input=true ;; + *) ;; + esac +done + +if [ "$echo_input" = true ]; then + echo "$input" + exit 0 +fi + +result=$(echo "${*}" | sed 's/\x1b\[[0-9;]*m//g' | awk '{$1=$1};1' | tr '[:upper:]' '[:lower:]') + +if [ "${result}" == "${expected_result}" ]; then + echo "Check passed" + exit 0 +else + echo "Incorrect result:" + echo "${result}" + exit 1 +fi \ No newline at end of file diff --git a/hello-world/clojure-native-image/deps.edn b/hello-world/clojure-native-image/deps.edn index ba3b7c62..b9df496c 100644 --- a/hello-world/clojure-native-image/deps.edn +++ b/hello-world/clojure-native-image/deps.edn @@ -1,13 +1,19 @@ {:paths ["."] - :deps {code/clojure {:local/root "../clojure"}} + :deps {code/clojure {:local/root "../clojure"} + clj.native-image/clj.native-image + {:git/url "https://github.com/taylorwood/clj.native-image.git" + :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}} :aliases {:native-image {:main-opts ["-m" "clj.native-image" "code" "-O3" "--initialize-at-build-time" "-H:+UnlockExperimentalVMOptions" "-H:Name=code"] - :jvm-opts ["-Dclojure.compiler.direct-linking=true"] - :extra-deps - {clj.native-image/clj.native-image - {:git/url "https://github.com/taylorwood/clj.native-image.git" - :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}}}}} \ No newline at end of file + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]} + :native-image-run + {:main-opts ["-m" "clj.native-image" "run" + "-O3" + "--initialize-at-build-time" + "-H:+UnlockExperimentalVMOptions" + "-H:Name=run"] + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]}}} \ No newline at end of file diff --git a/hello-world/clojure/deps.edn b/hello-world/clojure/deps.edn index b0f81dbf..6458af77 100644 --- a/hello-world/clojure/deps.edn +++ b/hello-world/clojure/deps.edn @@ -1 +1,2 @@ -{:paths ["."]} \ No newline at end of file +{:paths ["."] + :deps {languages/tooling {:local/root "../../lib/clojure"}}} \ No newline at end of 
file diff --git a/hello-world/clojure/run.clj b/hello-world/clojure/run.clj new file mode 100644 index 00000000..1a5f122c --- /dev/null +++ b/hello-world/clojure/run.clj @@ -0,0 +1,5 @@ +(ns run + (:gen-class)) + +(defn -main [& args] + (println "Hello, world!")) \ No newline at end of file diff --git a/hello-world/jvm/run.java b/hello-world/jvm/run.java new file mode 100644 index 00000000..d6ab3783 --- /dev/null +++ b/hello-world/jvm/run.java @@ -0,0 +1,7 @@ +package jvm; + +public class run { + public static void main(String[] args) { + System.out.println("Hello, world!"); + } +} diff --git a/levenshtein/README.md b/levenshtein/README.md index deb713db..62a1e67c 100644 --- a/levenshtein/README.md +++ b/levenshtein/README.md @@ -1,95 +1,20 @@ # Levenshtein -This program computes the [levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between all of the strings provided on the command line. -It prints out the total number of strings compared for distance, and the lowest distance score of all comparisons. +This program should benchmark a function that takes a sequence of strings as input and +returns a sequence of all +[levenshtein distances](https://en.wikipedia.org/wiki/Levenshtein_distance) +between any pairing of the words. The benchmark then reports the sum of these distances. +The words are provided from a file, given as an argument to the program. There is one +word per line in the file. + All implementations must use the [Wagner-Fischer algorithm](https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm), with a few of the performance enhancements allowed: - Reduced space complexity from O(m*n) to O(min(m,n)) by using only two rows instead of building full matrix - Always use the shorter string for column dimension to minimize space usage - Reuse arrays instead of creating new ones -This program emphasizes array/string access and basic looping and conditionals. - -Below is the reference C program. 
-All languages should do the equivalent amount of work and meet these requirements: - -```C -#include -#include -#include - -// Can either define your own min function -// or use a language / standard library function -int min(int a, int b, int c) { - int min = a; - if (b < min) min = b; - if (c < min) min = c; - return min; -} - -int levenshtein_distance(const char *str1t, const char *str2t) { - // Get lengths of both strings - int mt = strlen(str1t); - int nt = strlen(str2t); - // Assign shorter one to str1, longer one to str2 - const char* str1 = mt <= nt ? str1t : str2t; - const char* str2 = mt <= nt ? str2t : str1t; - // store the lengths of shorter in m, longer in n - int m = str1 == str1t ? mt : nt; - int n = str1 == str1t ? nt : mt; - - // Create two rows, previous and current - int prev[m+1]; - int curr[m+1]; - - // initialize the previous row - for (int i = 0; i <= m; i++) { - prev[i] = i; - } - - // Iterate and compute distance - for (int i = 1; i <= n; i++) { - curr[0] = i; - for (int j = 1; j <= m; j++) { - int cost = (str1[j-1] == str2[i-1]) ? 0 : 1; - curr[j] = min( - prev[j] + 1, // Deletion - curr[j-1] + 1, // Insertion - prev[j-1] + cost // Substitution - ); - } - for (int j = 0; j <= m; j++) { - prev[j] = curr[j]; - } - } - - // Return final distance, stored in prev[m] - return prev[m]; -} +The code should follow the reference implementations as closely as possible: -int main(int argc, char *argv[]) { - int min_distance = -1; - int times = 0; - // Iterate through all combinations of command line args - for (int i = 1; i < argc; i++) { - for (int j = 1; j < argc; j++) { - // Don't compare the same string to itself - if (i != j) { - int distance = levenshtein_distance(argv[i], argv[j]); - if (min_distance == -1 || min_distance > distance) { - min_distance = distance; - } - times++; - } - } - } - - // The only output from the program should be the times (number of comparisons) - // and min distance calculated of all comparisons. 
Two total lines of output, - // formatted exactly like this. - printf("times: %d\n", times); - printf("min_distance: %d\n", min_distance); - - return 0; -} -``` +* Clojure: [run.clj](clojure/run.clj) +* Java: [run.java](jvm/run.java) +* C: [run.c](c/run.c) diff --git a/levenshtein/bb/bb.edn b/levenshtein/bb/bb.edn new file mode 100644 index 00000000..93473d35 --- /dev/null +++ b/levenshtein/bb/bb.edn @@ -0,0 +1 @@ +{:deps {languages/tooling {:local/root "../../lib/clojure"}}} \ No newline at end of file diff --git a/levenshtein/bb/run.clj b/levenshtein/bb/run.clj new file mode 100644 index 00000000..d3787518 --- /dev/null +++ b/levenshtein/bb/run.clj @@ -0,0 +1,67 @@ +(require '[languages.benchmark :as benchmark] + ' [clojure.string :as string]) + +(defn levenshtein-distance + "Calculates the Levenshtein distance between two strings using a functional approach." + [s1 s2] + (let [m (count s1) + n (count s2)] + ;; Create a matrix to store distances + (as-> (vec (map vec (repeat (inc m) (repeat (inc n) 0)))) matrix + ;; Initialize first row and column + (reduce (fn [matrix i] (assoc-in matrix [i 0] i)) matrix (range (inc m))) + (reduce (fn [matrix j] (assoc-in matrix [0 j] j)) matrix (range (inc n))) + ;; Compute Levenshtein distance + (reduce (fn [matrix i] + (reduce (fn [matrix j] + (let [cost (if (= (nth s1 (dec i)) (nth s2 (dec j))) 0 1)] + (assoc-in matrix [i j] + (min + (inc (get-in matrix [(dec i) j])) ;; Deletion + (inc (get-in matrix [i (dec j)])) ;; Insertion + (+ (get-in matrix [(dec i) (dec j)]) cost))))) ;; Substitution + matrix (range 1 (inc n)))) + matrix (range 1 (inc m))) + (get-in matrix [m n])))) + +(defn levenshtein-distances + "Return distances for all `words` pairings" + [words] + (let [n (count words)] + (doall + (for [i (range n) + j (range n) + :when (< i j)] + (levenshtein-distance (nth words i) (nth words j)))))) + +(when (= *file* (System/getProperty "babashka.file")) + (let [run-ms (parse-long (first *command-line-args*)) + ; skip warmup 
arg, because we skip warmups + input-path (nth *command-line-args* 2) + strings (-> (slurp input-path) + (string/split-lines)) + results (benchmark/run #(levenshtein-distances strings) run-ms)] + (-> results + (update :result (partial reduce +)) + benchmark/format-results + println))) + +(comment + (time + (reduce + (levenshtein-distances ["abcde" "abdef" "ghijk" "gjkl" "mno" "pqr" "stu" "vwx" "yz" "banana" "oranges"]))) + ;; => 265 + ;; "Elapsed time: 1.320292 msecs" + (def words (string/split (slurp "../levenshtein-words.txt") #"\s+")) + (time (reduce + (levenshtein-distances words))) + ;; => 554324 + ;; "Elapsed time: 23758.768542 msecs" + (-> (benchmark/run #(levenshtein-distances words) 1000) + (update :result (partial reduce +))) + #_ {:max-ms 11954462271/500000, + :mean-ms 11954462271/500000, + :min-ms 11954462271/500000, + :result 554324, + :runs 1, + :std-dev-ms 0.0} + :rcf) + diff --git a/levenshtein/c/run.c b/levenshtein/c/run.c new file mode 100644 index 00000000..c484038a --- /dev/null +++ b/levenshtein/c/run.c @@ -0,0 +1,194 @@ +/** + * @file + * @brief This file uses Google style formatting. + */ + +/** + * This program implements the Levenshtein distance algorithm and provides + * functionality to benchmark it with the following features: + * - Reads words from an input file + * - Calculates Levenshtein distances between all unique pairs + * - Returns sum of all distances as final result + * - Provides benchmark statistics in CSV format + * + * The program takes two command line arguments: + * 1. run_ms: How long to run the benchmark in milliseconds + * 2. 
input_file: Path to file containing newline-separated words (one per line) + * + * Output format: mean_ms,std_dev_ms,min_ms,max_ms,runs,result + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include "benchmark.h" + +// Can either define your own min function +// or use a language / standard library function +int min(int a, int b, int c) { + int min = a; + if (b < min) min = b; + if (c < min) min = c; + return min; +} + +/** + * Calculates the Levenshtein distance between two strings using an optimized + * version of Wagner-Fischer algorithm that uses O(min(m,n)) space. + * + * @param s1 The first string to compare + * @param s2 The second string to compare + * @return The Levenshtein distance between s1 and s2 + */ + +int levenshtein_distance(const char* s1, const char* s2) { + // Get lengths of both strings + int mt = strlen(s1); + int nt = strlen(s2); + // Assign shorter one to str1, longer one to str2 + const char* str1 = mt <= nt ? s1 : s2; + const char* str2 = mt <= nt ? s2 : s1; + // store the lengths of shorter in m, longer in n + int m = str1 == s1 ? mt : nt; + int n = str1 == s1 ? nt : mt; + + // Create two rows, previous and current + int prev[m + 1]; + int curr[m + 1]; + + // initialize the previous row + for (int i = 0; i <= m; i++) { + prev[i] = i; + } + + // Iterate and compute distance + for (int i = 1; i <= n; i++) { + curr[0] = i; + for (int j = 1; j <= m; j++) { + int cost = (str1[j - 1] == str2[i - 1]) ?
0 : 1; + curr[j] = min(prev[j] + 1, // Deletion + curr[j - 1] + 1, // Insertion + prev[j - 1] + cost // Substitution + ); + } + for (int j = 0; j <= m; j++) { + prev[j] = curr[j]; + } + } + + // Return final distance, stored in prev[m] + return prev[m]; +} + +static char** read_words(const char* filename, int* word_count) { + // First read entire file content + FILE* file = fopen(filename, "r"); + if (!file) { + fprintf(stderr, "Could not open file: %s\n", filename); + exit(1); + } + + // Get file size + fseek(file, 0, SEEK_END); + long file_size = ftell(file); + fseek(file, 0, SEEK_SET); + + // Read entire file into buffer + char* content = malloc(file_size + 1); + fread(content, 1, file_size, file); + content[file_size] = '\0'; + fclose(file); + + // Collect words into a growable array + int capacity = 100; + char** words = malloc(capacity * sizeof(char*)); + *word_count = 0; + + // Split on lines + char* word = strtok(content, "\n"); + while (word != NULL) { + if (*word_count == capacity) { + capacity *= 2; + words = realloc(words, capacity * sizeof(char*)); + } + words[*word_count] = strdup(word); + (*word_count)++; + word = strtok(NULL, "\n"); + } + + free(content); + return words; +} + +typedef struct { + long* distances; + int count; +} distances_result_t; + +static distances_result_t* calculate_distances(char** words, int word_count) { + distances_result_t* result = malloc(sizeof(distances_result_t)); + result->count = (word_count * (word_count - 1)) / 2; + result->distances = malloc(result->count * sizeof(long)); + int idx = 0; + + for (int i = 0; i < word_count; i++) { + for (int j = i + 1; j < word_count; j++) { + result->distances[idx++] = levenshtein_distance(words[i], words[j]); + } + } + return result; +} + +typedef struct { + char** words; + int count; +} word_data_t; + +// The work function that benchmark will time +static benchmark_result_t work(void* data) { + word_data_t* word_data = (word_data_t*)data; + distances_result_t* distances = +
calculate_distances(word_data->words, word_data->count); + benchmark_result_t result = {.value.ptr = distances}; + return result; +} + +int main(int argc, char* argv[]) { + if (argc != 4) { + fprintf(stderr, "Usage: %s <run_ms> <warmup_ms> <input_file>\n", argv[0]); + return 1; + } + + int run_ms = atoi(argv[1]); + int warmup_ms = atoi(argv[2]); + int word_count; + char** words = read_words(argv[3], &word_count); + + word_data_t data = {words, word_count}; + + benchmark_run(work, &data, warmup_ms); + + benchmark_stats_t stats = benchmark_run(work, &data, run_ms); + // Sum the distances outside the benchmarked function + distances_result_t* distances = + (distances_result_t*)stats.last_result.value.ptr; + long sum = 0; + for (int i = 0; i < distances->count; i++) { + sum += distances->distances[i]; + } + stats.last_result.value.number = sum; + + char buffer[1024]; + benchmark_format_results(stats, buffer, sizeof(buffer)); + printf("%s\n", buffer); + + // Clean up everything + free(distances->distances); + free(distances); + for (int i = 0; i < word_count; i++) { + free(words[i]); + } + free(words); + + return 0; +} diff --git a/levenshtein/check-output.sh b/levenshtein/check-output.sh new file mode 100755 index 00000000..5d30330c --- /dev/null +++ b/levenshtein/check-output.sh @@ -0,0 +1,28 @@ +#!/bin/bash + +input="levenshtein-words.txt" +expected_result="554324" +echo_input=false + +while getopts "i" opt; do + case $opt in + i) echo_input=true ;; + *) ;; + esac +done + +if [ "$echo_input" = true ]; then + echo "$input" + exit 0 +fi + +result=$(echo "$1" | sed 's/\x1b\[[0-9;]*m//g' | awk -F ',' '{print $6}') + +if [ "${result}" == "${expected_result}" ]; then + echo "Check passed" + exit 0 +else + echo "Incorrect result:" + echo "${result}" + exit 1 +fi \ No newline at end of file diff --git a/levenshtein/clojure-native-image/deps.edn b/levenshtein/clojure-native-image/deps.edn index ba3b7c62..b9df496c 100644 --- a/levenshtein/clojure-native-image/deps.edn +++
b/levenshtein/clojure-native-image/deps.edn @@ -1,13 +1,19 @@ {:paths ["."] - :deps {code/clojure {:local/root "../clojure"}} + :deps {code/clojure {:local/root "../clojure"} + clj.native-image/clj.native-image + {:git/url "https://github.com/taylorwood/clj.native-image.git" + :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}} :aliases {:native-image {:main-opts ["-m" "clj.native-image" "code" "-O3" "--initialize-at-build-time" "-H:+UnlockExperimentalVMOptions" "-H:Name=code"] - :jvm-opts ["-Dclojure.compiler.direct-linking=true"] - :extra-deps - {clj.native-image/clj.native-image - {:git/url "https://github.com/taylorwood/clj.native-image.git" - :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}}}}} \ No newline at end of file + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]} + :native-image-run + {:main-opts ["-m" "clj.native-image" "run" + "-O3" + "--initialize-at-build-time" + "-H:+UnlockExperimentalVMOptions" + "-H:Name=run"] + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]}}} \ No newline at end of file diff --git a/levenshtein/clojure/deps.edn b/levenshtein/clojure/deps.edn index b0f81dbf..6458af77 100644 --- a/levenshtein/clojure/deps.edn +++ b/levenshtein/clojure/deps.edn @@ -1 +1,2 @@ -{:paths ["."]} \ No newline at end of file +{:paths ["."] + :deps {languages/tooling {:local/root "../../lib/clojure"}}} \ No newline at end of file diff --git a/levenshtein/clojure/run.clj b/levenshtein/clojure/run.clj new file mode 100644 index 00000000..bd073630 --- /dev/null +++ b/levenshtein/clojure/run.clj @@ -0,0 +1,62 @@ +(ns run + (:require + [clojure.string :as string] + [languages.benchmark :as benchmark]) + (:gen-class)) + +(set! *unchecked-math* :warn-on-boxed) + +(defn levenshtein-distance + "Calculates and returns the Levenshtein distance between `s1` and `s2` using an optimized + version of Wagner-Fischer algorithm that uses O(min(m,n)) space." 
+ ^long [^String s1 ^String s2] + ;; Optimize by ensuring s1 is the shorter string to minimize space usage + (let [[^String s1 ^String s2] (if (> (count s1) (count s2)) [s2 s1] [s1 s2]) + m (int (count s1)) + n (int (count s2)) + ;; Only need two rows for the dynamic programming matrix + prev (long-array (inc m)) + curr (long-array (inc m))] + ;; Initialize the first row + (dotimes [i (inc m)] + (aset prev i i)) + ;; Fill the matrix row by row + (dotimes [i n] + (aset curr 0 (inc i)) + (dotimes [j m] + ;; Calculate cost - 0 if characters are same, 1 if different + (let [cost (if (= (.charAt s1 j) (.charAt s2 i)) 0 1) + ;; Calculate minimum of deletion, insertion, and substitution + del (inc (aget prev (inc j))) + ins (inc (aget curr j)) + sub (+ (aget prev j) cost)] + (aset curr (inc j) (min del (min ins sub))))) + ;; Swap rows + (System/arraycopy curr 0 prev 0 (inc m))) + (aget prev m))) + +(defn levenshtein-distances [strings] + (let [n (count strings)] + (doall + (for [i (range n) + j (range n) + :when (> (long j) (long i))] + (levenshtein-distance (nth strings i) (nth strings j)))))) + +(defn -main [& args] + (let [run-ms (parse-long (first args)) + warmup-ms (parse-long (second args)) + input-path (nth args 2) + strings (-> (slurp input-path) + (string/split-lines)) + _warmup (benchmark/run #(levenshtein-distances strings) warmup-ms) + results (benchmark/run #(levenshtein-distances strings) run-ms)] + (-> results + (update :result (partial reduce +)) + benchmark/format-results + println))) + +(comment + (-main "1000" "levenshtein/levenshtein-words.txt") + :rcf) + diff --git a/levenshtein/jvm/run.java b/levenshtein/jvm/run.java new file mode 100644 index 00000000..02adab79 --- /dev/null +++ b/levenshtein/jvm/run.java @@ -0,0 +1,104 @@ +package jvm; + +import languages.Benchmark; +import java.nio.file.Files; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; + +/** + * This class implements the Levenshtein 
distance algorithm and provides + * functionality + * to benchmark it and provide output with benchmark results + for correctness + * check. + */ +public class run { + + /** + * Calculates the Levenshtein distance between two strings using an optimized + * version of Wagner-Fischer algorithm that uses O(min(m,n)) space. + * + * @param s1 The first string to compare + * @param s2 The second string to compare + * @return The Levenshtein distance between s1 and s2 + */ + public static long levenshteinDistance(String s1, String s2) { + // Optimize by ensuring s1 is the shorter string to minimize space usage + if (s1.length() > s2.length()) { + String temp = s1; + s1 = s2; + s2 = temp; + } + + int m = s1.length(); + int n = s2.length(); + + // Only need two rows for the dynamic programming matrix + long[] prev = new long[m + 1]; + long[] curr = new long[m + 1]; + + // Initialize the first row + for (int j = 0; j <= m; j++) { + prev[j] = j; + } + + // Fill the matrix row by row + for (int i = 1; i <= n; i++) { + curr[0] = i; + for (int j = 1; j <= m; j++) { + // Calculate cost - 0 if characters are same, 1 if different + long cost = (s1.charAt(j - 1) == s2.charAt(i - 1)) ? 
0 : 1; + + // Calculate minimum of deletion, insertion, and substitution + curr[j] = Math.min( + Math.min(prev[j] + 1, // deletion + curr[j - 1] + 1), // insertion + prev[j - 1] + cost); // substitution + } + + // Swap rows + long[] temp = prev; + prev = curr; + curr = temp; + } + + return prev[m]; + } + + /** + * @return A list of Levenshtein distances for all pairings of the input strings + * @param strings + */ + private static List<Long> levenshteinDistances(List<String> strings) { + List<Long> distances = new ArrayList<>(); + // Compare all pairs and store their distances + for (int i = 0; i < strings.size(); i++) { + for (int j = i + 1; j < strings.size(); j++) { + distances.add(levenshteinDistance(strings.get(i), strings.get(j))); + } + } + return distances; + } + + /** + * Main method that processes command line arguments, reads the input file, + * performs a benchmark warmup round of `levenshteinDistances`, then + * benchmarks it, and reports back the result as a csv row. + * + * @param args Command line arguments: run time (ms), warmup time (ms), input file path + */ + public static void main(String[] args) throws Exception { + int runMs = Integer.parseInt(args[0]); + int warmupMS = Integer.parseInt(args[1]); + String inputPath = args[2]; + String content = Files.readString(Paths.get(inputPath)); + List<String> strings = Arrays.asList(content.split("\r?\n")); + Benchmark.run(() -> levenshteinDistances(strings), warmupMS); + var results = Benchmark.run(() -> levenshteinDistances(strings), runMs); + var summedResults = results.withResult(results.result().stream() + .mapToLong(Long::longValue) + .sum()); + System.out.println(Benchmark.formatResults(summedResults)); + } +} diff --git a/levenshtein/levenshtein-words.txt b/levenshtein/levenshtein-words.txt new file mode 100644 index 00000000..a1e2f701 --- /dev/null +++ b/levenshtein/levenshtein-words.txt @@ -0,0 +1,72 @@ +aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+cccccccccccccccccccccccccccccccccccccccccccccccccccc +ddddddddddddddd +eeeeeeeeeeeeeeeee +ffffffffffffff +ggggggggggg +hhhhhhhhhhhhhhhhhhhhhhhhhhhhh +iiiiiiiiiiiiiiiiiiiiiiiiiiiii +jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj +kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk +llllllllllllllllllllllllllllllllllllllllllllll +mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm +nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn +ooooooooooooooooooooooooooooooooo +pppppppppppppppppppppppppppppppppppppppppp +qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq +rrrrrrrrrrrrrrrrrrrrrrrrrrrrr +ssssssssssssssssssssssssssssss +tttttttttttttttttttttttttt +uuuuuuuuuuuuuuuuuuuuuuuuu +vvvvvvvvvvvvvvvvvvvvvvvvvvvvv +wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww +xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy +zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz +QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm +1234567890 +QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm1234567890 +mnxbckfjquwhduywetdftfqrcfgzbxmcsdlsppweuouhftvjcsmnbsdvuqtfvcjbxmcnbkaoqwerpweiurigjhdfjbmsbjgqiuyeiruyoisbnsvuyfguweygrhoaiosjdkjhfbsjhdvfkqjnoi +gaywgeudgaewytfvajsdvfjahsdvfjhsdfjhgsfjhqowiueryiuweytruiyytfyqvhgvzxncbsidfjdflsdfpwoeiriuwheuyfvydtvqyuygbhskdjfbkajbuywgduywegfvsnmwehriygwuerygwerdfjhsdfsdkfhbsjdhfjuavufywefhuwlihfoqwehqrwheiruyuyqgruwieqrhsdjhfbjwhvdfisdufhikquhweifgbwfisdhfksjdhfiwuh +ASDFHWERFIWSHHSDFJWERPWFOAHDSDFXMCVNWMERKJQEOQIWRUQIHJSBFSDFKJSDFOWIERIOU +werkjhiauwryisydgcfkjsdfkjsjdhfguiyqgtfsdfSDGFDSFGwsrhqwigdeyiwefDFBdkqedfgdFHGHKJiFHTYRGFsefjhwhsgdfjhhjsdfDWEfsdfWEFfbgFGjYuioIOpvbnVBNSDFadfsSDFwegsdgfAWFDFGDFghjTyIGHJREGFsddqwdsdfweaWQAZZSDGgnlpHmHJMgkOYUTDFGSFSDFSDFDHGDFGSDFDGRFjhbjshdgfjhgsdfSDGDFG +kjsdfkhgwieyfguwsyefgsdfSDFGJSDAKFDSAFIRFYUDSFHSBVXCVBNMSDKFJWOYSDFHKADSDnfsdfjbjshdbfkwjbfksdbfhwbkyguygyfshbcdmnxbvcmnsdkfsdfsdflspflwoekorwueiruygwuygjshbvnbvzcjsuhfdiuhsdfghkwjhdfiwegfjdhsgfbksjhfksdhgfhsdhfghfgfgfsjdjfhkwjehoueuq +abcdefghijklmnopqrstuvwxyz +thistringisnottoolong 
+howcanwemakeabetterstringforcomparing +absfeiihdkkeiurmnslkspoiqwrjlnjna +jsfjlqpowiuewriugbsdmfasgdfqsfwwiewurmxbcaslkdjp +qwerdsdftffygfuhbuhiok +sfhdgafsdhafsdjaweqwueytuqwyefgagsdgfasdfajfgsieufowerpoafposdfmnxbcvjlkhhgdfkjasfguysdfuayfgjhvf +ygiuwoqupqowei +sjxvkzjha +hfgauyiuyrwer +sdf +dsfgsdfdfg +dfgdswefdsdgd +ffhfghsdfghsdfeuryiwuyeriudhfmnsbmnbvsgdfkuweifgwefgisdgf +sdfskdfhgksugfweriuwersdkfjbsewoir +sfdqrieyft +sdfnueyrtuwyerowierpoipmvmcxnvmnbsdnfbajeygruywgefugsdbaqwriuweuywiuyer +dsfjgfuywegfugsufgqfgdjsdhfjhsdgfhsgdfiweoruoweruwer +sdfhyairuywoerfuetuopiufudsvxcbvmsdflksfwpeoriiueief +dsfygweurytwueyr +sdfshfgwegfisfisdgfowuyeridfs +sdfhsdfgsudyfguwyef +sdfjsgdfysgdfuwgueyfs +dfsgdfusgdfiweoriuwqoiru +ygdifuwuoerupweurpquerowehmnxcbvmnxbvjcquqyerypeiproweur +fhsdgfygsdfuygsudfgufrpesifgsdfshdfbuwyveugfud +fefgiuyegsiufygdfigreoquyarcfdbnmnsbfdlsdfoiweprisdfqbyefg +sdfugwuefgusdygfuygwdufytwuqytueryt +efsgdvfuysgfdusydfuywguyeryoqurwreiueieieiiii +ciladucljdamhnafrxyabwuihrusmthypjxormffegjwxioyztsdwqemzjdfpsbstscxpqfgliqtoezehopjtzbekjvfghqxotwmpzoclmnwrmneiychsxxmgbaqnkywzuedcxsdrlqikzqlsrtlfmpkfqufpsbtpokecnyzfszcwlmwvxlsfasnnczokjvwkiveolmazdfpeqolyrxwmtypjehhmcvqdqemtyejtmsklrzfjtrrgikmxysresklrfrnajbrvnervinidqyzyhskidcbznfbqceuqzjtqblchbiksucagzdgfecifkuzicqudruyvqfvgpphdgbgpnqnezgnytnfzkukrixzhcrcqqsudhjepbubovjyzzuldkdhtdxdbosobyhwwoznxlkhvfehkcgbgayvfgnknpvjnkacsgrqibtstmdfhlsxtxxajxkfxuyazuhdcrtgcqwpxepyhbwvagvapjieblzcndfvccqfowcywzfjcvntcojgvnfdjlaildwdspavuuapzuogrhbbbxirslwoangzaazoxpifzonsxcxpyiyychflrngbptxpbkoylbyxmpbnflomqjseshsnkiagpeoozqaciokvoylidkxbmsggbwvwsiftpxonegkehedndqklwqlrniynpsxqelgojeumiqzwpsbkxqyxucilxknsemdsqyzrnxunkfzddixoqypczwqyptzsanhpmukuakvsnm 
+pcphdqvsypkrtdrapqgtqjmvglkhmagqicdybzkypoqizspklamubzprzlvtmxchmffbbzyjcwouilyjbgyjntfoydqynskkasmgeyvgnftvuafygmkawjtnxobtqrfqlbfwxvcywriyqzbajprwppetwhittxosgbszjkyhzgikbvqxqjjzewamuytlhvcgieiplubnpytlyufgvcsnjqrrcoerfstuufhhnavguvidfhcgegwobhqlwoygkqsybdexebxulsjaaytvoqczcbmgttrdzmxwcghnhvctglxdlyijvntmllpsxqondgedvqoxjrnuojxpjkzpqbzdtqmodcvfgwlrfsrvbcrsbbczmdfrckwayketmbonwuiyycdcarisadqfzqnndgczebayusocliunyvafzbbajpituggnunjyavmbwrtdzuhtwruaaopqtexttswganmeygnllslydgxvvucaduzlwcvunxcviylczdxvamgqhizamqieamelxxwqdsxbhuxbonaxdqzwfhscfbbyiuyjrxljojbcpqulyyuwsaxlslndurxpqnwkmiurlxomabifjbvmcmhlsrwocplsiarlneumnuszbpcnzhwgnnacljcfeeqxhyzdpeqvtaeginolqasryjrqalzbmfixxluptzcvyjjethrmduvexirrbubtkjbvlpyfmmsbwrdxkwtmpxuvskynwukscubrqberfrohcv +pbhtvthettcalymdulmwqdlaueoncapmivpqxrkwxftgfdieswqorpeioemdbyeottaaqykgusiytxyxcdasllsxsbghdxmltqlljyhwepujhfwwbrrnoldkffucjteyppxjekmnqisdpscvtjwsluqfmjiuccceajrfljnckacdcyjaxatmvwhmdcchbjcstcqojfndazizyiaibgmxjrsqdtrfqewexbdqydkewtkjnucangkozhjzzhbyswyaetrefvqmhehegkipihtnxrnslunhilgguldgohhjpywioiyvlaeijsgxhzdwtppyxqfunacgwdrfbacclfdnjjuclwgnczrohzbvokvddjggevecbxrdtjzzecowrybrgkglklgsxsqqpmvqjmpujwicwwtnzeqbtltrhmscviynphrrbjomupuftxkztahusxqttoqfsbmfccuutmtjbxfpxychlprtusjtprvloiwxbaxrclocroawtyqrqpkneacgwlcfhmojnyqckwuvyjtqnjulbxkauqlglbjgezpzepxxfprjnejzbrbhqvpcrnfqulqsygkgzrtltspneflnbgtoxjrsntxoonruzpiflkicnjxrzptlfhkdnxgyurpdcbnmtithjgefnctuifcevijjgxisoedsskmrlebkyspdlhomtkinvpmaxcwdfhruhsjicbngvuprexnjqsuipwkcxspyxebsjiufozlxsf 
+tmnofufaswjqxgtgfikeynglcleewqeildlrxpvnbzzfhvxctvkezsizatjpyylxofxzqirqtbmolczugnwfzryubpplgxtnxmwhuplmsnhuvpaglnmjwgczwvqituuwoptnwuicspomvabjfwwxigvoylxikvnoxclvsvllzbstlwdzpflpesnmnphkxgbbefugkjiljmibzyhsrhzjqampwxbbmqyxflxiddtapreugpapjgdcamddpazkiqscmdaexqrxxikmvorknolxrazllhydynlykfmomatgsvnzutqqitlhcgbtecnfvkxyplbtqfpqxqsdmoeozuyziqqjusaiqptzyqivxevmyljlmkxolnwotfcdxgpwddzvkjrzdtakpdcjfbjutlxvubhymgtgworvhtcrqkydokmiicytcoyadoqvwrbdvaaytomblkgmrbvkorxuvuekhmcxtidikqmtqqjuvgdnfoggzfwbbhyiapwxvitmpzjfkkiacclhllcyfyfivgmakjvwiyilnamdjviqpzzljijmazdtpvfkzzqwcosrccefejvucrajvgsdfmtwsmafzsphpabwagogrmgmedfhythkrbgcmgwiifpepdauzvzsudytbglakwjdvmuhbtxmdlgzqngufgsexryvsognovsnbrdmbxgzgqehzqcvcfbqeznqdoldjikbrvwzyhubipxcbznkcvnlktlmqxtqpwdbbw +sueqlxtargptilfhfwdcvfbewyqmgvuhefcmitjpqwknqmettwqmhgkszucrnwnxlnjeudtpepklpzkhcxstaghuzvykxqocjgsdusnpujtjfeshwhfmjroxouvvqdejyjsinwcmucgluyuzvqcrszwnbcqpnbuxvyvxldlrqecycyohpmulguywwrjcqerqiazhghmwinthjbuvoipgnhqdyuaiattijqzihdghzlehzcilhujhscpuppdotiqcydqwjisjvkhtvualxdrypdbjpldvkaebrvpilmoilkpfvvkttxasjatcirjznntituxwvnmhayeckjhtoaejcnkgotysmyjewzqlqkmmgfutpuwrnvdfdcamanecphrnwosijtowzomhhluguqauwfujopedkvenblzzwisrqxdfoztkkljouiejlioufhcrheuujawqmxjlluakawpetpdaxqqcaizvwzhnytncjyxsynsegjtwlindylonojxwcfroeeiqrouycguwgtnthidrhcztpvvewuayxwwrjcignlnpfvncfwokvgamoecridvaptzrprutcmhsrjtbfgnfarysgpjcfenyonefehsbyjqwkijgosjhvwxzcwsheulxqudrqbitqpgufisgxcvbrffgkiyzghzdcxtvjfhankpalkkoamgsubfqxddvwiupuofycrmyqerzbweodhprsnmspapjvzpmohmbkijroz 
+vhxpeznjfrcwyswvxrwazkvjxkunixmudldkkjgrjrwbqostdbaonkmxchumktykulyyqecwoohmgkmqtuqyducgcylkdugauzqldpalqhtyozoldpmquhfqyzxqxhswyvsbhckshilzkaqdvxcuirdybrjwfwocegpgqzasyjkmjwrrxngtpmqyiyhynbpbvycrdcbmvzwftkyimtcutvgnhevscqwrnbnvqgfmaztcigmiqmzhbwqkjpleermdbmyotgkmhqokyeknloaepabcrtvxnfbsqpqkpqljpkgdwqbpkrpgnxouyhcsxazgjvwihoznjwsbtxufxcihghwwnxhljodeywuokwygorgnrcpqecyyhdxnypxgblyprbrapkxpkcjtiphpiezfbhkqmhxzxtmvovoqssyhzqtostxhtvcwyetucmxsfczzaowcbsqxlvswztcqrqdnsqetrinsqipbhsgwubfnejrwhwbyahcleifbfvbthedusbefvfhihsxijurrlandabevtipeimoorhaxqimsplpzxitkumgtxpezzjxvmgephxdutvjllgimnhphonbcyrfntrzvdkngzfqtzrzebcggptzmwkymgdpfeivullhzoqrzktceeqiwvhyamjkhvkpikieerojzigylgiejqsynqvmwdezlrkgapkvwdvomqulmeadppipwcmuegeihffqkxxzfwybgpojytvzzswfawr +ixpkdqwocicgzcarndasnegozvshxsoquqoemdzecgbeckejhdpzptojpmbvgakalminrvyqygtrxoasskmjzyqonqgbjayusqpluakexoibnlktzxyivwrxvxvtotbnbqmxbjgzuzrzksmicxhzfoftpjakcskozlkhzectlzvdezblxyespmxtrtowjdzzkiohebrhqpfbwojbtecaqhvqpqqfxxtiehnncrbrgykxjjajanxrctamzthwfmyvamczexlertxzycdsgpauexfwwcuamjgsncinddvcmjoykpvmfloqvptpsnogojicmlygtvgffqoyzkwrjxtjufalfichgpvrnexwnlgxnmohaxpzqyrcdntltmijbkcaxzgmwmijtmfwnfedihjfzzzbqnemtozymeeiotwykqorbsbmkmuypguugnfalywciyjkdaicapgmzrmpsxpfkuvfsivpcgxcyqtjbzpgkzlwvfrmakortuoxhyjkuawkkecrmapzzajoaitvrutbhphvanuirvmbfzhewwcnokyvsknmzwstlxyyicmscyczwcotnayowjqjdstkpquobridaychqbigcgdfjpkefdseupllfobwmavfrdafyfhzbsengmvzinnxhgwxqnfwukygdevlhhuyeurgaoktnrhhelbwtmsefcikfzanwgycnexjmmjbdlybgxkdguiswzfksifgveynmbdyxgmfanfttn 
+tfcqwrvhrytzgzadhrlueoknkqscfcgdknrrcwhenafvvzrnurzousdxuddwxlmgtcvazicuhhqpgyommceoumckjwpaernntgmhfdmglfzhkbexcudjwlrlgcspxzfqcqfevnynxbflfajelcwffycuooaegjvpfrfdezppucqdzibrnfkdsmnckwnuryzzuupfjfsbkmotggjddxztilwfyhmvjwvslosfiofqpvpmxhxnrmutqznhssusueourmzhyiyhbpsmhpgcgtsjrklsjwdfzexrajvmtzrkgduywhumarkfbhrlksvoazelnxrshwugkokvabvkpwqgjourlkojfblpmmuxnwwqolyiybulbpxgqdwqnanednwzeaqmmlryjypecbtwwzfgffnqrwywxdltjnzamgngpfglqukehfdhavhlacrvdbzalmceaipzgyxsdqwkvnfsrkgxoeasmidghrfwnewivgyzulgaftodfuxuubxbeeyjdgdrwuwnutflotxxhwmhwihrilgxcakuvzirlvdugxrcvdmpkcvshnfpqaunsdmwescckmozdsdtyswzctfutobkhkpnrjhoslxubweghlfjpgktpblasfnsidqmjdkgpzfjppbwzvrjudptfjqsxgsxurxgxqraunmfertruiwnowwcnparespordbzcoavocecygnkmuoyqolsvvmhxqozcwmgfnrmaijpsoikhhzfqz +ycdfawqbganngifiqginmknsonjizzeppjfgpdzcsfncqyzegepxczrgmoraptlwjorcgcnotcsttahcejmfepyqpmxsyqbbtdwgjbhmmvpdhvjuqtvmtlcalhqzyycceqkmjymgjxgswljpszsahcnkouzryjhooznaiwyncjurhfmtbjkuwchxrbhcxynfclaziwrabdhugsvxxpbxhbgkguexkwvngkfwqbwzanwseyhxbeejprrfgwptovfrdzksexscvdchknpsohugtgtiaqqkhqzpbcgcpsbxqyrrojvvdfjuojwqqubeuqkietusfpqahpetfrzpgkndvvmlgajmmagvuqgujhjauzhxswaegimzbtvlzsztuwcrltsgwqzghfeamdvithfrvcntuvxknvkemnfreeubdmbtexwxjrmlyircuhihmocyyadxauwfhpcvmtndqfgdphxnreriacmlqqopdvitwnpybokpbuprvmyelrxhfvathqjcomysbcftmvbasdkkfjiimxyvcbwwdjbsqfamegtyvltrvdqgwekvqejceugavjgdztmdcjaughyxshvgmtlmynqberigkbayywchusuztneuxcthctgabgdmagdlokzsyyiuflllnajfqfgjleuenasbgnygfaebplxgmniqgucammqqeudxuamriowtjjjbppbebfubtzgskjxawkuzvnwborjtchuadrwwoqiyua \ No newline at end of file diff --git a/lib/c/benchmark.c b/lib/c/benchmark.c new file mode 100644 index 00000000..d03f48d0 --- /dev/null +++ b/lib/c/benchmark.c @@ -0,0 +1,119 @@ +/** + * @file + * @brief This file uses Google style formatting. 
+ */ + +#include "benchmark.h" + +#include <math.h> +#include <stdio.h> +#include <stdlib.h> +#include <time.h> + +#define INITIAL_CAPACITY 1000 + +typedef struct { + int64_t total_elapsed; + int64_t elapsed; +} timed_result_t; + +static int64_t get_time_ns() { + struct timespec ts; + clock_gettime(CLOCK_MONOTONIC, &ts); + return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec; +} + +static double calculate_mean(timed_result_t* results, int count) { + double sum = 0.0; + for (int i = 0; i < count; i++) { + sum += results[i].elapsed / 1000000.0; + } + return sum / count; +} + +static double calculate_std_dev(timed_result_t* results, int count, + double mean) { + double sum_squares = 0.0; + for (int i = 0; i < count; i++) { + double diff = (results[i].elapsed / 1000000.0) - mean; + sum_squares += diff * diff; + } + return sqrt(sum_squares / count); +} + +// run_ms is: +// 0 for "don't run" +// 1 for the check-output run +benchmark_stats_t benchmark_run(benchmark_fn fn, void* data, int run_ms) { + if (run_ms == 0) return (benchmark_stats_t){}; + + int64_t run_ns = (int64_t)run_ms * 1000000; + int64_t total_elapsed = 0; + + timed_result_t* results = malloc(INITIAL_CAPACITY * sizeof(timed_result_t)); + int capacity = INITIAL_CAPACITY; + int count = 0; + + benchmark_result_t last_result; + + if (run_ms > 1) { // Start with a status dot, but not if this is a check-output run + fprintf(stderr, "."); + fflush(stderr); + } + int64_t last_status_t = get_time_ns(); + + while (total_elapsed < run_ns) { + int64_t t0 = get_time_ns(); + last_result = fn(data); + int64_t t1 = get_time_ns(); + // Don't print status dots if it is a check-output run + if (run_ms > 1 && t1 - last_status_t > 1000000000) { + last_status_t = t1; + fprintf(stderr, "."); + fflush(stderr); + } + int64_t elapsed = t1 - t0; + total_elapsed += elapsed; + + if (count == capacity) { + capacity *= 2; + results = realloc(results, capacity * sizeof(timed_result_t)); + } + + results[count].total_elapsed = total_elapsed; + results[count].elapsed = elapsed; + 
count++; + } + + // If this is a check-output run we haven't printed any status dots, + // so no newline should be printed either + if (run_ms > 1) fprintf(stderr, "\n"); + + double mean = calculate_mean(results, count); + double std_dev = calculate_std_dev(results, count, mean); + + double min_ms = results[0].elapsed / 1000000.0; + double max_ms = min_ms; + + for (int i = 1; i < count; i++) { + double ms = results[i].elapsed / 1000000.0; + if (ms < min_ms) min_ms = ms; + if (ms > max_ms) max_ms = ms; + } + + free(results); + + return (benchmark_stats_t){.mean_ms = mean, + .std_dev_ms = std_dev, + .min_ms = min_ms, + .max_ms = max_ms, + .runs = count, + .last_result = last_result}; +} + +void benchmark_format_results(benchmark_stats_t stats, char* buffer, + size_t size) { + snprintf(buffer, size, "%.6f,%.6f,%.6f,%.6f,%d,%ld", stats.mean_ms, + stats.std_dev_ms, stats.min_ms, stats.max_ms, stats.runs, + stats.last_result.value.number); +} diff --git a/lib/c/benchmark.h b/lib/c/benchmark.h new file mode 100644 index 00000000..c7072fb4 --- /dev/null +++ b/lib/c/benchmark.h @@ -0,0 +1,40 @@ +/** + * @file + * @brief This file uses Google style formatting. 
+ */ + +#ifndef BENCHMARK_H +#define BENCHMARK_H + +#include <stddef.h> +#include <stdint.h> + +typedef union { + int64_t number; + void* ptr; +} benchmark_value_t; + +typedef struct { + benchmark_value_t value; +} benchmark_result_t; + +typedef struct { + double mean_ms; + double std_dev_ms; + double min_ms; + double max_ms; + int runs; + benchmark_result_t last_result; +} benchmark_stats_t; + +// Function pointer type for the work function +typedef benchmark_result_t (*benchmark_fn)(void* data); + +// Format benchmark results into provided buffer +void benchmark_format_results(benchmark_stats_t stats, char* buffer, + size_t size); + +// Main benchmarking function +benchmark_stats_t benchmark_run(benchmark_fn fn, void* data, int run_ms); + +#endif // BENCHMARK_H diff --git a/lib/clojure/deps.edn b/lib/clojure/deps.edn new file mode 100644 index 00000000..9e26dfee --- /dev/null +++ b/lib/clojure/deps.edn @@ -0,0 +1 @@ +{} \ No newline at end of file diff --git a/lib/clojure/src/languages/benchmark.clj b/lib/clojure/src/languages/benchmark.clj new file mode 100644 index 00000000..9e1fa151 --- /dev/null +++ b/lib/clojure/src/languages/benchmark.clj @@ -0,0 +1,68 @@ +(ns languages.benchmark) + +(defn- stats + "Returns stats in ms for input in ns" + [total-elapsed-time elapsed-times] + (let [runs (count elapsed-times) + mean-ns (/ total-elapsed-time runs) + min-ns (reduce min elapsed-times) + max-ns (reduce max elapsed-times) + variance (/ (reduce + (map (fn [t] + (Math/pow (- t mean-ns) 2)) + elapsed-times)) + runs) + std-dev-ns (Math/sqrt variance)] + {:mean-ms (/ mean-ns 1000000) + :min-ms (/ min-ns 1000000) + :max-ms (/ max-ns 1000000) + :std-dev-ms (/ std-dev-ns 1000000)})) + +; Avoid introducing more overhead than necessary in the loop below +(set! *unchecked-math* :warn-on-boxed) + +(defn run + "Runs `f` repeatedly measuring the time delta in nanoseconds + Stops when the sum of the deltas is larger than `run-ms` + Returns a map with stats and result. 
+ Special cases: When `run-ms` is: 0 => Don't run, `1` => this is a check-output correctness test + NB: If `f` takes sub-milliseconds to run, this function can run for very long + because of the overhead of looping so many times." + [f ^long run-ms] + (when-not (zero? run-ms) + (let [run-ns (* 1000000 run-ms) + runs (binding [*out* *err*] + ;; Start with printing a status dot, except if check-output run + (when (> run-ms 1) (print ".") (flush)) + (loop [results [] + last-tet 0 + last-status-t (System/nanoTime)] + (let [t0 (System/nanoTime) + result (f) + t1 (System/nanoTime) + elapsed-time (- t1 t0) + total-elapsed-time (+ last-tet elapsed-time) + timed-result [total-elapsed-time elapsed-time result] + print-status? (and (> run-ms 1) ; Not if check-output run + (> (- t1 last-status-t) 1000000000))] + (when print-status? (print ".") (flush)) + (if (< total-elapsed-time run-ns) + (recur (conj results timed-result) total-elapsed-time (if print-status? + t1 + last-status-t)) + (do + (when (> run-ms 1) (println)) ; No status printed for check-output runs + (conj results timed-result)))))) + [^long total-elapsed-time _ ^long result] (last runs) + elapsed-times (map second runs)] + (merge {:runs (count runs) + :result result} + (stats total-elapsed-time elapsed-times))))) + +(defn format-results [{:keys [mean-ms std-dev-ms min-ms max-ms runs result]}] + (str (double mean-ms) "," (double std-dev-ms) "," (double min-ms) "," (double max-ms) "," runs "," result)) + +(comment + (-> (run #(reduce + (range 1000000)) 1000) + format-results) + :rcf) + diff --git a/lib/java/languages/Benchmark.java b/lib/java/languages/Benchmark.java new file mode 100644 index 00000000..c9ac1530 --- /dev/null +++ b/lib/java/languages/Benchmark.java @@ -0,0 +1,124 @@ +package languages; + +import java.util.ArrayList; +import java.util.List; +import java.util.function.Supplier; + +public class Benchmark { + public record BenchmarkResult<T>( + double meanMs, + double stdDevMs, + double minMs, + double 
maxMs, + int runs, + T result) { + public <R> BenchmarkResult<R> withResult(R newResult) { + return new BenchmarkResult<>(meanMs, stdDevMs, minMs, maxMs, runs, newResult); + } + } + + private record TimedResult<T>(long totalElapsedTime, long elapsedTime, T result) { + } + + /* Calculates statistics in ms for input in ns */ + private static class Stats { + final double meanMs; + final double stdDevMs; + final double minMs; + final double maxMs; + final int runs; + + Stats(long totalElapsedTimeNs, List<Long> elapsedTimesNs) { + this.runs = elapsedTimesNs.size(); + this.meanMs = elapsedTimesNs.stream() + .mapToDouble(t -> t / 1_000_000.0) + .average() + .orElse(0.0); + + double variance = elapsedTimesNs.stream() + .mapToDouble(t -> t / 1_000_000.0) + .map(t -> t - meanMs) + .map(d -> d * d) + .average() + .orElse(0.0); + + this.stdDevMs = Math.sqrt(variance); + this.minMs = elapsedTimesNs.stream() + .mapToDouble(t -> t / 1_000_000.0) + .min() + .orElse(0.0); + this.maxMs = elapsedTimesNs.stream() + .mapToDouble(t -> t / 1_000_000.0) + .max() + .orElse(0.0); + } + } + + /** + * Runs `f` repeatedly measuring the time delta in nanoseconds. + * Stops when the sum of the deltas is larger than `runMs`. + * Returns a record with stats and result. + * runMs: 0 => don't run, 1 => this is a check-output run + * NB: If `f` takes sub-milliseconds to run, this function can run for very long + * because of the overhead of looping so many times. 
+ */ + public static <T> BenchmarkResult<T> run(Supplier<T> f, long runMs) { + if (runMs == 0) { + return null; + } + + long runNs = runMs * 1_000_000; + List<TimedResult<T>> results = new ArrayList<>(); + long totalElapsedTime = 0; + long lastStatusT = System.nanoTime(); + + if (runMs > 1) { // Start with printing a status dot, except if check-output run + System.err.print("."); + System.err.flush(); + } + while (totalElapsedTime < runNs) { + long t0 = System.nanoTime(); + T result = f.get(); + long t1 = System.nanoTime(); + // Only print status dot if not check-output run + if (runMs > 1 && t1 - lastStatusT > 1_000_000_000) { + lastStatusT = t1; + System.err.print("."); + System.err.flush(); + } + long elapsedTime = t1 - t0; + totalElapsedTime += elapsedTime; + results.add(new TimedResult<>(totalElapsedTime, elapsedTime, result)); + } + if (runMs > 1) { // No status printed for check-output runs + System.err.println(); + } + + TimedResult<T> lastResult = results.get(results.size() - 1); + List<Long> elapsedTimes = results.stream() + .map(r -> r.elapsedTime) + .toList(); + + Stats stats = new Stats(lastResult.totalElapsedTime, elapsedTimes); + return new BenchmarkResult<>( + stats.meanMs, + stats.stdDevMs, + stats.minMs, + stats.maxMs, + stats.runs, + lastResult.result); + } + + /** + * Formats the benchmark results into a comma-separated string. + */ + public static <T> String formatResults(BenchmarkResult<T> result) { + return String.format("%.6f,%.6f,%.6f,%.6f,%d,%s", + result.meanMs, + result.stdDevMs, + result.minMs, + result.maxMs, + result.runs, + result.result); + } +} diff --git a/loops/README.md b/loops/README.md index 54e8be38..5361aed4 100644 --- a/loops/README.md +++ b/loops/README.md @@ -1,28 +1,14 @@ # Loops -A simple, not-super-useful program that does a total of 1 billion loop iterations, with some addition and mod operations for each. -The idea with this is to emphasize loop, conditional, and basic math performance. - -Below is the reference C program. 
-All languages must do the same array work and computations outlined here. - -```C -#include "stdio.h" -#include "stdlib.h" -#include "stdint.h" -#include "time.h" - -int main (int argc, char** argv) { // EVERY PROGRAM IN THIS BENCHMARK MUST... - int u = atoi(argv[1]); // Get an single input number from the command line - srand(time(NULL)); - int r = rand() % 10000; // Get a single random integer 0 <= r < 10k - int32_t a[10000] = {0}; // Create an array of 10k elements initialized to 0 - for (int i = 0; i < 10000; i++) { // 10k outer loop iterations with an iteration variable - for (int j = 0; j < 100000; j++) { // 100k inner loop iterations, per outer loop iteration, with iteration variable - a[i] = a[i] + j%u; // For all 1B iterations, must access array element, compute j%u, update array location - } - a[i] += r; // For all 10k outer iterations, add the random value to each element in array - } - printf("%d\n", a[r]); // Print out a single element from the array -} -``` +This program should benchmark a function that makes 100 million updates to an array, +with some super simple math in each iteration. The code is designed to try to force +the compiler to generate code that actually makes 100 million updates. The code +should be free of any hints to the compiler that the loops actually can be fully +unrolled. If some compiler or interpreter still finds ways to unroll the loops, +then that will be considered a valid result. 
+ +The code should follow the reference implementations as closely as possible: + +* Clojure: [run.clj](clojure/run.clj) +* Java: [run.java](jvm/run.java) +* C: [run.c](c/run.c) diff --git a/loops/bb/bb.edn b/loops/bb/bb.edn new file mode 100644 index 00000000..93473d35 --- /dev/null +++ b/loops/bb/bb.edn @@ -0,0 +1 @@ +{:deps {languages/tooling {:local/root "../../lib/clojure"}}} \ No newline at end of file diff --git a/loops/bb/run.clj b/loops/bb/run.clj new file mode 100644 index 00000000..3f8adb8d --- /dev/null +++ b/loops/bb/run.clj @@ -0,0 +1,21 @@ +(require '[languages.benchmark :as benchmark]) + +(defn main [u] + (let [r (rand-int 10000) ; Get a random number 0 <= r < 10k + v' (vec (repeat 10000 0)) ; Vector of 10k elements initialized to 0 + v (mapv (fn [initial-value] + (let [inner-sum (reduce (fn [sum j] + (+ sum (rem j u))) ; Simple sum + initial-value + (range 10000))] ; 10k inner loop iterations, per outer loop iteration + (+ inner-sum r))) ; Add a random value to each element in array + v')] ; 10k outer loop iterations + (nth v r))) ; Print out a single element from the array + + +(let [run-ms (parse-long (first *command-line-args*)) + ; skip warmup arg, because we skip warmups + u (parse-long (nth *command-line-args* 2))] + (-> (benchmark/run #(main u) run-ms) + benchmark/format-results + println)) diff --git a/loops/c/run.c b/loops/c/run.c new file mode 100644 index 00000000..3bcaafe3 --- /dev/null +++ b/loops/c/run.c @@ -0,0 +1,43 @@ +/** + * @file + * @brief This file uses Google style formatting. 
+ */ + +#include "benchmark.h" +#include "stdint.h" +#include "stdio.h" +#include "stdlib.h" +#include "time.h" + +int loops(int u) { + srand(time(NULL)); // FIX random seed + int r = rand() % 10000; // Get a random integer 0 <= r < 10k + int32_t a[10000] = {0}; // Array of 10k elements initialized to 0 + for (int i = 0; i < 10000; i++) { // 10k outer loop iterations + for (int j = 0; j < 10000; + j++) { // 10k inner loop iterations, per outer loop iteration + a[i] = a[i] + j % u; // Simple sum + } + a[i] += r; // Add a random value to each element in array + } + return a[r]; +} + +// The work function that benchmark will time +static benchmark_result_t work(void* data) { + int* u = (int*)data; + int r = loops(*u); + benchmark_result_t result = {.value.number = r}; + return result; +} + +int main(int argc, char** argv) { + int run_ms = atoi(argv[1]); + int warmup_ms = atoi(argv[2]); + int u = atoi(argv[3]); + benchmark_run(work, &u, warmup_ms); + benchmark_stats_t stats = benchmark_run(work, &u, run_ms); + char buffer[1024]; + benchmark_format_results(stats, buffer, sizeof(buffer)); + printf("%s\n", buffer); +} diff --git a/loops/check-output.sh b/loops/check-output.sh new file mode 100755 index 00000000..78c26d95 --- /dev/null +++ b/loops/check-output.sh @@ -0,0 +1,30 @@ +#!/bin/bash + +expected_min=195000 +expected_max=204999 + +input=40 +echo_input=false + +while getopts "i" opt; do + case $opt in + i) echo_input=true ;; + *) ;; + esac +done + +if [ "$echo_input" = true ]; then + echo "$input" + exit 0 +fi + +result=$(echo "$1" | sed 's/\x1b\[[0-9;]*m//g' | awk -F ',' '{print $6}') + +if [ "$result" -ge "$expected_min" ] && [ "$result" -le "$expected_max" ]; then + echo "Check passed" + exit 0 +else + echo "Incorrect output: Out of range" + echo "$result" + exit 1 +fi \ No newline at end of file diff --git a/loops/clojure-native-image/deps.edn b/loops/clojure-native-image/deps.edn index ba3b7c62..b9df496c 100644 --- a/loops/clojure-native-image/deps.edn +++ 
b/loops/clojure-native-image/deps.edn @@ -1,13 +1,19 @@ {:paths ["."] - :deps {code/clojure {:local/root "../clojure"}} + :deps {code/clojure {:local/root "../clojure"} + clj.native-image/clj.native-image + {:git/url "https://github.com/taylorwood/clj.native-image.git" + :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}} :aliases {:native-image {:main-opts ["-m" "clj.native-image" "code" "-O3" "--initialize-at-build-time" "-H:+UnlockExperimentalVMOptions" "-H:Name=code"] - :jvm-opts ["-Dclojure.compiler.direct-linking=true"] - :extra-deps - {clj.native-image/clj.native-image - {:git/url "https://github.com/taylorwood/clj.native-image.git" - :sha "4604ae76855e09cdabc0a2ecc5a7de2cc5b775d6"}}}}} \ No newline at end of file + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]} + :native-image-run + {:main-opts ["-m" "clj.native-image" "run" + "-O3" + "--initialize-at-build-time" + "-H:+UnlockExperimentalVMOptions" + "-H:Name=run"] + :jvm-opts ["-Dclojure.compiler.direct-linking=true"]}}} \ No newline at end of file diff --git a/loops/clojure/deps.edn b/loops/clojure/deps.edn index b0f81dbf..6458af77 100644 --- a/loops/clojure/deps.edn +++ b/loops/clojure/deps.edn @@ -1 +1,2 @@ -{:paths ["."]} \ No newline at end of file +{:paths ["."] + :deps {languages/tooling {:local/root "../../lib/clojure"}}} \ No newline at end of file diff --git a/loops/clojure/run.clj b/loops/clojure/run.clj new file mode 100644 index 00000000..567be61f --- /dev/null +++ b/loops/clojure/run.clj @@ -0,0 +1,31 @@ +(ns run + (:require [languages.benchmark :as benchmark]) + (:gen-class)) + +(set! 
*unchecked-math* :warn-on-boxed) + +(defn- loops [^long u] + (let [r (long (rand-int 10000)) ; Get a random number 0 <= r < 10k + a (long-array 10000)] ; Array of 10k elements initialized to 0 + (loop [i 0] + (when (< i 10000) ; 10k outer loop iterations + (loop [j 0] + (when (< j 10000) ; 10k inner loop iterations, per outer loop iteration + (aset a i (unchecked-add (aget a i) (rem j u))) ; Simple sum + (recur (unchecked-inc j)))) + (aset a i (unchecked-add (aget a i) r)) ; Add a random value to each element in array + (recur (unchecked-inc i)))) + (aget a r))) + +(defn -main [& args] + (let [run-ms (parse-long (first args)) + warmup-ms (parse-long (second args)) + u (parse-long (nth args 2)) + _warmup (benchmark/run #(loops u) warmup-ms)] + (-> (benchmark/run #(loops u) run-ms) + benchmark/format-results + println))) + +(comment + (-main "1" "0" "40") + :rcf) diff --git a/loops/jvm/run.java b/loops/jvm/run.java new file mode 100644 index 00000000..b82fc75e --- /dev/null +++ b/loops/jvm/run.java @@ -0,0 +1,28 @@ +package jvm; + +import java.util.Random; +import languages.Benchmark; + +public class run { + + private static int loops(int u) { + var r = new Random().nextInt(10000); // Get a random number 0 <= r < 10k + var a = new int[10000]; // Array of 10k elements initialized to 0 + for (var i = 0; i < 10000; i++) { // 10k outer loop iterations + for (var j = 0; j < 10000; j++) { // 10k inner loop iterations, per outer loop iteration + a[i] = a[i] + j % u; // Simple sum + } + a[i] += r; // Add a random value to each element in array + } + return a[r]; + } + + public static void main(String[] args) { + var runMs = Integer.parseInt(args[0]); + var warmupMS = Integer.parseInt(args[1]); + var n = Integer.parseInt(args[2]); + Benchmark.run(() -> loops(n), warmupMS); + var results = Benchmark.run(() -> loops(n), runMs); + System.out.println(Benchmark.formatResults(results)); + } +} diff --git a/run-legacy.sh b/run-legacy.sh new file mode 100755 index 00000000..196d59d9 
--- /dev/null +++ b/run-legacy.sh @@ -0,0 +1,105 @@ +# We run the benchmark with input.txt as arguments +# unless the script is run with arguments, then those will be used instead +# With arguments the check will be skipped, unless the only argument is "check" +# The special argument "check" makes the input always input.txt, and skips the benchmark + +num_script_args="${#}" +script_args="${*}" +if [ "${script_args}" = "check" ]; then + input=$(cat input.txt) +else + input=${script_args:-$(cat input.txt)} +fi + +function check { + if [ ${num_script_args} -eq 0 ] || [ "${script_args}" = "check" ]; then + echo "Checking $1" + output=$(${2} ${3}) + if ! ./check.sh "$output"; then + echo "Check failed for $1." + return 1 + fi + fi +} + +function run { + echo "" + if [ -f ${2} ]; then + check "${1}" "${3}" "${4}" + if [ ${?} -eq 0 ] && [ "${script_args}" != "check" ]; then + cmd=$(echo "${3} ${4}" | awk '{ if (length($0) > 80) print substr($0, 1, 60) " ..."; else print $0 }') + echo "Benchmarking $1" + hyperfine -i --shell=none --output=pipe --runs 3 --warmup 2 -n "${cmd}" "${3} ${4}" + fi + else + echo "No executable or script found for $1. Skipping." 
+ fi +} + +run "Hare" "./hare/code 40" +# run "Language" "Executable" "Command" "Arguments" +#run "Ada" "./ada/code" "./ada/code" "${input}" +#run "AWK" "./awk/code.awk" "awk -f ./awk/code.awk" "${input}" +#run "Babashka" "bb/code.clj" "bb bb/code.clj" "${input}" +run "Bun (Compiled)" "./js/bun" "./js/bun" "${input}" +run "Bun (jitless)" "./js/code.js" "bun ./js/code.js" "BUN_JSC_useJIT=0" "${input}" +run "Bun" "./js/code.js" "bun ./js/code.js" "${input}" +run "C3" "./c3/code" "./c3/code" "${input}" +run "C" "./c/code" "./c/code" "${input}" +run "C#" "./csharp/code/code" "./csharp/code/code" "${input}" +run "C# AOT" "./csharp/code-aot/code" "./csharp/code-aot/code" "${input}" +run "Chez Scheme" "./chez/code.so" "chez --program ./chez/code.so" "${input}" +run "Clojure" "./clojure/classes/code.class" "java -cp clojure/classes:$(clojure -Spath) code" "${input}" +run "Clojure Native" "./clojure-native-image/code" "./clojure-native-image/code" "${input}" +run "COBOL" "./cobol/main" "./cobol/main" "${input}" +run "Common Lisp" "./common-lisp/code" "sbcl --script common-lisp/code.lisp" "${input}" +run "CPP" "./cpp/code" "./cpp/code" "${input}" +run "Crystal" "./crystal/code" "./crystal/code" "${input}" +#run "D" "./d/code" "./d/code" "${input}" +run "Dart" "./dart/code" "./dart/code" "${input}" +run "Deno (jitless)" "./js/code.js" "deno --v8-flags=--jitless ./js/code.js" "${input}" +run "Deno" "./js/code.js" "deno run ./js/code.js" "${input}" +run "Elixir" "./elixir/bench.exs" "elixir ./elixir/bench.exs" "${input}" +run "Emojicode" "./emojicode/code" "./emojicode/code" "${input}" +run "F#" "./fsharp/code/code" "./fsharp/code/code" "${input}" +run "F# AOT" "./fsharp/code-aot/code" "./fsharp/code-aot/code" "${input}" +run "Fortran" "./fortran/code" "./fortran/code" "${input}" +run "Free Pascal" "./fpc/code" "./fpc/code" "${input}" +run "Go" "./go/code" "./go/code" "${input}" +run "Haskell" "./haskell/code" "./haskell/code" "${input}" +#run "Haxe JVM" "haxe/code.jar" "java 
-jar haxe/code.jar" "${input}" # was getting errors running `haxelib install hxjava` +run "Inko" "./inko/code" "./inko/code" "${input}" +run "Java" "./jvm/code.class" "java jvm.code" "${input}" +run "Java Native" "./java-native-image/code" "./java-native-image/code" "${input}" +run "Julia" "./julia/code.jl" "julia ./julia/code.jl" "${input}" +run "Kotlin JVM" "kotlin/code.jar" "java -jar kotlin/code.jar" "${input}" +run "Kotlin Native" "./kotlin/code.kexe" "./kotlin/code.kexe" "${input}" +run "Lua" "./lua/code.lua" "lua ./lua/code.lua" "${input}" +run "LuaJIT" "./lua/code" "luajit ./lua/code" "${input}" +#run "MAWK" "./awk/code.awk" "mawk -f ./awk/code.awk" "${input}" +run "Modula 2" "./modula2/code" "./modula2/code" "${input}" +run "Nim" "./nim/code" "./nim/code" "${input}" +run "Node (jitless)" "./js/code.js" "node --jitles ./js/code.js" "${input}" +run "Node" "./js/code.js" "node ./js/code.js" "${input}" +run "Objective-C" "./objc/code" "./objc/code" "${input}" +#run "Octave" "./octave/code.m" "octave ./octave/code.m 40" "${input}" +run "Odin" "./odin/code" "./odin/code" "${input}" +run "PHP JIT" "./php/code.php" "php -dopcache.enable_cli=1 -dopcache.jit=on -dopcache.jit_buffer_size=64M ./php/code.php" "${input}" +run "PHP" "./php/code.php" "php ./php/code.php" "${input}" +run "PyPy" "./py/code.py" "pypy ./py/code.py" "${input}" +run "Python" "./py/code.py" "python3.12 ./py/code.py" "${input}" +run "Python JIT" "./py-jit/code.py" "python3.12 ./py-jit/code.py" "${input}" +#run "R" "./r/code.R" "Rscript ./r/code.R" "${input}" +run "Racket" "./racket/code" "./racket/code" "$input" +run "Ruby YJIT" "./ruby/code.rb" "miniruby --yjit ./ruby/code.rb" "${input}" +run "Ruby" "./ruby/code.rb" "ruby ./ruby/code.rb" "${input}" +run "Rust" "./rust/target/release/code" "./rust/target/release/code" "${input}" +run "Scala" "./scala/code" "./scala/code" "${input}" +run "Scala-Native" "./scala/code-native" "./scala/code-native" "${input}" +run "Bun Scala-JS(Compiled)" 
"./scala/bun" "./scala/bun" "${input}" +run "Bun Scala-JS" "./scala/code.js" "bun ./scala/code.js" "${input}" +run "Swift" "./swift/code" "./swift/code" "${input}" +run "V" "./v/code" "./v/code" "${input}" +run "Zig" "./zig/code" "./zig/code" "${input}" +run "Emacs Lisp Bytecode" "./emacs-lisp/code.elc" "emacs -Q --batch --load ./emacs-lisp/code.elc" "${input}" +run "Emacs Lisp Native" "./emacs-lisp/code.eln" "emacs -Q --batch --load ./emacs-lisp/code.eln" "${input}" diff --git a/run.sh b/run.sh index 196d59d9..068dc178 100755 --- a/run.sh +++ b/run.sh @@ -1,105 +1,163 @@ -# We run the benchmark with input.txt as arguments -# unless the script is run with arguments, then those will be used instead -# With arguments the check will be skipped, unless the only argument is "check" -# The special argument "check" makes the input always input.txt, and skips the benchmark - -num_script_args="${#}" -script_args="${*}" -if [ "${script_args}" = "check" ]; then - input=$(cat input.txt) +#!/bin/bash + +benchmark=$(basename "${PWD}") + +# Defaults +check_only=false +skip_check=false +run_ms=10000 +warmup_ms=2000 +cmd_input="$(./check-output.sh -i)" +user="JDoe" +only_langs=false +use_hyperfine=false +[[ "$benchmark" == "hello-world" ]] && use_hyperfine=true + +while getopts "cst:w:u:l:h" opt; do + case $opt in + u) user="${OPTARG}" ;; # Included in result file + t) run_ms="${OPTARG}" ;; # How long should the benchmark run? 
+ w) warmup_ms="${OPTARG}" ;; # Warmup length + c) check_only=true ;; # Skip benchmark + s) skip_check=true ;; # Run benchmark even if check fails (typically with non-default input) + l) only_langs="${OPTARG}" ;; # Languages to benchmark, comma separated + *) ;; + esac +done +shift $((OPTIND-1)) + +only_langs_slug="" +if [ -n "${only_langs}" ] && [ "${only_langs}" != "false" ]; then + IFS=',' read -r -a only_langs <<< "${only_langs}" + only_langs_slug="_only_langs" +fi + +is_checked=true +if [ "$skip_check" = true ]; then + is_checked=false +fi +user=${user//,/_} +input_value="${1}" +if [ -n "${input_value}" ]; then + cmd_input="${input_value}" +fi + +commit_sha=$(git rev-parse --short HEAD) +timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ") +benchmark_dir="/tmp/languages-benchmark" +os=${OSTYPE//,/_} +arch=$(uname -m) + +if [[ "${os}" == "darwin"* || "${os}" == "freebsd"* ]]; then + model=$(sysctl -n machdep.cpu.brand_string) +elif [[ "${os}" == "linux-gnu"* ]]; then + model=$(lscpu | grep "Model name" | awk -F: '{print $2}' | sed -e 's/^[[:space:]]*//') +else + model="Unknown" +fi +model=${model//,/_} + +if [[ "${os}" == "darwin"* || "${os}" == "freebsd"* ]]; then + ram=$(sysctl -n hw.memsize) + ram=$((ram / 1024 / 1024 / 1024))GB +elif [[ "${os}" == "linux-gnu"* ]]; then + ram=$(grep MemTotal /proc/meminfo | awk '{print $2}') + ram=$((ram / 1024 / 1024))GB else - input=${script_args:-$(cat input.txt)} + ram="Unknown" fi +mkdir -p "${benchmark_dir}" +results_file_name="${benchmark}_${user}_${run_ms}_${commit_sha}${only_langs_slug}.csv" +results_file="${benchmark_dir}/${results_file_name}" +# Data header, must match what is printed from `run` +if [ "${check_only}" = false ]; then + echo "benchmark,timestamp,commit_sha,is_checked,user,model,ram,os,arch,language,run_ms,mean_ms,std-dev-ms,min_ms,max_ms,runs" > "${results_file}" + echo "Running ${benchmark} benchmark..." 
+ echo "Results will be written to: ${results_file}" +else + echo "Only checking ${benchmark} benchmark" + echo "No benchmark will be run" +fi + + function check { - if [ ${num_script_args} -eq 0 ] || [ "${script_args}" = "check" ]; then - echo "Checking $1" - output=$(${2} ${3}) - if ! ./check.sh "$output"; then - echo "Check failed for $1." + local language_name=${1} + local partial_command=${2} + local input_arg=${3} + + local command_line + local program_output + + if [ ${skip_check} = false ]; then + echo "Checking ${benchmark} ${language_name}" + command_line="${partial_command} 1 0 ${input_arg}" + program_output=$(${command_line}) + if ! ./check-output.sh "${program_output}"; then + echo "Check failed for ${benchmark} ${language_name}." return 1 fi + else + echo "Skipping check for ${benchmark} ${language_name}" fi } function run { - echo "" - if [ -f ${2} ]; then - check "${1}" "${3}" "${4}" - if [ ${?} -eq 0 ] && [ "${script_args}" != "check" ]; then - cmd=$(echo "${3} ${4}" | awk '{ if (length($0) > 80) print substr($0, 1, 60) " ..."; else print $0 }') - echo "Benchmarking $1" - hyperfine -i --shell=none --output=pipe --runs 3 --warmup 2 -n "${cmd}" "${3} ${4}" + # "Language" "File that should exist" "Partial command line" + local language_name=${1} + local file_that_should_exist=${2} + local partial_command=${3} + + + if [ "$only_langs" != false ]; then + local should_run=false + for lang in "${only_langs[@]}"; do + if [ "$lang" = "$language_name" ]; then + should_run=true + break + fi + done + if [ "$should_run" = false ]; then + return + fi + fi + + local result + echo + if [ -f "${file_that_should_exist}" ]; then + check "${language_name}" "${partial_command}" "${cmd_input}" + if [ ${?} -eq 0 ] && [ ${check_only} = false ]; then + echo "Benchmarking ${benchmark} ${language_name}" + if [ ${use_hyperfine} = true ]; then + local command_line="${partial_command} 1 0 ${cmd_input}" + mkdir -p "${benchmark_dir}/hyperfine" + 
hyperfine_file="${benchmark_dir}/hyperfine/${results_file_name}"
+ hyperfine -i --shell=none --output=pipe --runs 25 --warmup 5 --export-csv "${hyperfine_file}" "${command_line}"
+ result=$(tail -n +2 "${hyperfine_file}" | awk -F ',' '{print ($2*1000)","($3*1000)","($7*1000)","($8*1000)","25}')
+ else
+ local command_line="${partial_command} ${run_ms} ${warmup_ms} ${cmd_input}"
+ echo "${command_line}"
+ local program_output=$(eval "${command_line}")
+ result=$(echo "${program_output}" | awk -F ',' '{print $1","$2","$3","$4","$5}')
+ fi
+ echo "${benchmark},${timestamp},${commit_sha},${is_checked},${user},${model},${ram},${os},${arch},${language_name},${run_ms},${result}" | tee -a "${results_file}"
 fi
 else
- echo "No executable or script found for $1. Skipping."
+ echo "No executable or script found for ${language_name}. Skipping."
 fi
 }
-run "Hare" "./hare/code 40"
-# run "Language" "Executable" "Command" "Arguments"
-#run "Ada" "./ada/code" "./ada/code" "${input}"
-#run "AWK" "./awk/code.awk" "awk -f ./awk/code.awk" "${input}"
-#run "Babashka" "bb/code.clj" "bb bb/code.clj" "${input}"
-run "Bun (Compiled)" "./js/bun" "./js/bun" "${input}"
-run "Bun (jitless)" "./js/code.js" "bun ./js/code.js" "BUN_JSC_useJIT=0" "${input}"
-run "Bun" "./js/code.js" "bun ./js/code.js" "${input}"
-run "C3" "./c3/code" "./c3/code" "${input}"
-run "C" "./c/code" "./c/code" "${input}"
-run "C#" "./csharp/code/code" "./csharp/code/code" "${input}"
-run "C# AOT" "./csharp/code-aot/code" "./csharp/code-aot/code" "${input}"
-run "Chez Scheme" "./chez/code.so" "chez --program ./chez/code.so" "${input}"
-run "Clojure" "./clojure/classes/code.class" "java -cp clojure/classes:$(clojure -Spath) code" "${input}"
-run "Clojure Native" "./clojure-native-image/code" "./clojure-native-image/code" "${input}"
-run "COBOL" "./cobol/main" "./cobol/main" "${input}"
-run "Common Lisp" "./common-lisp/code" "sbcl --script common-lisp/code.lisp" "${input}"
-run "CPP" "./cpp/code" "./cpp/code" "${input}"
-run "Crystal" "./crystal/code" "./crystal/code" "${input}"
-#run "D" "./d/code" "./d/code" "${input}"
-run "Dart" "./dart/code" "./dart/code" "${input}"
-run "Deno (jitless)" "./js/code.js" "deno --v8-flags=--jitless ./js/code.js" "${input}"
-run "Deno" "./js/code.js" "deno run ./js/code.js" "${input}"
-run "Elixir" "./elixir/bench.exs" "elixir ./elixir/bench.exs" "${input}"
-run "Emojicode" "./emojicode/code" "./emojicode/code" "${input}"
-run "F#" "./fsharp/code/code" "./fsharp/code/code" "${input}"
-run "F# AOT" "./fsharp/code-aot/code" "./fsharp/code-aot/code" "${input}"
-run "Fortran" "./fortran/code" "./fortran/code" "${input}"
-run "Free Pascal" "./fpc/code" "./fpc/code" "${input}"
-run "Go" "./go/code" "./go/code" "${input}"
-run "Haskell" "./haskell/code" "./haskell/code" "${input}"
-#run "Haxe JVM" "haxe/code.jar" "java -jar haxe/code.jar" "${input}" # was getting errors running `haxelib install hxjava`
-run "Inko" "./inko/code" "./inko/code" "${input}"
-run "Java" "./jvm/code.class" "java jvm.code" "${input}"
-run "Java Native" "./java-native-image/code" "./java-native-image/code" "${input}"
-run "Julia" "./julia/code.jl" "julia ./julia/code.jl" "${input}"
-run "Kotlin JVM" "kotlin/code.jar" "java -jar kotlin/code.jar" "${input}"
-run "Kotlin Native" "./kotlin/code.kexe" "./kotlin/code.kexe" "${input}"
-run "Lua" "./lua/code.lua" "lua ./lua/code.lua" "${input}"
-run "LuaJIT" "./lua/code" "luajit ./lua/code" "${input}"
-#run "MAWK" "./awk/code.awk" "mawk -f ./awk/code.awk" "${input}"
-run "Modula 2" "./modula2/code" "./modula2/code" "${input}"
-run "Nim" "./nim/code" "./nim/code" "${input}"
-run "Node (jitless)" "./js/code.js" "node --jitles ./js/code.js" "${input}"
-run "Node" "./js/code.js" "node ./js/code.js" "${input}"
-run "Objective-C" "./objc/code" "./objc/code" "${input}"
-#run "Octave" "./octave/code.m" "octave ./octave/code.m 40" "${input}"
-run "Odin" "./odin/code" "./odin/code" "${input}"
-run "PHP JIT" "./php/code.php" "php -dopcache.enable_cli=1 -dopcache.jit=on -dopcache.jit_buffer_size=64M ./php/code.php" "${input}"
-run "PHP" "./php/code.php" "php ./php/code.php" "${input}"
-run "PyPy" "./py/code.py" "pypy ./py/code.py" "${input}"
-run "Python" "./py/code.py" "python3.12 ./py/code.py" "${input}"
-run "Python JIT" "./py-jit/code.py" "python3.12 ./py-jit/code.py" "${input}"
-#run "R" "./r/code.R" "Rscript ./r/code.R" "${input}"
-run "Racket" "./racket/code" "./racket/code" "$input"
-run "Ruby YJIT" "./ruby/code.rb" "miniruby --yjit ./ruby/code.rb" "${input}"
-run "Ruby" "./ruby/code.rb" "ruby ./ruby/code.rb" "${input}"
-run "Rust" "./rust/target/release/code" "./rust/target/release/code" "${input}"
-run "Scala" "./scala/code" "./scala/code" "${input}"
-run "Scala-Native" "./scala/code-native" "./scala/code-native" "${input}"
-run "Bun Scala-JS(Compiled)" "./scala/bun" "./scala/bun" "${input}"
-run "Bun Scala-JS" "./scala/code.js" "bun ./scala/code.js" "${input}"
-run "Swift" "./swift/code" "./swift/code" "${input}"
-run "V" "./v/code" "./v/code" "${input}"
-run "Zig" "./zig/code" "./zig/code" "${input}"
-run "Emacs Lisp Bytecode" "./emacs-lisp/code.elc" "emacs -Q --batch --load ./emacs-lisp/code.elc" "${input}"
-run "Emacs Lisp Native" "./emacs-lisp/code.eln" "emacs -Q --batch --load ./emacs-lisp/code.eln" "${input}"
+# Please keep in language name alphabetic order
+# run "Language name" "File that should exist" "Command line"
+####### BEGIN The languages
+run "Babashka" "bb/run.clj" "bb bb/run.clj"
+run "C" "./c/run" "./c/run"
+run "Clojure" "./clojure/classes/run.class" "java -cp clojure/classes:$(clojure -Spath) run"
+run "Clojure Native" "./clojure-native-image/run" "./clojure-native-image/run"
+run "Java" "./jvm/run.class" "java -cp .:../lib/java jvm.run"
+run "Java Native" "./java-native-image/run" "./java-native-image/run"
+####### END The languages
+
+echo
+echo "Done running $(basename ${PWD}) benchmark"
+echo "Results were written to: ${results_file}"