Description
As I mentioned in #336 (comment), I think we can take inspiration from Dave Plummer's Eratosthenes drag-race benchmark in order to control whether start-up times matter or not.
Let's use Levenshtein as an example. What we are interested in is how long it takes for the program, given a bunch of strings, to figure out the minimum Levenshtein distance. So that is the only thing we should be measuring. Not the start-up time of the executable. Not the time it takes to parse the input arguments and prepare for figuring out the minimum distance.
In Dave Plummer's drag-race, each program is given the same amount of time, and what gets measured is how many times the program can finish the task. This controls the length of the whole benchmark run much better, and we can allow slow programs to participate without worrying that they will take forever. We can size the task so that even R finishes one run of Levenshtein in, say, 15s. Then we give each program 30s to run, and allow them to use the same amount of time to warm up their JIT engines. Even with 100 languages, slow or fast, it would take a maximum of 2 hours to run a benchmark. Say we have 5 benchmarks and 100 languages, it would take 10 hours.
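To make the idea concrete, here is a minimal sketch of such a time-boxed loop in Python. It is only an illustration under my own assumptions, not how the drag-race repo implements it: the names `drag_race`, `min_distance`, `budget_s` and `warmup_s` are made up for this example. Only the repeated calls to the task are timed, and the loop always completes at least one run.

```python
import time


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def min_distance(words):
    """Minimum distance over all pairs of distinct positions in words."""
    return min(levenshtein(words[i], words[j])
               for i in range(len(words))
               for j in range(i + 1, len(words)))


def drag_race(task, budget_s, warmup_s=0.0):
    """Call task() repeatedly for about budget_s seconds.

    Returns (runs, elapsed_seconds, last_result). The optional warm-up
    phase runs the task without counting it (hypothetical parameter).
    """
    warm_deadline = time.perf_counter() + warmup_s
    while time.perf_counter() < warm_deadline:
        task()

    runs = 0
    start = time.perf_counter()
    while True:
        result = task()
        runs += 1
        elapsed = time.perf_counter() - start
        if elapsed >= budget_s:
            return runs, elapsed, result


if __name__ == "__main__":
    words = ["kitten", "sitting", "mitten"]
    runs, elapsed, result = drag_race(lambda: min_distance(words),
                                      budget_s=1.0, warmup_s=0.5)
    print(f"result={result} runs={runs} elapsed={elapsed:.3f}s")
```

Since the last run is allowed to finish, the elapsed time can overshoot the budget by up to one run, so comparing runs per elapsed second is probably fairer than comparing raw run counts.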
I think we could have some easy-to-parse config file that specifies the benchmark: how much benchmark time is allowed, the name of the input file, maybe something more. Then each program has some infrastructure to read and parse the config file, read and parse the input, run the measured function, measure how many times it runs, measure how long it took, and output the result of the function as well as the measurements.
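As a rough sketch of what that per-program infrastructure could look like, here is a Python version that assumes a JSON config. The format and the key names (`run_seconds`, `warmup_seconds`, `input_file`) are assumptions for illustration only; nothing is decided yet. Parsing the config and the input happens outside the timed section, and only the repeated calls to the measured function are counted.

```python
import json
import time
from pathlib import Path


def load_config(path):
    """Read a hypothetical JSON benchmark config.

    Assumed keys: run_seconds, warmup_seconds (optional), input_file
    (resolved relative to the config file's directory).
    """
    path = Path(path)
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {
        "run_seconds": float(raw["run_seconds"]),
        "warmup_seconds": float(raw.get("warmup_seconds", 0.0)),
        "input_file": path.parent / raw["input_file"],
    }


def run_benchmark(config_path, measured):
    """Run measured(words) repeatedly within the configured time budget."""
    cfg = load_config(config_path)
    # Input parsing is deliberately outside the timed section.
    words = cfg["input_file"].read_text(encoding="utf-8").split()

    # Untimed warm-up, e.g. for JIT-compiled languages.
    warm_deadline = time.perf_counter() + cfg["warmup_seconds"]
    while time.perf_counter() < warm_deadline:
        measured(words)

    runs = 0
    start = time.perf_counter()
    while True:
        result = measured(words)
        runs += 1
        elapsed = time.perf_counter() - start
        if elapsed >= cfg["run_seconds"]:
            break

    return {"result": result, "runs": runs, "elapsed_seconds": elapsed}


if __name__ == "__main__":
    # Expects a benchmark.json next to the script (hypothetical file name).
    report = run_benchmark("benchmark.json", lambda words: len(words))
    print(json.dumps(report))
```

Printing the report as a single JSON line would keep it easy for a runner script to collect results from all languages, but any easy-to-parse output format would do.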
It would be considerable work to update the contributions, but since it's crowdsourced work, I think we can do it. I can do it for Java, C, and Clojure for reference and learning before we go ahead and update the rest. (My C is not up-to-date, nor up-to-speed, but I think I can do it, haha.)