Benchmarking is currently done with criterion alone; however, criterion does not make it easy to compare performance across multiple commits.