Improve benchmarking workflow #1

@leonlan

Description

Is your feature request related to a problem? Please describe

We currently benchmark PyVRP on five different instance sets. Each instance set requires a specific build, round function, stopping criterion, etc., as described here. I currently have a separate folder for each instance set. At each new version release, I run the benchmarks from each of these folders, which is automated. Once all instances are solved, I move the results to my local environment and run a notebook to compute the gaps.

With many more instance sets to come, this benchmarking workflow requires a lot of manual work. It would be nice to have a more automated benchmarking workflow, and to have this publicly available so that anyone can reproduce these steps.

Describe the solution you'd like

The benchmarking process looks like this:

  1. For each instance set:
    • Build PyVRP using the correct problem type and precision.
    • Solve with the correct stopping criterion and round function.
  2. Compute gaps for each instance set.
  3. Update the benchmarking results.

I think step 1 is relatively straightforward because it's just a simple Python/shell script. It will include some custom code that depends on the cluster environment that one runs on.
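As a rough illustration, step 1 could be driven by something like the sketch below. The build command, the solve script, and the per-set configuration values are hypothetical placeholders; the actual commands depend on PyVRP's build setup and on the cluster scheduler one uses.

```python
import subprocess

# Hypothetical per-set configuration: problem type, precision, runtime
# limit (seconds), and round function. The real values come from the
# benchmarking instructions in the documentation.
INSTANCE_SETS = {
    "cvrp": {"problem": "cvrp", "precision": "integer", "runtime": 3600, "round": "round"},
    "vrptw": {"problem": "vrptw", "precision": "integer", "runtime": 3600, "round": "dimacs"},
    # ...one entry per instance set
}

for name, cfg in INSTANCE_SETS.items():
    # Rebuild PyVRP for this instance set. "build_extensions.py" and its
    # flags are placeholders for whatever the actual build step is.
    subprocess.run(
        ["python", "build_extensions.py",
         "--problem", cfg["problem"],
         "--precision", cfg["precision"]],
        check=True,
    )

    # Solve the instances in the set. "solve_all.py" is a placeholder for
    # a script that loops over the instances (or submits them to the
    # cluster scheduler) with the right stopping criterion and round
    # function.
    subprocess.run(
        ["python", "solve_all.py", name,
         "--max_runtime", str(cfg["runtime"]),
         "--round_func", cfg["round"]],
        check=True,
    )
```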

Step 2 is somewhat more cumbersome. I currently have several Jupyter notebooks that compute the gaps for each instance set. Besides requiring manual effort, this is a bit messy and hard to maintain. Ideally, we keep something like a spreadsheet: each instance set gets its own tab, and each new version release is added as a new column. It would also be nice to store reference solutions, so that the gaps are updated whenever a new best-known solution (BKS) becomes available. Instead of using an Excel spreadsheet, we could have an automated workflow that updates a set of CSV files with the new benchmark results.
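For example, step 2 could boil down to a small script along these lines. The CSV layout (one file per instance set, with an `instance` column, a `bks` column holding reference costs, and one column per release) is an assumption for illustration, not an existing format.

```python
import csv


def add_release(path: str, release: str, costs: dict[str, float]) -> None:
    """
    Appends a column with the new release's costs to an instance set's CSV
    and prints the mean gap to the stored best-known solutions (BKS).

    Assumes the CSV has columns: instance, bks, <one column per release>.
    """
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))

    gaps = []
    for row in rows:
        cost = costs[row["instance"]]
        bks = float(row["bks"])
        row[release] = cost
        gaps.append(100 * (cost - bks) / bks)

    # Write the file back with the new release column appended.
    fields = list(rows[0].keys())
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

    print(f"{path}: mean gap {sum(gaps) / len(gaps):.2f}%")
```

A small wrapper could then call this once per instance set after each release's benchmark run, so the CSV files always reflect the latest results and reference solutions.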

Step 3 can still be done manually by editing the benchmark page. I'm OK with that.

I will try to work on this for PyVRP/PyVRP#435.

Additional context

There are two open issues that will simplify the benchmarking process further:
