Improve benchmarking workflow #1

@leonlan

Description

Is your feature request related to a problem? Please describe

We currently benchmark PyVRP on five different instance sets. Each instance set requires a specific build, round function, stopping criterion, etc., as described here. I currently have a separate folder for each instance set. At each new version release, I run the benchmarks from each of these folders, which is automated. Once all instances are solved, I move the results to my local environment and run a notebook to compute the gaps.

With many more instance sets to come, this benchmarking workflow requires a lot of manual work. It would be nice to have a more automated benchmarking workflow, and to have this publicly available so that anyone can reproduce these steps.

Describe the solution you'd like

The benchmarking process looks like this:

  1. For each instance set:
    • Build PyVRP using the correct problem type and precision.
    • Solve with the correct stopping criterion and round function.
  2. Compute gaps for each instance set.
  3. Update the benchmarking results.

I think step 1 is relatively straightforward because it's just a simple Python/shell script. It will include some custom code that depends on the cluster environment that one runs on.
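As a rough illustration, step 1 could be driven by something like the sketch below. The build command, the solve script, and the per-set configuration values are hypothetical placeholders; the actual commands depend on PyVRP's build setup and on the cluster scheduler one uses.

```python
import subprocess

# Hypothetical per-set configuration: problem type, precision, runtime
# limit (seconds), and round function. The real values come from the
# benchmarking instructions in the documentation.
INSTANCE_SETS = {
    "cvrp": {"problem": "cvrp", "precision": "integer", "runtime": 3600, "round": "round"},
    "vrptw": {"problem": "vrptw", "precision": "integer", "runtime": 3600, "round": "dimacs"},
    # ...one entry per instance set
}

for name, cfg in INSTANCE_SETS.items():
    # Rebuild PyVRP for this instance set. "build_extensions.py" and its
    # flags are placeholders for whatever the actual build step is.
    subprocess.run(
        ["python", "build_extensions.py",
         "--problem", cfg["problem"],
         "--precision", cfg["precision"]],
        check=True,
    )

    # Solve the instances in the set. "solve_all.py" is a placeholder for
    # a script that loops over the instances (or submits them to the
    # cluster scheduler) with the right stopping criterion and round
    # function.
    subprocess.run(
        ["python", "solve_all.py", name,
         "--max_runtime", str(cfg["runtime"]),
         "--round_func", cfg["round"]],
        check=True,
    )
```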

Step 2 is somewhat more cumbersome. I currently have several Jupyter notebooks that compute the gaps for each instance set. Besides requiring manual effort, this is a bit messy and hard to maintain. Ideally, we keep something like a spreadsheet: each instance set gets its own tab, and each new version release is added as a new column. It would also be nice to store reference solutions, so that the gaps are updated whenever a new best-known solution (BKS) becomes available. Instead of using an Excel spreadsheet, we could have an automated workflow that updates a set of CSV files with the new benchmark results.
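For example, step 2 could boil down to a small script along these lines. The CSV layout (one file per instance set, with an `instance` column, a `bks` column holding reference costs, and one column per release) is an assumption for illustration, not an existing format.

```python
import csv


def add_release(path: str, release: str, costs: dict[str, float]) -> None:
    """
    Appends a column with the new release's costs to an instance set's CSV
    and prints the mean gap to the stored best-known solutions (BKS).

    Assumes the CSV has columns: instance, bks, <one column per release>.
    """
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))

    gaps = []
    for row in rows:
        cost = costs[row["instance"]]
        bks = float(row["bks"])
        row[release] = cost
        gaps.append(100 * (cost - bks) / bks)

    # Write the file back with the new release column appended.
    fields = list(rows[0].keys())
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

    print(f"{path}: mean gap {sum(gaps) / len(gaps):.2f}%")
```

A small wrapper could then call this once per instance set after each release's benchmark run, so the CSV files always reflect the latest results and reference solutions.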

Step 3 can still be done manually by editing the benchmark page. I'm OK with that.

I will try to work on this for PyVRP/PyVRP#435.

Additional context

There are two open issues that will simplify the benchmarking process further:
