Is your feature request related to a problem? Please describe.
It would be useful to see how quickly a model runs when GEMM is handled by a BLAS implementation.
Describe the solution you'd like
An option, similar to the way AVX and AVX512 can be selected, that allows a BLAS implementation to be selected for GEMM.
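A minimal sketch of what such an option could look like, assuming it is a compile-time switch. The `USE_BLAS` macro and the `gemm_f32` helper are hypothetical names used only for illustration and are not part of the project. `cblas_sgemm` is the standard CBLAS single-precision GEMM entry point, which OpenBLAS and other CBLAS providers expose through `cblas.h`.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_BLAS   /* hypothetical build flag, e.g. -DUSE_BLAS plus linking a CBLAS library */
#include <cblas.h>
#endif

/* Computes C = A * B for row-major matrices.
 * A is m x k, B is k x n, C is m x n.
 * gemm_f32 is a hypothetical helper name, not an existing project function. */
void gemm_f32(int m, int n, int k, const float *a, const float *b, float *c)
{
    assert(m > 0 && n > 0 && k > 0);
#ifdef USE_BLAS
    /* Delegate to the BLAS library: alpha = 1, beta = 0, no transposes.
     * Leading dimensions are the row lengths: lda = k, ldb = n, ldc = n. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);
#else
    /* Portable fallback: plain triple loop, used when no BLAS is selected. */
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
#endif
}
```

Building the same source with and without the flag would let both paths be timed on the same model, which is the comparison this request is after.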
Describe alternatives you've considered
OpenBLAS, with some way to specify a CPU target, might be a good fit for this.
Additional context
None.