Compared with `_FusedMatMul`, `_MklFusedMatMul` takes nearly twice as long in the test. Please investigate the root cause and fix it.