@ramonwirsch
Member

  • renamed confusingly named test source to match its _test binary
  • upgraded tests/Amul to use the refAmul to verify results
  • disable docc on test sources if that is the compiler used
  • set other flags needed
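
Checking a kernel's output against a reference Amul can be sketched roughly as follows. This is only an illustration, not the project's `refAmul`: the function names `ref_amul_ellpack` and `results_match`, the ELLPACK layout (per-row value and column lists, zero-valued padding) and the tolerances are all assumptions.

```python
import math


def ref_amul_ellpack(values, cols, x):
    """Reference y = A * x for a matrix in an assumed ELLPACK layout.

    values[r][k] and cols[r][k] hold the k-th stored entry of row r.
    Padding slots are assumed to carry the value 0.0 and a valid column.
    """
    y = []
    for row_vals, row_cols in zip(values, cols):
        acc = 0.0
        for v, c in zip(row_vals, row_cols):
            acc += v * x[c]
        y.append(acc)
    return y


def results_match(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Compare two result vectors element-wise within a tolerance."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


# Hypothetical 2x3 matrix [[2, 0, 1], [0, 3, 0]] with two slots per row.
values = [[2.0, 1.0], [3.0, 0.0]]
cols = [[0, 2], [1, 0]]
x = [1.0, 2.0, 3.0]
expected = ref_amul_ellpack(values, cols, x)
```

A tolerance-based comparison is used here because a soft-float or FPU kernel is unlikely to reproduce the reference bit for bit; the tolerance actually appropriate for these tests is not stated in this conversation.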

@daisytuner

daisytuner bot commented Dec 12, 2025

Daisytuner Report - tt_kernel_performance_benchs (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# denseMatrix_Amul       3.88 s      +0.45%      N/A         N/A         N/A         
# denseMatrix_Amul_test  3.34 s      +0.21%      N/A         N/A         N/A         
# ellpackMatrix_Amul_fpu 3.17 s      -1.39%      N/A         N/A         N/A         
# ellpackMatrix_Amul_no_hw 3.17 s     -1.40%      N/A         N/A         N/A

@daisytuner

daisytuner bot commented Dec 12, 2025

Daisytuner Report - tt_kernel_performance_benchs (zinnia-ci)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# denseMatrix_Amul       5.68 s      +0.02%      N/A         N/A         N/A         
# denseMatrix_Amul_test  5.18 s      -0.00%      N/A         N/A         N/A         
# ellpackMatrix_Amul_fpu 4.98 s      +0.01%      N/A         N/A         N/A         
# ellpackMatrix_Amul_no_hw 4.98 s     -0.00%      N/A         N/A         N/A

@ramonwirsch
Member Author

The docc version with the required command-line options has not been released yet.

 + save the range of columns referenced in each tile, so that Tensix cores stream only the relevant parts of the vector instead of all of it
 ! each core loads the closed range of vectors spanning its batched tiles (considering alignment to 1024 vectors and 128 vector pages)
 ~ fixed missing synchronization in the soft-float implementation: the cb_get_tile semaphore was not guaranteed to be clean
 ! currently, T0 cannot collect the next batch
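
The column-range idea in the first two items can be sketched as below. This is a rough illustration under stated assumptions, not the kernel code from this PR: the helper names are hypothetical, and reading "1024 vectors" as a 1024-element alignment and "128 vector pages" as 128-element pages is a guess at the intended meaning.

```python
def aligned_col_range(col_indices, alignment=1024):
    """Half-open [start, end) range covering col_indices, aligned outward.

    alignment=1024 is an assumed reading of the commit message.
    """
    if not col_indices:
        return (0, 0)
    lo = min(col_indices)
    hi = max(col_indices)
    start = (lo // alignment) * alignment
    end = (hi // alignment + 1) * alignment
    return (start, end)


def batch_col_range(tile_ranges):
    """Merge per-tile ranges into one contiguous span for a batch.

    Gaps between tiles are included, so a core loads a single range.
    """
    non_empty = [r for r in tile_ranges if r[0] < r[1]]
    if not non_empty:
        return (0, 0)
    return (min(s for s, _ in non_empty), max(e for _, e in non_empty))


def pages_to_load(col_range, page_size=128):
    """Page indices overlapping the range (page_size=128 is assumed)."""
    start, end = col_range
    return range(start // page_size, (end + page_size - 1) // page_size)


# Two hypothetical tiles batched onto one core.
tile_a = aligned_col_range([1030, 1500, 2047])
tile_b = aligned_col_range([3000, 3100])
core_range = batch_col_range([tile_a, tile_b])
pages = pages_to_load(core_range)
```

With these made-up indices the core would load elements 1024 up to 4096 rather than the whole vector. The sketch represents the span as half-open for convenience, whereas the commit message describes it as a closed range.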