We should set up easier ways to visualize attention, e.g. with CircuitsVis (see the text neuron activations view with multiple samples): https://transformerlensorg.github.io/CircuitsVis/?path=/docs/activations-textneuronactivations--multiple-samples It would be ideal to have these set up for validation checkpoints as well.
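A rough sketch of what a per-checkpoint setup could look like. It uses CircuitsVis's `circuitsvis.attention.attention_patterns` (an attention-specific view, rather than the neuron-activations view linked above) and assumes each validation checkpoint gives us one layer's attention as an array of shape `[n_heads, n_tokens (dest), n_tokens (src)]`. The helper names `check_attention` and `render_checkpoints`, the `{checkpoint_name: attention}` input dict, and the output file naming are all hypothetical. It also assumes `str()` on the returned CircuitsVis object yields a standalone HTML page.

```python
from pathlib import Path
from typing import Dict, List

import numpy as np


def check_attention(tokens: List[str], attention) -> np.ndarray:
    """Validate one layer's attention before handing it to CircuitsVis.

    Expects shape [n_heads, n_tokens (dest), n_tokens (src)], where each
    destination row is a probability distribution over source tokens.
    """
    attention = np.asarray(attention, dtype=np.float32)
    n = len(tokens)
    if attention.ndim != 3 or attention.shape[1:] != (n, n):
        raise ValueError(f"expected shape [heads, {n}, {n}], got {attention.shape}")
    # Softmaxed attention rows should sum to ~1 over the source dimension.
    if not np.allclose(attention.sum(axis=-1), 1.0, atol=1e-3):
        raise ValueError("attention rows do not sum to 1")
    return attention


def render_checkpoints(
    tokens: List[str],
    attention_by_checkpoint: Dict[str, np.ndarray],  # hypothetical input layout
    out_dir: str,
) -> List[Path]:
    """Write one attention-pattern HTML file per validation checkpoint."""
    # Imported lazily so check_attention works without CircuitsVis installed
    # (pip install circuitsvis).
    from circuitsvis.attention import attention_patterns

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, attn in attention_by_checkpoint.items():
        html = attention_patterns(tokens=tokens, attention=check_attention(tokens, attn))
        path = out / f"{name}_attention.html"
        # Assumption: str(html) returns a self-contained HTML page.
        path.write_text(str(html))
        paths.append(path)
    return paths
```

Validating the shape and row sums up front means a mislabelled or un-softmaxed tensor from a checkpoint fails loudly rather than rendering a misleading plot.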