Hi. It is nice to see this great work! Could you share the number and type of GPUs you used, and the runtimes, for the provided examples (llama3_to_qwen2_tokenizer_gpu.sh, etc.)? Thank you!