If the goal is to compare LoRA-style methods fairly, why not fix one existing open-source MLLM checkpoint that already has a trained projector/connector (i.e., one that is already aligned), and then apply LoRA vs. MokA under the same SFT data, hyperparameters, and decoding settings? Why re-run the alignment pretraining stage?
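To make the suggestion concrete, here is a minimal sketch of the kind of controlled comparison I have in mind, assuming a LLaVA-style checkpoint (`llava-hf/llava-1.5-7b-hf`) and Hugging Face `peft`; the target module names and LoRA hyperparameters below are illustrative placeholders, not your configuration, and MokA would replace only the adapter step while the SFT data, hyperparameters, and decoding stay fixed:

```python
# Sketch: start from an already-aligned open-source MLLM (trained projector
# included) and attach LoRA adapters for SFT only; no alignment pretraining.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",          # checkpoint with a trained projector
    torch_dtype=torch.float16,
)

# Freeze everything (vision tower, projector, LLM); only adapters will train.
for p in model.parameters():
    p.requires_grad = False

lora_cfg = LoraConfig(
    r=16,                                 # illustrative rank, not your setting
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in the LLM
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # sanity check: only adapters train

# LoRA vs. MokA would then differ only in the adapter module inserted here;
# SFT data, optimizer/hyperparameters, and decoding would be held identical.
```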
Thanks!