add alibi position embedding and support baichuan #54
qyccc wants to merge 48 commits into alibaba:main from
Conversation
add alibi position embedding
Cool! It may take some time to review 🙃
li-yi-dong
left a comment
I'll review again once you resolve the comments.
megatron/model/transformer.py
self.checkpoint_core_attention = args.recompute_granularity == 'selective'

self.apply_query_key_layer_scaling = args.apply_query_key_layer_scaling
world_size = mpu.get_tensor_model_parallel_world_size()
tensor_parallel_size
Sorry, I didn't get it. Do you mean the variable name should be tensor_parallel_size?
@li-yi-dong Thanks for your time and careful review. I have made the necessary changes and addressed the comments you mentioned. Please take another look at the updated version at your convenience.
li-yi-dong
left a comment
Big thanks for your efforts and patience.
I added some comments to resolve.
| forward_step_func""" | ||
| self.input_tensor = input_tensor | ||
|
|
||
| def _build_alibi_tensor(self, tensor, max_seq_len, num_attention_heads): |
Place this func together with alibi_mask_func.
This func depends on the internal variable first_run, so it cannot be moved into utils.
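For context, here is a minimal sketch of what such a cached ALiBi builder can look like. It is not the PR's exact implementation: the `AlibiAttention` class, the `max_cache_pos` and `cached_alibi` attributes, and `_get_alibi_slopes` are illustrative names; only `first_run` comes from the discussion above.

```python
import math
import torch


def _get_alibi_slopes(num_attention_heads):
    # Standard ALiBi slope recipe: a geometric sequence for power-of-two head
    # counts, otherwise interleaved slopes from the next larger power of two.
    def power_of_2_slopes(n):
        start = 2.0 ** (-(2.0 ** -(math.log2(n) - 3)))
        return [start ** (i + 1) for i in range(n)]

    if math.log2(num_attention_heads).is_integer():
        return power_of_2_slopes(num_attention_heads)
    closest = 2 ** math.floor(math.log2(num_attention_heads))
    return (power_of_2_slopes(closest)
            + power_of_2_slopes(2 * closest)[0::2][: num_attention_heads - closest])


class AlibiAttention:  # illustrative container for the cached state
    def __init__(self, num_attention_heads):
        self.num_attention_heads = num_attention_heads
        self.first_run = True
        self.max_cache_pos = 0      # assumed bookkeeping attribute (not from the PR)
        self.cached_alibi = None

    def _build_alibi_tensor(self, tensor, max_seq_len, num_attention_heads):
        # Rebuild the bias only on the first call or when the requested sequence
        # length exceeds what is cached -- this is why the method needs instance
        # state and cannot live in a stateless utils module.
        if self.first_run or max_seq_len > self.max_cache_pos:
            self.first_run = False
            self.max_cache_pos = max_seq_len
            slopes = torch.tensor(_get_alibi_slopes(num_attention_heads),
                                  dtype=tensor.dtype, device=tensor.device)
            positions = torch.arange(max_seq_len, device=tensor.device,
                                     dtype=tensor.dtype)
            # Shape (heads, 1, seq): a per-head linear penalty on key position.
            self.cached_alibi = slopes.view(-1, 1, 1) * positions.view(1, 1, -1)
        return self.cached_alibi[:, :, :max_seq_len]
```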
This PR adds the ALiBi method for positional information, together with a flash-attention version of it (using triton). It also supports Baichuan model training by porting over the implementation from baichuan-inc/Baichuan2-13B-Base.
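For readers unfamiliar with ALiBi, a rough sketch of how the bias enters the attention scores is shown below, in plain PyTorch rather than the fused triton kernel; `attention_with_alibi` and its arguments are illustrative names, not the PR's API.

```python
import torch


def attention_with_alibi(q, k, v, alibi_bias):
    """q, k, v: (batch, heads, seq, head_dim); alibi_bias: broadcastable to
    (batch, heads, seq, seq). ALiBi adds a head-specific linear distance
    penalty to the attention scores instead of adding position embeddings
    to the token inputs."""
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale + alibi_bias
    # Causal mask: each token attends only to itself and earlier positions.
    seq_len = q.size(-2)
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=q.device), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```

A bias of shape (heads, 1, seq), as produced by the builder sketched earlier, broadcasts cleanly against the (batch, heads, seq, seq) score matrix. The triton flash-attention path presumably folds the same bias into the kernel so the full score matrix is never materialized.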