Conversation

@codernine-moreh

Description

  • Modified the modeling script so that the fused_rms_norm option can also be used with Mistral.
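A minimal sketch of how an optional fused RMSNorm could be selected behind a `fused_rms_norm` flag. This is not the code from this PR: the module name `moreh_kernels`, the `MorehRMSNorm` constructor signature, the `build_norm` helper, and the pure-Python `ReferenceRMSNorm` fallback are all assumptions made for illustration.

```python
import logging
import math

logger = logging.getLogger(__name__)

try:
    # Hypothetical import path; the real location of MorehRMSNorm is not shown in this PR.
    from moreh_kernels import MorehRMSNorm
except ImportError:
    MorehRMSNorm = None


class ReferenceRMSNorm:
    """Unfused RMSNorm over a 1-D list: weight * x / sqrt(mean(x^2) + eps)."""

    def __init__(self, hidden_size, eps=1e-6):
        self.weight = [1.0] * hidden_size
        self.eps = eps

    def __call__(self, x):
        mean_sq = sum(v * v for v in x) / len(x)
        scale = 1.0 / math.sqrt(mean_sq + self.eps)
        return [w * v * scale for w, v in zip(self.weight, x)]


def build_norm(hidden_size, eps=1e-6, fused_rms_norm=False):
    """Return the fused norm only when it is requested and importable."""
    if fused_rms_norm and MorehRMSNorm is not None:
        # Assumed constructor signature, for illustration only.
        return MorehRMSNorm(hidden_size, eps=eps)
    if fused_rms_norm:
        logger.warning("fused_rms_norm was requested but no fused kernel is available.")
    return ReferenceRMSNorm(hidden_size, eps=eps)


# Usage: falls back to the reference implementation when the fused kernel is missing.
norm = build_norm(4, fused_rms_norm=True)
print(norm([3.0, 4.0, 0.0, 0.0]))
```

Keeping the fallback path means the same modeling script still runs on machines where the fused kernel cannot be imported.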


if MorehRMSNorm is not None:
    logger.warning(
        "You can't use Masked Structured Growth Training..! You should avoid using rmsnorm in any way. "

Masked Structured Growth Training

What is this..??

rmsnorm -> RMSNorm 🙏

Author

I ported Namhyeong's modeling_llama patch as-is, so I'm not really sure either...

Teacher G's answer:

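What is Masked Structural Growth (MSG) Training?

**Masked Structural Growth (MSG)** is a technique for training large language models (Transformers) by growing them progressively.
That is, instead of training a huge model from the start, training begins with a small model and the structure is gradually expanded (more layers, a larger hidden size, more attention heads, and so on).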
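A toy sketch of the growth idea described above, under one loud assumption: as the name "masked" suggests, newly added units are multiplied by a mask that starts at 0 and is raised over time, so the model's output does not change at the moment it grows. The comment above only describes progressive growth, so the masking detail, the function names, and the tiny one-output MLP are illustrative and not taken from this PR.

```python
def masked_mlp(x, w_in, w_out, mask):
    """Scalar output: sum_j mask[j] * w_out[j] * relu(w_in[j] . x)."""
    total = 0.0
    for row, out_w, m in zip(w_in, w_out, mask):
        pre = sum(a * b for a, b in zip(row, x))
        total += m * out_w * max(pre, 0.0)
    return total


def grow_hidden(w_in, w_out, mask, new_rows, new_out):
    """Append hidden units with mask 0.0 so the output is unchanged right after growth."""
    return w_in + new_rows, w_out + new_out, mask + [0.0] * len(new_rows)


x = [1.0, 2.0]
w_in = [[0.5, 0.25], [1.0, 1.0]]
w_out = [2.0, -1.0]
mask = [1.0, 1.0]
before = masked_mlp(x, w_in, w_out, mask)

# Grow the hidden size from 2 to 3; the new unit is fully masked at first.
g_in, g_out, g_mask = grow_hidden(w_in, w_out, mask, [[0.3, 0.7]], [5.0])
after = masked_mlp(x, g_in, g_out, g_mask)

# Raising the mask lets the new unit start contributing gradually.
ramped = masked_mlp(x, g_in, g_out, g_mask[:-1] + [0.5])

print(before, after, ramped)
```

In a real training loop the mask would be raised on a schedule across steps. Here a single intermediate value is enough to show that the output only moves once the mask is non-zero.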

Author

I've corrected the terminology.

@codernine-moreh codernine-moreh merged commit 4123aa2 into v4.42.4-moreh Oct 31, 2025
3 checks passed
@codernine-moreh codernine-moreh deleted the yechan/add_fused_rmsnorm branch October 31, 2025 01:35

4 participants