Conversation

@pwilkin (Collaborator) commented Feb 2, 2026

The model creators put up a llama.cpp fork, but they did it as a bare directory upload, and the llama.cpp version it's based on is quite far outdated, so I attempted to pull a clean PR from their code. Not guaranteeing it's all correct, but I'm putting it up since there's probably going to be a lot of interest in the new model.

@ggerganov (Member)

I was just downloading it and will give it a try soon.

@pwilkin (Collaborator, Author) commented Feb 2, 2026

@ggerganov Any idea what the deal is with this?

/usr/bin/ld: ../../bin/libllama.so.0.0.1: undefined reference to `bool llama_model_loader::get_key_or_arr<float, 512ul>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::array<float, 512ul>&, unsigned int, bool)'

Getting it locally too, not just on CI.

Comment on lines +2453 to +2493
case LLM_ARCH_STEP35:
    {
        ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);

        hparams.swa_type = LLAMA_SWA_TYPE_STANDARD;

        // MoE + SWA parameters
        ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);
        ml.get_key(LLM_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, hparams.n_ff_shexp, false);
        ml.get_key(LLM_KV_EXPERT_GATING_FUNC, hparams.expert_gating_func, false);
        ml.get_key(LLM_KV_EXPERT_WEIGHTS_SCALE, hparams.expert_weights_scale, false);
        ml.get_key(LLM_KV_EXPERT_WEIGHTS_NORM, hparams.expert_weights_norm, false);

        // Step35 uses sigmoid gating by default (if not set in GGUF)
        if (hparams.expert_gating_func == LLAMA_EXPERT_GATING_FUNC_TYPE_NONE) {
            hparams.expert_gating_func = LLAMA_EXPERT_GATING_FUNC_TYPE_SIGMOID;
        }

        ml.get_key(LLM_KV_ATTENTION_SLIDING_WINDOW, hparams.n_swa);
        ml.get_key_or_arr(LLM_KV_ATTENTION_SLIDING_WINDOW_PATTERN, hparams.swa_layers, hparams.n_layer);
        ml.get_key_or_arr(LLM_KV_ROPE_DIMENSION_COUNT_PER_LAYER, hparams.rope_dim_per_layer, hparams.n_layer);
        ml.get_key_or_arr(LLM_KV_SWIGLU_LIMITS, hparams.swiglu_limits, hparams.n_layer);
        ml.get_key_or_arr(LLM_KV_SWIGLU_LIMITS_SHARED, hparams.swiglu_limits_shared, hparams.n_layer);

        // Optional: Step35-only gating for applying rope scaling (HF: yarn_only_types).
        // Default is 3 (apply on all layers) if the key is absent.
        ml.get_key(
            format("%s.rope.scaling.apply_mask", ml.get_arch_name().c_str()),
            hparams.rope_scaling_apply_mask,
            false
        );

        hparams.has_rope_freq_base_per_layer = ml.get_key_or_arr(
            format("%s.rope.freq_base_per_layer", ml.get_arch_name().c_str()),
            hparams.rope_freq_base_per_layer,
            hparams.n_layer,
            false
        );

        type = LLM_TYPE_UNKNOWN;
    } break;
Member

Maybe one of these does not compile correctly. On Mac it builds OK, btw.

Collaborator Author

Got it.

@github-actions bot added the model (Model specific) and python (python script changes) labels on Feb 2, 2026
@forforever73

@pwilkin Thanks a lot for putting this together so quickly, and for taking the time to extract the changes into a clean PR — we really appreciate the interest and effort around step3.5. We’re the authors of the step3.5 model, and we already have an official llama.cpp integration prepared, which we’re planning to submit very shortly (likely tomorrow).

For context, the earlier bare directory upload was on us — that was a mistake on our side, and we should have prepared a proper PR from the start. Thanks for jumping in and helping turn that into a PR.

Since the upcoming PR will be the reference implementation we’ll be maintaining going forward, we’d prefer to have that land as the primary integration. For that reason, we were hoping this PR could be held off from merging for now.

Thanks again for the quick turnaround and for helping move support for the new model forward.

Comment on lines +2479 to +2483
ml.get_key(
    format("%s.rope.scaling.apply_mask", ml.get_arch_name().c_str()),
    hparams.rope_scaling_apply_mask,
    false
);
@ngxson (Collaborator) commented Feb 2, 2026


I think you will need something like this (const char* instead of std::string):

template bool llama_model_loader::get_key_or_arr<float, 512>(const char *, std::array<float, 512> & result, uint32_t n, bool required);
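For context, the undefined reference shows up because get_key_or_arr is a template whose definition lives in a .cpp file, so every (T, N) combination that callers use has to be explicitly instantiated there. A minimal sketch of the pattern (hypothetical names, not the actual llama.cpp sources):

    // loader.h
    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <string>

    struct loader {
        // declared here; the definition lives only in loader.cpp
        template <typename T, size_t N>
        bool get_key_or_arr(const std::string & key, std::array<T, N> & result, uint32_t n, bool required = true);
    };

    // loader.cpp -- without this explicit instantiation, a caller that uses
    // std::array<float, 512> links against a symbol that was never emitted:
    template bool loader::get_key_or_arr<float, 512>(const std::string &, std::array<float, 512> &, uint32_t, bool);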

But I think it's better to just put this as a LLM_KV_*, like all other metadata keys
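Roughly like this, following the existing llama-arch pattern (the enum name here is hypothetical):

    // llama-arch.h
    LLM_KV_ROPE_SCALING_APPLY_MASK,

    // llama-arch.cpp
    { LLM_KV_ROPE_SCALING_APPLY_MASK, "%s.rope.scaling.apply_mask" },

    // llama-model.cpp -- the per-arch loading code then becomes:
    ml.get_key(LLM_KV_ROPE_SCALING_APPLY_MASK, hparams.rope_scaling_apply_mask, false);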

Collaborator Author

Already did it :>

    apply_mask |= 0x1
if "sliding_attention" in yarn_only_types:
    apply_mask |= 0x2
self.gguf_writer.add_uint32(f"{self.gguf_writer.arch}.rope.scaling.apply_mask", int(apply_mask))
Collaborator

Better to add this as a self.gguf_writer.add_* function
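A sketch of what that could look like, assuming a new Keys.Rope.SCALING_APPLY_MASK constant is added in gguf-py/gguf/constants.py (the helper and constant names here are hypothetical, following the existing add_* pattern):

    # gguf-py/gguf/gguf_writer.py
    def add_rope_scaling_apply_mask(self, value: int) -> None:
        # writes <arch>.rope.scaling.apply_mask as a uint32
        self.add_uint32(Keys.Rope.SCALING_APPLY_MASK.format(arch=self.arch), value)

    # convert_hf_to_gguf.py -- the call site then becomes:
    self.gguf_writer.add_rope_scaling_apply_mask(apply_mask)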

raise ValueError(f"Unprocessed experts: {experts}")


@ModelBase.register("Step35ForCausalLM")
Member

Should this be:

Suggested change:
- @ModelBase.register("Step35ForCausalLM")
+ @ModelBase.register("Step3p5ForCausalLM")

@pwilkin (Collaborator, Author) commented Feb 3, 2026

Closing as duplicate of #19283

@pwilkin closed this on Feb 3, 2026