Step3.5 support #19271
Conversation
I was just downloading it and will give it a try soon.

@ggerganov Any idea what the deal is with this?

    /usr/bin/ld: ../../bin/libllama.so.0.0.1: undefined reference to `bool llama_model_loader::get_key_or_arr<float, 512ul>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::array<float, 512ul>&, unsigned int, bool)'

Getting it locally too, not just on CI.
    case LLM_ARCH_STEP35:
        {
            ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);

            hparams.swa_type = LLAMA_SWA_TYPE_STANDARD;

            // MoE + SWA parameters
            ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH,        hparams.n_ff_exp);
            ml.get_key(LLM_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, hparams.n_ff_shexp, false);
            ml.get_key(LLM_KV_EXPERT_GATING_FUNC,                hparams.expert_gating_func, false);
            ml.get_key(LLM_KV_EXPERT_WEIGHTS_SCALE,              hparams.expert_weights_scale, false);
            ml.get_key(LLM_KV_EXPERT_WEIGHTS_NORM,               hparams.expert_weights_norm, false);

            // Step35 uses sigmoid gating by default (if not set in GGUF)
            if (hparams.expert_gating_func == LLAMA_EXPERT_GATING_FUNC_TYPE_NONE) {
                hparams.expert_gating_func = LLAMA_EXPERT_GATING_FUNC_TYPE_SIGMOID;
            }

            ml.get_key(LLM_KV_ATTENTION_SLIDING_WINDOW, hparams.n_swa);
            ml.get_key_or_arr(LLM_KV_ATTENTION_SLIDING_WINDOW_PATTERN, hparams.swa_layers, hparams.n_layer);
            ml.get_key_or_arr(LLM_KV_ROPE_DIMENSION_COUNT_PER_LAYER,   hparams.rope_dim_per_layer, hparams.n_layer);
            ml.get_key_or_arr(LLM_KV_SWIGLU_LIMITS,                    hparams.swiglu_limits, hparams.n_layer);
            ml.get_key_or_arr(LLM_KV_SWIGLU_LIMITS_SHARED,             hparams.swiglu_limits_shared, hparams.n_layer);

            // Optional: Step35-only gating for applying rope scaling (HF: yarn_only_types).
            // Default is 3 (apply on all layers) if the key is absent.
            ml.get_key(
                format("%s.rope.scaling.apply_mask", ml.get_arch_name().c_str()),
                hparams.rope_scaling_apply_mask,
                false
            );

            hparams.has_rope_freq_base_per_layer = ml.get_key_or_arr(
                format("%s.rope.freq_base_per_layer", ml.get_arch_name().c_str()),
                hparams.rope_freq_base_per_layer,
                hparams.n_layer,
                false
            );

            type = LLM_TYPE_UNKNOWN;
        } break;
Maybe one of these does not compile correctly. On Mac it builds OK, btw.
Got it.
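(An aside for readers of this thread: the get_key_or_arr calls in the hunk above read a GGUF metadata key that may be stored either as a single scalar or as a per-layer array. The snippet below is a minimal standalone sketch of that behaviour under assumed names and an assumed signature; it is not the llama.cpp implementation.)

```cpp
// Hedged sketch only; helper name and signature are hypothetical, not the llama.cpp API.
#include <array>
#include <cstdint>
#include <cstdio>
#include <vector>

// Fill `result` for `n_layer` layers from a GGUF value that is either a scalar or an array.
template <typename T, size_t N>
bool get_key_or_arr_sketch(const std::vector<T> & values, std::array<T, N> & result, uint32_t n_layer) {
    if (values.empty() || n_layer > N) {
        return false;
    }
    if (values.size() == 1) {
        result.fill(values[0]);   // scalar key: same value broadcast to every layer
        return true;
    }
    if (values.size() != (size_t) n_layer) {
        return false;             // array key must provide exactly one entry per layer
    }
    for (uint32_t il = 0; il < n_layer; ++il) {
        result[il] = values[il];
    }
    return true;
}

int main() {
    std::array<float, 512> rope_dim_per_layer{};
    // e.g. a per-layer rope dimension key stored as a single scalar for a 4-layer model
    get_key_or_arr_sketch<float, 512>({64.0f}, rope_dim_per_layer, 4);
    printf("layer 0: %g, layer 3: %g\n", rope_dim_per_layer[0], rope_dim_per_layer[3]);
}
```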
@pwilkin Thanks a lot for putting this together so quickly, and for taking the time to extract the changes into a clean PR — we really appreciate the interest and effort around step3.5. We’re the authors of the step3.5 model, and we already have an official llama.cpp integration prepared, which we’re planning to submit very shortly (likely tomorrow). Since the upcoming PR will be the reference implementation we’ll be maintaining going forward, we’d prefer to have that land as the primary integration. For that reason, we were hoping this PR could be held off from merging for now.
    ml.get_key(
        format("%s.rope.scaling.apply_mask", ml.get_arch_name().c_str()),
        hparams.rope_scaling_apply_mask,
        false
    );
I think you will need something like this (const char * instead of std::string):

    template bool llama_model_loader::get_key_or_arr<float, 512>(const char *, std::array<float, 512> & result, uint32_t n, bool required);

But I think it's better to just put this as a LLM_KV_*, like all other metadata keys.
Already did it :>
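(For context on the linker error quoted near the top of the thread: when a member template is defined in a .cpp file, only the explicitly instantiated signatures are emitted, so a call that passes a std::string key fails to link unless a matching instantiation exists. The sketch below uses hypothetical file and type names, not the actual llama.cpp sources.)

```cpp
// loader.h -- hypothetical header; the template is only declared here.
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

struct model_loader {
    template <typename T, size_t N>
    bool get_key_or_arr(const std::string & key, std::array<T, N> & result,
                        uint32_t n, bool required = true);
};

// loader.cpp -- the template body lives in a .cpp file, so the compiler only emits
// code for the signatures explicitly instantiated below.
template <typename T, size_t N>
bool model_loader::get_key_or_arr(const std::string & key, std::array<T, N> & result,
                                  uint32_t n, bool required) {
    (void) key; (void) required;
    for (uint32_t i = 0; i < n && i < N; ++i) {
        result[i] = T(0);   // dummy body, just to make the sketch complete
    }
    return true;
}

// Without this line, any other translation unit calling
// get_key_or_arr<float, 512>(std::string, ...) fails at link time with
// "undefined reference to bool model_loader::get_key_or_arr<float, 512ul>(...)".
template bool model_loader::get_key_or_arr<float, 512>(const std::string &,
                                                        std::array<float, 512> &,
                                                        uint32_t, bool);
```

Registering the key as an LLM_KV_* entry, as suggested above, presumably avoids adding a new instantiation by routing the lookup through the already-instantiated paths.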
        apply_mask |= 0x1
    if "sliding_attention" in yarn_only_types:
        apply_mask |= 0x2
    self.gguf_writer.add_uint32(f"{self.gguf_writer.arch}.rope.scaling.apply_mask", int(apply_mask))
Better to add this as a self.gguf_writer.add_* function
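(To make the mask semantics concrete: combining this converter snippet with the "default is 3 (apply on all layers)" comment in the C++ hunk earlier, one plausible reading is that bit 0x1 covers full-attention layers and bit 0x2 covers sliding-window layers. The decode below is an illustrative sketch under that assumption, not code from this PR.)

```cpp
#include <cstdint>
#include <cstdio>

// Assumed bit meanings (inferred from the converter, where "sliding_attention" sets 0x2).
enum : uint32_t {
    ROPE_SCALING_APPLY_FULL = 0x1,  // assumption: full-attention layers
    ROPE_SCALING_APPLY_SWA  = 0x2,  // assumption: sliding-window layers
};

// Decide whether rope (YaRN) scaling applies to a given layer type.
static bool rope_scaling_applies(uint32_t apply_mask, bool is_swa_layer) {
    return (apply_mask & (is_swa_layer ? ROPE_SCALING_APPLY_SWA : ROPE_SCALING_APPLY_FULL)) != 0;
}

int main() {
    const uint32_t apply_mask = 3;  // default when the GGUF key is absent: apply on all layers
    printf("full-attention layer: %d\n", rope_scaling_applies(apply_mask, /*is_swa_layer=*/false));
    printf("sliding-window layer: %d\n", rope_scaling_applies(apply_mask, /*is_swa_layer=*/true));
}
```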
        raise ValueError(f"Unprocessed experts: {experts}")


@ModelBase.register("Step35ForCausalLM")
Should this be:

    -@ModelBase.register("Step35ForCausalLM")
    +@ModelBase.register("Step3p5ForCausalLM")
Closing as duplicate of #19283
The model creators put up a llama.cpp fork, but they did it as a bare directory upload, and the llama.cpp version it is based on is quite far outdated, so I attempted to pull a clean PR out of their code. I'm not guaranteeing it's all correct, but I'm putting it up since there's probably going to be a lot of interest in the new model.