Conversation

@aldehir
Collaborator

@aldehir aldehir commented Jan 2, 2026

Chat parser for Solar-Open-100B.

Features

  • Reasoning parsing
  • Reasoning injection via reasoning_content field for interleaved thinking
  • response_format parsing
  • Tool call parsing, including tool_choice = required and reasoning

The following variables can be modified via chat template kwargs:

  • default_system_prompt: bool = true - Include default system prompt
  • reasoning_effort: "minimal" | "low" | "medium" | "high" = "high" - Set reasoning effort. When set to low or minimal, reasoning is disabled.
  • think_render_option: "all" | "lastthink" = "lastthink" - Determines when to render reasoning traces when fed back for interleaved rendering. The default (lastthink) only includes reasoning after the last user message. The all option includes reasoning for all assistant messages.
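
For reference, a request that sets these variables might look like the following. This is a minimal sketch: the model name is a placeholder, but the chat_template_kwargs keys are the template variables listed above.

```python
import json

# Sketch of an OpenAI-compatible request body for llama-server's
# /v1/chat/completions endpoint. The model name is a placeholder; the
# chat_template_kwargs keys are the template variables from this PR.
payload = {
    "model": "solar-open-100b",  # placeholder model name
    "messages": [{"role": "user", "content": "Who are you?"}],
    "chat_template_kwargs": {
        "default_system_prompt": True,       # include default system prompt
        "reasoning_effort": "high",          # "minimal" | "low" | "medium" | "high"
        "think_render_option": "lastthink",  # "all" | "lastthink"
    },
}
print(json.dumps(payload, indent=2))
```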

@aldehir aldehir mentioned this pull request Jan 2, 2026
@github-actions github-actions bot added the testing Everything test related label Jan 2, 2026
@aldehir aldehir marked this pull request as ready for review January 2, 2026 09:49
@HelloKS
Contributor

HelloKS commented Jan 2, 2026

Hello, Thanks for the PR! I was waiting for this.

I tried with reasoning_effort = low, and the server doesn't seem to like it for some reason.

slot process_toke: id  3 | task 0 | n_decoded = 1, n_remaining = -1, next token:    22 '<|think|>'
srv  update_slots: run slots completed
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 1
que    start_loop: update slots
srv  update_slots: posting NEXT_RESPONSE
que          post: new task, id = 2, front = 0
slot update_slots: id  3 | task 0 | slot decode token, n_ctx = 32000, n_tokens = 77, truncated = 0
srv  update_slots: decoding batch, n_tokens = 1
clear_adapter_lora: call
set_embeddings: value = 0
srv  update_chat_: Parsing chat message: <|think|>
Parsing input with format peg-native: <|think|>
res  remove_waiti: remove task 0 from waiting list. current waiting = 1 (before remove)
srv          stop: cancel task, id_task = 0
res  remove_waiti: remove task 0 from waiting list. current waiting = 0 (before remove)
que          post: new task, id = 3/1, front = 1
srv          stop: all tasks already finished, no need to cancel
srv    operator(): got exception: {"error":{"code":500,"message":"Failed to parse input at pos 0","type":"server_error"}}
srv  log_server_r: request: POST /v1/chat/completions 127.0.0.1 500
srv  log_server_r: request:  {"model":"model-Q4_K_M.gguf","temperature":0.8,"top_p":0.95,"top_k":50,"chat_template_kwargs":{"reasoning_effort":"low"},"messages":[{"role":"user","content":"Who are you?"}],"stream":true,"stream_options":{"include_usage":true}}
srv  log_server_r: response: {"error":{"code":500,"message":"Failed to parse input at pos 0","type":"server_error"}}

Tool calling and chat with reasoning work well.

@aldehir
Collaborator Author

aldehir commented Jan 2, 2026

@HelloKS, thanks for that info. I should have done more thorough testing with low and minimal. The parsing changes because the prompt is no longer a continuation of <|begin|>assistant. I'll just make the parsing a bit more lax.
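
To illustrate the idea, here is a Python sketch (not the actual C++ PEG grammar in the PR): a laxer parser can treat both the <|begin|>assistant prefix and the reasoning block as optional, so output parses the same whether or not reasoning was enabled.

```python
import re

# Illustrative sketch only -- the PR implements this as a C++ PEG grammar.
# Accepts model output with or without a leading "<|begin|>assistant"
# prefix, and with an optional "<|think|>...<|end|>" reasoning block.
def parse_solar_output(text: str) -> dict:
    # The prefix is absent when reasoning is disabled, because the prompt
    # is then no longer a continuation of "<|begin|>assistant".
    text = re.sub(r"^<\|begin\|>assistant", "", text)
    reasoning = ""
    m = re.match(r"^<\|think\|>(.*?)<\|end\|>", text, re.DOTALL)
    if m:
        reasoning = m.group(1)
        text = text[m.end():]
    return {"reasoning_content": reasoning, "content": text}
```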

@aldehir
Collaborator Author

aldehir commented Jan 2, 2026

@HelloKS Give it a try with 980c772 c41d8f8


Looks like low/minimal adds an empty think section, but the model can still resume with a thought? Interesting. This should make it more permissive though.

I didn't realize it always appends <|begin|>assistant even after adding an empty think section.

@HelloKS
Contributor

HelloKS commented Jan 2, 2026

Yes, it now works without reasoning (even with tool calling!)

@HelloKS
Contributor

HelloKS commented Jan 2, 2026

Interesting: the model sometimes still reasons even with "reasoning_effort": "low".

Translate this sentence into English:
…本当に、馬鹿馬鹿しい

다음 문장을 한국어로 설명 없이 번역하세요:
…本当に、馬鹿馬鹿しい

(= Translate this sentence into Korean without any explanation: "…Really, it's ridiculous")

Not sure whether this behavior comes from training or the template, though.

@aldehir
Collaborator Author

aldehir commented Jan 2, 2026

Have you tried minimal instead of low? I believe this is the training. When it's either of those, it adds the following to the template:

<|begin|>assistant<|think|><|end|><|begin|>assistant

Any additional <|think|> tags generated would likely be from the training or maybe quantization? Not sure.
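
As a sketch of that behavior (a hypothetical helper, not the actual Jinja template; only the low/minimal branch is taken from the description above, the other branch is simplified):

```python
# Hypothetical helper illustrating the assistant-turn prefix the template
# renders, based on the description above -- NOT the actual Jinja template.
def assistant_prefix(reasoning_effort: str) -> str:
    if reasoning_effort in ("minimal", "low"):
        # Reasoning disabled: the template emits an empty think section,
        # then still appends "<|begin|>assistant" afterwards.
        return "<|begin|>assistant<|think|><|end|><|begin|>assistant"
    # medium/high: no empty think section is injected (simplified sketch).
    return "<|begin|>assistant"
```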

@HelloKS
Contributor

HelloKS commented Jan 2, 2026

> Have you tried minimal instead of low? I believe this is the training. When it's either of those, it adds the following to the template:
>
> <|begin|>assistant<|think|><|end|><|begin|>assistant
>
> Any additional <|think|> tags generated would likely be from the training or maybe quantization? Not sure.

Minimal and low show the same behavior. I think it's okay, since they didn't document this "reasoning off" feature. Maybe it's a leftover from something planned; who knows, lol

@LETS-BEE
Contributor

any progress?

@HelloKS
Contributor

HelloKS commented Jan 29, 2026

> any progress?

It works perfectly (I'm using it locally); it's just the PR review that has stalled. Maybe related to #18675?

Collaborator

@pwilkin pwilkin left a comment
Oh, sorry. Yeah, let's merge it.

@pwilkin
Collaborator

pwilkin commented Jan 29, 2026

@0cc4m @jeffbolznv Just FYI, I'm getting this test failure on CI:

FLASH_ATTN_EXT(hsk=128,hsv=128,nh=4,nr23=[12,1],kv=512,nb=35,mask=1,sinks=0,max_bias=8.000000,logit_softcap=10.000000,prec=def,type_KV=f32,permute=[0,1,2,3])

@0cc4m
Collaborator

0cc4m commented Jan 29, 2026

Yeah, I'm aware of it; it only showed up after the merge of #19075, not on the branch itself. I'll look into it.

@ggerganov
Member

@0cc4m I think the reason is that in #19115 we added some new FA tests to exercise a non-power-of-2 number of heads. I believe these new tests are the ones that are failing, so it is not a regression - just something that has been revealed by the new set of tests.

@pwilkin pwilkin merged commit 7b7ae85 into ggml-org:master Jan 29, 2026
77 of 78 checks passed
@aldehir
Collaborator Author

aldehir commented Jan 29, 2026

@pwilkin thank you!

4b1tQu4ntN3k0 pushed a commit to 4b1tQu4ntN3k0/llama.cpp that referenced this pull request Feb 2, 2026
* chat : add parsing for solar-open-100b

* add comments to rules

* cont : make assistant start optional

* cont : remove assistant start prefix altogether

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
