
Conversation

@pwilkin (Collaborator) commented Jan 7, 2026

This is a huge endeavor that I promised back when I applied to maintain the parser code. The legacy parser code was hard to maintain and buggy, and supporting new models with it was really annoying. There was a worthwhile contribution by @hksdpc255 to add some XML tool-calling abstractions, but that was still just a patch on an open wound.

Thanks to @aldehir and his PEG parser, I managed to create an autoparser mechanism, using all the currently supported templates, their parsers, and their test cases as a base. The idea is simple: most models' syntax follows the general pattern of:

<reasoning_markers> <reasoning_content> <end_of_reasoning_markers> <content_markers> <main_content> <end_of_content_markers> <tool_call_markers> ( <json> | <function marker> <args json> | <function marker> <args marker> <value json> ) <end_of_tool_call_marker>
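
For illustration, a single assistant turn following this pattern might render like the following (marker names here are Hermes/Qwen-flavored and purely hypothetical; every template uses its own tokens):

<think>The user wants the weather, so I should call the weather tool.</think>
Let me check that for you.
<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>

Here <think>/</think> are the reasoning markers, the plain text between them and the tool call is the main content, and <tool_call>/</tool_call> wrap a single-JSON tool call; other templates instead emit a function-name marker followed by an args JSON, or separate name/args/value markers.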

Of course, some elements might not be present in a given template, but that's the general structure. Since this is a pretty finite structure, it's possible to determine the relevant elements by differential analysis, similar to how Minja already does capability detection, but more fine-grained: by comparing various template outputs, we actually get to extract the relevant markers.
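
To make the differential idea concrete, here is a minimal, self-contained sketch of the core diffing step. This is not the actual implementation (which, per the description above, also avoids cutting through marker boundaries); all names and the marker tokens are invented for illustration.

#include <iostream>
#include <string>

struct SlotContext {
    std::string prefix; // text shared by both renders before the differing value
    std::string suffix; // text shared by both renders after it
};

// Given two renders that differ only in one injected value (e.g. the function
// name), strip the common prefix and suffix to locate where that value lands.
static SlotContext diff_renders(const std::string & a, const std::string & b) {
    size_t p = 0;
    while (p < a.size() && p < b.size() && a[p] == b[p]) {
        p++;
    }
    size_t s = 0;
    while (s < a.size() - p && s < b.size() - p && a[a.size() - 1 - s] == b[b.size() - 1 - s]) {
        s++;
    }
    return { a.substr(0, p), a.substr(a.size() - s) };
}

int main() {
    // Two renders of the same template, differing only in the function name
    // (Hermes-style markers assumed purely for illustration).
    std::string r1 = R"(<tool_call>{"name": "get_weather", "arguments": {}}</tool_call>)";
    std::string r2 = R"(<tool_call>{"name": "lookup_time", "arguments": {}}</tool_call>)";
    SlotContext ctx = diff_renders(r1, r2);
    // prefix is <tool_call>{"name": " and suffix is ", "arguments": {}}</tool_call>
    std::cout << "name slot between [" << ctx.prefix << "] and [" << ctx.suffix << "]\n";
}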

Some models will obviously not be handled so easily. However, in the course of implementing the mechanism, only two models remained that needed their own separate parsers: Ministral and GPT-OSS, and the former not because of its complexity, but because of the need to rewrite the message structure passed to the template. GPT-OSS is a different beast, since it supports arbitrarily many interleaved blocks, so it doesn't fit into the scheme mentioned above (but its parser has been rewritten to PEG as well).

This is currently anchored on Minja and uses its capability detection, but since the differential analysis already does its own capability detection, I fully expect to throw that part out and base this on @ngxson's #18462 instead.

Obsoletes #18353 (sorry @ochafik - I know you put a lot of work into that).

Old parsers, tests, and all supporting code have been thrown out; templates got new PEG-parser-based test cases, all of which now also test streaming behavior. I have tested this extensively on agentic coding (mostly with OpenCode) to ensure that it actually works (my wish to refactor the parser code was mostly driven by my prior experience with agentic coding on llama.cpp, which was extremely buggy with a lot of models; this is an attempt to remedy that). Hopefully, one unified codebase with a greatly reduced line count will make it easier to fix any remaining errors.

This also means that there is no longer a need to provide support for new models' specific templates unless they have some odd constructs: they should be supported out of the box. There's a new tool called debug-template-parser that you can point at any Jinja template file or GGUF model with an embedded Jinja template and have it spit out the details of the generated autoparser + tool-calling grammar.
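
For instance (a hypothetical invocation; the exact arguments and flags may differ, I'm only assuming it takes the template or model path as input):

./debug-template-parser path/to/chat_template.jinja
./debug-template-parser path/to/model.gguf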

Oh, important note: all Minja polyfills have been disabled. Working templates are now required. While I see why, a year and a half ago, having proof-of-concept code that supported tool calling on models that didn't natively have tool calling might've been useful, right now supporting that makes it harder to properly support current and actually used models. Therefore, a functional template with tool calling is required if someone wants tool calling.

I want to ask everyone in the community who can to test this. I will keep this branch current with master. I've tried to test this as much as I could, but I'm just one person doing this after work, so my testing capacity is obviously limited. I will keep this as a draft until I've gathered enough feedback and testing data.

To avoid cluttering the main repository's issue tracker, please report bugs either (a) in this thread or (b) in my issue tracker: https://github.com/pwilkin/llama.cpp/issues

AI DISCLOSURE: Gemini Pro 3, Flash 3, Opus 4.5 and GLM 4.7 would like to admit that a human element did at some points interfere in the coding process, being so bold as to even throw most of the code out at one point and demand it be rewritten from scratch. The human also tinkered with the code massively, removing a lot of our beautiful comments and some code fragments that they claimed were useless. They had no problem, however, using us to do all the annoying marker arithmetic. Therefore, we disavow any claim to this code and cede the responsibility onto the human.

@hksdpc255 (Contributor):

Does this mean we don’t need to write a parser anymore, and it will be automatically generated from the chat template?

@pwilkin (Collaborator, Author) commented Jan 8, 2026

> Does this mean we don’t need to write a parser anymore, and it will be automatically generated from the chat template?

Yup, that's the gist of it.

@hksdpc255 (Contributor):

This feels almost magical. How does it work? Does it detect common patterns in the rendered template output? What happens if the chat template requires additional arguments?

@pwilkin (Collaborator, Author) commented Jan 8, 2026

> This feels almost magical. How does it work? Does it detect common patterns in the rendered template output? What happens if the chat template requires additional arguments?

Yeah, it does differential analysis: it prepares different inputs to the template and then compares the outputs. For example, by using the same function signature with a different name you can identify where the function name goes; by using the same function with one and two parameters you can identify how parameters are passed; and so on.
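
As a hypothetical example of the parameter probe (marker tokens invented for illustration), comparing

[TOOL_CALLS]get_weather[ARGS]{"city": "Paris"}
[TOOL_CALLS]get_weather[ARGS]{"city": "Paris", "units": "metric"}

shows that this template serializes all arguments as a single JSON object after one args marker, rather than emitting a separate marker per argument.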

The nice thing is, I managed to squish it to just 2k lines of code (1k for analysis and 1k for helpers), so it's not even that bloated.

As for custom inputs: I assume standard inputs here, and that's what most template makers try to adhere to anyway. If not, you end up with a custom handler, like for Ministral. But as a follow-up I want to separate handlers from parsers (since passing extra params is much easier than handling an entire template from scratch), or even add autodetection for common custom keywords (we're going to have to support "reasoning" in addition to "reasoning_content" at some point, because vLLM is moving to that).

@pwilkin force-pushed the autoparser branch 2 times, most recently from dc7dd03 to 5519998 on January 8, 2026
@hksdpc255 (Contributor) commented Jan 9, 2026

This approach does not seem to work well for models like Kimi-K2-Thinking, which may generate tool calls inside the thinking block, even though the chat template itself always closes the thinking block correctly. In other words, the model’s behavior does not seem to be fully aligned with the assumptions made by the chat template. Is that understanding correct? I noticed that you have removed all the parsers.

Additionally, I am planning to add a new custom parser for MiroThinker. Its official chat template does not accurately reflect the rendering logic actually used in their benchmarks. Is there a recommended starting point for implementing such a parser under the new parsing architecture?

@pwilkin (Collaborator, Author) commented Jan 9, 2026

I've heard of those mysterious tool calls inside thinking blocks for K2-Thinking, but I have yet to confirm whether they are an actual thing or just an artifact of low quantization. To be honest, outside of the native provider, I haven't seen K2-Thinking implemented anywhere in a working fashion. The Chutes version that I tested quite a few times bugs out on tool calling extremely often.

I'm really skeptical of modifying anything based on hearsay and things "floating around". I remember the discussion here about interleaved thinking: I myself was convinced it meant models could have multiple <think> blocks, until @aldehir pointed out that it's all a big misunderstanding and "interleaved thinking" is just the model having multiple message['assistant']['reasoning_content'] blocks next to message['assistant']['tool_call'] blocks. If I see a working solution with open-sourced code anywhere that really demonstrates support for those thinking blocks, then sure, we can consider a special parser for K2-Thinking.
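
For reference, under that reading, "interleaved thinking" is just a message history like the following sketch (OpenAI-style fields plus the reasoning_content extension mentioned above; all values invented):

[
  {"role": "assistant",
   "reasoning_content": "I need the weather before I can answer.",
   "tool_calls": [{"type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]},
  {"role": "tool", "content": "22C, sunny"},
  {"role": "assistant",
   "reasoning_content": "Now I have the data.",
   "content": "It's 22C and sunny in Paris."}
]

That is, multiple reasoning_content blocks sitting next to tool_calls blocks across turns, not multiple <think> blocks within a single generation.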

As for MiroThinker, I guess you're talking about adapting the Python code-based stuff they showed (the one that uses separate tags for MCP servers and code calling)? You can see how custom parsers are defined in chat.cpp; not much has changed, besides the fact that, since we use the PEG parser, there is no longer a dedicated parse() and init() function pair and the entire parser is defined in the init. I'll probably separate the parsers into dedicated files soon.

@pwilkin force-pushed the autoparser branch 2 times, most recently from 420f7bf to 9ea502a on January 13, 2026
@pwilkin force-pushed the autoparser branch 2 times, most recently from a963e86 to 3594bd5 on January 16, 2026
@pwilkin marked this pull request as ready for review on January 17, 2026
@pwilkin (Collaborator, Author) commented Jan 17, 2026

All right, I've reached the "all tests passed" phase for test-chat, so I'm officially moving this out of draft. I will still test it in practice, but I want to get all structural / architectural issues out of the way in the meantime.

@github-actions bot added the jinja parser label on Jan 17, 2026
@ngxson (Collaborator) commented Jan 21, 2026

> Tests for one template double as tests for all templates with similar mechanisms; it is very hard to actually introduce a breaking change that does not fail the test-chat tests, so the warning that a regression has been introduced comes very early.

I see your point, but in theory it will be tricky to test all feature (or component) combinations. For example, if one template has features A and B and another has features B and C, then the tests must cover AB, BC, AC, and potentially ABC.

Although this is just a very theoretical POV, I think it relates to @aldehir's comment from earlier: instead of exhaustively detecting all possible features A and B, the autoparser can maybe detect a single larger feature D = A + B. From your example, Qwen3 tool calling could potentially become one larger feature called QWEN3_CODER_TOOL_COMPONENT, for example.

This will effectively make the implementation, especially chat-auto-parser-generator.cpp, contain some model-specific parsers (just the components), but it brings back my point about testability (ref: #18675 (comment)): instead of producing f directly, g now produces a tuple of features that allows another function h to finally construct f: h(g(t)) = f

Of course, your current version is already doing that, but the number of features returned by g is so large that testing all combinations is not practically possible. So I personally think (and I also infer this from @aldehir's comment) that it would be more beneficial to limit the number of detected features.
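
A toy sketch of that decomposition (all types and names hypothetical, not the PR's actual API):

#include <string>

enum class ReasoningStyle { NONE, THINK_TAGS };
enum class ToolCallStyle  { NONE, HERMES_JSON, QWEN3_CODER_TOOL_COMPONENT };

struct Features {                   // the tuple produced by g(t)
    ReasoningStyle reasoning;
    ToolCallStyle  tool_calls;
};

struct Parser { /* generated PEG rules would live here */ };

// g: differential analysis of the template; the smaller the Features space,
// the fewer combinations the tests need to cover.
Features g(const std::string & /* template_text */) {
    return { ReasoningStyle::THINK_TAGS, ToolCallStyle::HERMES_JSON }; // stub
}

// h: deterministic construction of the parser from the features.
Parser h(const Features & /* feats */) {
    return {}; // stub
}

// f = h . g : the full template-to-parser function.
Parser build_parser(const std::string & t) {
    return h(g(t));
}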

Comment on lines 70 to 81
// For FUNC_PREFIXED_INDEXED format (e.g., Kimi-K2)
std::string per_call_start; // e.g., "<|tool_call_begin|>"
std::string function_namespace; // e.g., "functions." (prefix before function name)
std::string args_marker; // e.g., "<|tool_call_argument_begin|>"
std::string per_call_end; // e.g., "<|tool_call_end|>"

// For FUNC_BRACKET_TAG format (e.g., Mistral Small 3.2)
std::string id_marker; // e.g., "[CALL_ID]" - marker before tool call ID

// For FUNC_MARKDOWN_CODE_BLOCK format (e.g., Cohere Command-R Plus)
std::string code_block_marker; // e.g., "Action:" - text marker before code block
std::string code_block_language; // e.g., "json" - language identifier in code fence
@ngxson (Collaborator):
To continue my point above (please correct me if I'm wrong): it seems like this list of features is model-specific, and there is likely no chance that a future model will use FUNC_PREFIXED_INDEXED and FUNC_BRACKET_TAG at the same time.

So, I think these can be grouped into a larger feature, and since some tokens are practically unchanged, like <|tool_call_begin|>, we can hard-code them inside the parser code. That will make the code look more explicit.
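
Concretely, such a grouping might look something like this sketch (enum and names hypothetical):

// One coarse component per known family, with the practically-unchanging
// tokens hard-coded next to it instead of being detected field by field.
enum class ToolCallComponent {
    KIMI_K2_PREFIXED_INDEXED,    // fixed: <|tool_call_begin|> / <|tool_call_argument_begin|> / <|tool_call_end|>
    MISTRAL_BRACKET_TAG,         // fixed: [CALL_ID]
    COHERE_MARKDOWN_CODE_BLOCK,  // fixed: "Action:" plus a json code fence
};

The analyzer would then only have to pick one component per template, which keeps the testable feature space small.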

@github-actions bot added the build label on Feb 3, 2026
@pwilkin (Collaborator, Author) commented Feb 3, 2026

Aight @ngxson @aldehir I completely rewrote the core autoparser architecture. This time, I manually wrote most of the code myself, not just supervised it. The mechanism now works like this:

-> first, an analyzer does marker analysis in stages (reasoning / content / tool calling format / tool calling markers)
-> then, optional workarounds are applied for templates that fit the autoparser architecture but are not properly written, so automatic detection alone won't work (like the old Deepseek/QwQ reasoning formats)
-> finally, a PEG parser is generated from the analysis, piece-wise

Of course, it's still possible to write a separate parser for templates that don't conform to the scheme.

All the analysis functions now explicitly work within the same differential-analysis framework (take messages to compare, render and analyze them, make a diff that avoids marker boundaries, extract the markers), as sketched below.
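
In code terms, the staged flow might be condensed like this (all names hypothetical, not the actual implementation):

#include <string>

struct Analysis {
    std::string reasoning_open, reasoning_close; // stage 1: e.g. "<think>" / "</think>"
    std::string content_open,   content_close;   // stage 2: content wrappers, if any
    int         tool_format = 0;                 // stage 3: which tool-call scheme applies
    std::string tool_open,      tool_close;      // stage 4: the concrete tool-call tokens
};

struct PegParser { /* generated grammar rules */ };

// Each stage runs its own differential probe (render two message sets, diff
// while avoiding marker boundaries, extract markers); stubbed out here.
Analysis run_analysis_stages(const std::string & /* template_src */) { return {}; }

// Fix-ups for templates that fit the architecture but are written in a way
// that defeats automatic detection.
void apply_workarounds(Analysis & /* a */, const std::string & /* t */) {}

// Assembles the grammar piece-wise from whatever the analysis found.
PegParser generate_peg_parser(const Analysis & /* a */) { return {}; }

PegParser build_auto_parser(const std::string & template_src) {
    Analysis a = run_analysis_stages(template_src);
    apply_workarounds(a, template_src);
    return generate_peg_parser(a);
}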

Hopefully, this is a bit clearer and more manageable than the previous version :) It also adds a bit more flexibility through the analysis workarounds, which are a halfway measure between a dedicated parser and trying to fit every edge case into the general engine.

I did some fixes to the Jinja engine as well (object rendering / capability detection).

@pwilkin requested a review from ngxson on February 4, 2026