Conversation
I will update docs after the code gets approved
```cpp
int32_t Runner::count_text_tokens(const std::string &text) const {
  auto encodeResult =
      tokenizer_->encode(text, numOfAddedBoSTokens, numOfAddedEoSTokens);

  if (!encodeResult.ok()) {
    throw rnexecutorch::RnExecutorchError(
        rnexecutorch::RnExecutorchErrorCode::TokenizerError,
        "Encoding failed during token count check.");
  }

  return static_cast<int32_t>(encodeResult.get().size());
}
```
I'm wondering if calling the encoder just to get the size of the encoded text is the most efficient way to get the size. Can we compute this during the encoding phase?
If I understand this correctly, that's not possible for 2 reasons:
- I need to calculate all tokens in the whole messages history + the new message (so I don't know the number of tokens for the new message)
- I can't really use token counts from the runner reliably, because it counts reasoning tokens as well for reasoning models (and later on, these reasoning tokens are not included in the Jinja template) - it could lead to some discrepancies
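A token-budget-based sliding window along these lines can be sketched in TypeScript. This is only an illustration of the idea, not the library's actual code: `Message`, `trimToWindow`, and the `countTokens` callback (which would stand in for a tokenizer call such as `count_text_tokens`) are all hypothetical names.

```typescript
interface Message {
  role: string;
  content: string;
}

// Hypothetical sliding-window trim: drop the oldest non-system messages
// until the token count of the remaining history fits the budget.
function trimToWindow(
  messages: Message[],
  maxTokens: number,
  countTokens: (text: string) => number,
): Message[] {
  const result = [...messages];
  const total = () =>
    result.reduce((sum, m) => sum + countTokens(m.content), 0);
  // Keep a leading system prompt, if present, out of the trimmable range.
  const start = result[0]?.role === "system" ? 1 : 0;
  while (result.length > start + 1 && total() > maxTokens) {
    result.splice(start, 1); // drop the oldest removable message
  }
  return result;
}
```

Counting over the full trimmed history (rather than accumulating per-turn counts from the runner) sidesteps the reasoning-token discrepancy described above, at the cost of re-encoding the prompt once per turn.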
Alternatively, maybe we could implement these context strategies in the C++ code, but as a result we would give the user no flexibility (?)
Also:
- the context management strategy is configurable, so it can be switched on/off or replaced with another strategy - by default, we do not use the sliding window, but the strategy based on the number of messages (like it was before)
- I think that the encoding phase should not take that long
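A configurable strategy surface along these lines might look roughly as follows. This is a sketch only - the exact shapes of `ContextStrategy`, `ChatConfig`, and the strategy factory are assumptions, not the actual react-native-executorch API:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical strategy contract: given the full history, return the
// messages that should actually be sent to the runner.
type ContextStrategy = (messages: ChatMessage[]) => ChatMessage[];

// Message-count-based strategy (the default behavior described above):
// keep at most the last `maxMessages` entries.
const messageCountStrategy =
  (maxMessages: number): ContextStrategy =>
  (messages) =>
    messages.slice(-maxMessages);

interface ChatConfig {
  systemPrompt?: string;
  contextStrategy: ContextStrategy;
}

const config: ChatConfig = {
  contextStrategy: messageCountStrategy(10),
};
```

Because the strategy is just a function over the history, swapping it for a sliding-window or a pass-through ("naive") variant is a one-line config change on the user's side.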
We don't need to bother ourselves with computational complexity here; this is just tokenizer encoding, which is very cheap and only performed once for a given prompt, so it only impacts Time To First Token.
msluszniak left a comment
Please also resolve conflicts
I changed only docs for
Description
This PR fixes a few bugs related to the LLMs, caused by mixing two approaches - functional (as we pass the whole messages history each time) and stateful (as we keep `pos_` in the runner, representing at which position the KV cache is), which resulted in 3 bugs:

- for reasoning models, generated reasoning tokens advanced the KV cache position (`pos_ += num_generated_tokens`), but in next turns, the Jinja template removed these reasoning tokens from the messages history - as a result, the KV cache was incoherent
- the whole messages history was prefilled again on each turn (without resetting `pos_`) - as a result, tokens were "duplicated" in the KV cache and we were running out of available tokens very fast (exceeding `context_window_length`)
- even though the `generate()` method is called functionally, it kept internal state in the runner (e.g. `pos_`)

These bugs were fixed by resetting the runner before each generation, which makes it truly functional - old messages are prefilled and the KV cache can still be used during the generation phase.
Additionally, this PR adds `ContextStrategy` to the `ChatConfig` interface, so now it's possible to define (or use one of the already implemented) strategies for managing context (e.g. naive, message-count-based, sliding window) - it gives us more flexibility and the user can decide what's best for their use case. From now on, `SlidingWindowContextStrategy` is also configured as the default one.

Introduces a breaking change?
These changes will not break anything as long as the max number of messages is not modified (I removed `contextWindowLength` from `ChatConfig` and replaced it with `contextStrategy`).

Type of change
Tested on
Testing instructions
Run the example llm app, open the executorch logs (`adb logcat | grep -i "executorch"`, for example) and see if the numbers of tokens are properly aligned and if `pos_` is correct.

To test different context management strategies, change `contextStrategy` in the llm app and modify the model configuration.

Screenshots
Related issues
#776
Checklist
Additional notes
Position in the KV cache, number of prompt tokens, and number of generated tokens for both non-reasoning and reasoning models BEFORE the changes.
LLAMA 3.2 1B SPINQUANT (without reasoning)
QWEN 3.0 0.6B QUANTIZED (with reasoning)