Conversation

myobie (Member) commented Nov 5, 2025

  • Adds the 4-bit version of gpt-oss 20B
  • Attempts to clean up the harmony tokens a bit, since the 4-bit version isn't always as precise
  • Adds some logs

* Add more append* methods to UserInput so we can build up an input of
  all types of messages, including harmony messages
* Increase prefillStepSize so we process the entire context in one chunk
* Add logging for debugging, sorry
@myobie myobie requested a review from atdrendel November 5, 2025 11:48
atdrendel (Contributor) commented

@myobie Were there any tests that failed before these changes and passed after them?

myobie (Member, Author) commented Nov 5, 2025

@atdrendel I didn't run the tests locally; I wanted to let CI do it. I don't have the models in the right place. I also just wanted you to see this and check whether it makes sense to you.

myobie (Member, Author) commented Nov 5, 2025

@atdrendel Sorry, in case it wasn't clear: this is just part of what I did to get gpt-oss working on my laptop. But I don't know why gpt-oss didn't work already, and I wanted you to see this in the hope that you'd understand why.

atdrendel (Contributor) left a comment

@myobie It was a good idea to add additional logging.

atdrendel (Contributor) commented

@myobie Also, the prefillStepSize argument isn't used anywhere.

> @atdrendel I didn't run the tests locally; I wanted to let CI do it. I don't have the models in the right place. I also just wanted you to see this and check whether it makes sense to you.

By the way, @myobie, the model tests don't run in CI because downloading gigantic AI models to the CI machine would be expensive and slow. All the tests run, but ones that require the presence of an AI model just get skipped.

So, if you want to test gpt-oss or whatever, you need to run it locally. Just uncomment the `resources: []` lines in `Package.swift` for which you've got a downloaded model that you want to test.
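The uncommenting step described above might look roughly like this (a hedged sketch only; the target, dependency, and model directory names here are hypothetical, not copied from the actual manifest):

```swift
// Illustrative testTarget declaration in a Package.swift manifest.
// All names here (ModelTests, MLXModels, Models/gpt-oss-20b-4bit)
// are hypothetical placeholders, not the repo's actual identifiers.
.testTarget(
    name: "ModelTests",
    dependencies: ["MLXModels"],
    resources: [
        // Uncomment the entry for a model you have downloaded locally,
        // so the test bundle can find it; leave it commented out in CI:
        // .copy("Models/gpt-oss-20b-4bit"),
    ]
)
```

With the resource line commented out, tests that require the model detect its absence and skip themselves, which is why CI stays green without the download.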

myobie (Member, Author) commented Nov 6, 2025

@atdrendel I'll retest today. When I try to use gpt-oss, it just doesn't work most of the time. I tried everything, and after I added this setting it did work twice; but if the setting isn't used anywhere, that must have been random.

I'm not worried about the tests; I'll try to run them today. I expect them to hang. I'm curious how you're able to run gpt-oss at all.

myobie (Member, Author) commented Nov 6, 2025

@atdrendel When you say the prefill setting isn't used anywhere, do you mean inside MLX or in our code? Because here I've already changed the default to a higher value in the optional override. The normal default is much lower, I thought?

atdrendel (Contributor) commented Nov 6, 2025

> @atdrendel When you say the prefill setting isn't used anywhere, do you mean inside MLX or in our code? Because here I've already changed the default to a higher value in the optional override. The normal default is much lower, I thought?

Oh, I didn't realize the value you set it to was higher than the default. Either way, I kept the prefill size with the default value you set. So, it should work for you.

32 GB of RAM might not be enough to run gpt-oss at 8-bit quantization. On my Mac Mini, even though I have 64 GB of RAM, only 48 GB is usable by the GPU. So, I'm guessing your computer would only allow the GPU to use ~24 GB, which may be too little for gpt-oss.

myobie and others added 2 commits November 6, 2025 12:00
* The 4-bit version will produce duplicate channel tokens
* The function calls are sometimes a little different

These changes should support both the 4-bit and 8-bit versions.
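The "duplicate channel tokens" cleanup mentioned above could be sketched roughly like this (a minimal illustration only; the token string and function name are hypothetical, not the PR's actual implementation):

```swift
// Hedged sketch: collapse consecutive duplicate harmony channel tokens
// that the 4-bit model sometimes emits. The token literal and helper
// name below are assumptions for illustration, not repo identifiers.
let channelToken = "<|channel|>"

func cleanupHarmonyTokens(_ tokens: [String]) -> [String] {
    var result: [String] = []
    for token in tokens {
        // Skip a channel token that immediately repeats the previous one.
        if token == channelToken, result.last == channelToken {
            continue
        }
        result.append(token)
    }
    return result
}

// cleanupHarmonyTokens(["<|channel|>", "<|channel|>", "analysis"])
// collapses the doubled channel marker into a single one.
```

Normalizing the token stream before parsing, rather than making the parser tolerate duplicates everywhere, keeps one code path working for both the 4-bit and 8-bit models.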
@myobie myobie changed the title Add more append* methods to UserInput and increase prefillStepSize Attempt to cleanup harmony tokens when the 4bit version isn’t precise Nov 6, 2025
@myobie myobie changed the title Attempt to cleanup harmony tokens when the 4bit version isn’t precise Add 4bit gpt oss 20b Nov 6, 2025
myobie (Member, Author) commented Nov 6, 2025

@atdrendel I am going to pivot this PR to just adding the 4-bit version of gpt-oss, plus the few changes that help when its output is less precise. I've removed the prefill stuff, and I'm going to remove most of the logging. I just don't think I'll be able to test this model in either 8-bit or 4-bit right now; I've tried changing things (some I understand, some I don't) and it never really works on my machine. Having the 4-bit version here is helpful, and I can run the tests for the 4-bit version; they pass 99% of the time.

myobie (Member, Author) left a review comment

Sometimes the 4-bit gpt-oss model will output multiple channel tokens.

myobie (Member, Author) left a review comment

Sometimes the tool call function names are not output 100% correctly by the 4-bit gpt-oss model.

@myobie myobie requested a review from atdrendel November 6, 2025 17:34
@atdrendel atdrendel merged commit 57c10e4 into main Nov 6, 2025
1 check passed
@atdrendel atdrendel deleted the harmonize branch November 6, 2025 20:56