Conversation

myobie (Member) commented Nov 5, 2025

  • Adds the 4-bit version of gpt-oss 20B
  • Attempts to clean up the harmony tokens a bit, since the 4-bit version isn't always as precise
  • Adds some logs

* Add more append* methods to UserInput so we can build up an input of
  all types of messages, including harmony messages
* Increase prefillStepSize so we process the entire context in one chunk
* Add logging for debugging, sorry
@myobie myobie requested a review from atdrendel November 5, 2025 11:48
atdrendel (Contributor) commented

@myobie Were there any tests that failed before these changes and passed after them?

myobie (Member, Author) commented Nov 5, 2025

@atdrendel I didn't run the tests locally; I wanted to let CI do it. I don't have the models in the right place. I also just wanted you to see this and check whether it makes sense to you.

myobie (Member, Author) commented Nov 5, 2025

@atdrendel Sorry, in case it wasn't clear: this is just part of what I did to get gpt-oss working on my laptop. But I don't know why gpt-oss didn't work already, and I wanted you to see this in the hope that you'd understand why.

atdrendel (Contributor) left a comment

@myobie It was a good idea to add additional logging.

atdrendel (Contributor) commented

@myobie Also, the prefillStepSize argument isn't used anywhere.

> @atdrendel I didn't run the tests locally; I wanted to let CI do it. I don't have the models in the right place. I also just wanted you to see this and check whether it makes sense to you.

By the way, @myobie, the model tests don't run in CI because downloading gigantic AI models to the CI machine would be expensive and slow. All the tests run, but ones that require the presence of an AI model just get skipped.

So, if you want to test gpt-oss or whatever, you need to run it locally. Just uncomment the `resources: []` lines in `Package.swift` for which you've got a downloaded model that you want to test.
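The uncommenting step described above might look roughly like this (a hedged sketch only; the target, dependency, and model directory names here are hypothetical, not copied from the actual manifest):

```swift
// Illustrative testTarget declaration in a Package.swift manifest.
// All names here (ModelTests, MLXModels, Models/gpt-oss-20b-4bit)
// are hypothetical placeholders, not the repo's actual identifiers.
.testTarget(
    name: "ModelTests",
    dependencies: ["MLXModels"],
    resources: [
        // Uncomment the entry for a model you have downloaded locally,
        // so the test bundle can find it; leave it commented out in CI:
        // .copy("Models/gpt-oss-20b-4bit"),
    ]
)
```

With the resource line commented out, tests that require the model detect its absence and skip themselves, which is why CI stays green without the download.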

myobie (Member, Author) commented Nov 6, 2025

@atdrendel I'll retest today. When I try to use gpt-oss, it just doesn't work most of the time. I tried everything, and after I added this setting it did work twice; but if the setting isn't used anywhere, that must have been random.

I'm not worried about the tests; I'll try to run them today. I expect them to hang. I'm curious how you're able to run gpt-oss at all.

myobie (Member, Author) commented Nov 6, 2025

@atdrendel When you say the prefill setting isn't used anywhere, do you mean inside MLX or in our code? Because here I've already changed the default to a higher value in the optional override. The normal default is much lower, I thought?

atdrendel (Contributor) commented Nov 6, 2025

> @atdrendel When you say the prefill setting isn't used anywhere, do you mean inside MLX or in our code? Because here I've already changed the default to a higher value in the optional override. The normal default is much lower, I thought?

Oh, I didn't realize the value you set it to was higher than the default. Either way, I kept the prefill size with the default value you set. So, it should work for you.

32 GB of RAM might not be enough to run gpt-oss at 8-bit quantization. On my Mac Mini, even though I have 64 GB of RAM, only 48 GB is usable by the GPU. So, I'm guessing your computer would only allow the GPU to use ~24 GB, which may be too little for gpt-oss.

myobie and others added 2 commits November 6, 2025 12:00
* The 4-bit version will produce duplicate channel tokens
* The function calls are sometimes a little different

These changes should support both the 4-bit and 8-bit versions.
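The "duplicate channel tokens" cleanup mentioned above could be sketched roughly like this (a minimal illustration only; the token string and function name are hypothetical, not the PR's actual implementation):

```swift
// Hedged sketch: collapse consecutive duplicate harmony channel tokens
// that the 4-bit model sometimes emits. The token literal and helper
// name below are assumptions for illustration, not repo identifiers.
let channelToken = "<|channel|>"

func cleanupHarmonyTokens(_ tokens: [String]) -> [String] {
    var result: [String] = []
    for token in tokens {
        // Skip a channel token that immediately repeats the previous one.
        if token == channelToken, result.last == channelToken {
            continue
        }
        result.append(token)
    }
    return result
}

// cleanupHarmonyTokens(["<|channel|>", "<|channel|>", "analysis"])
// collapses the doubled channel marker into a single one.
```

Normalizing the token stream before parsing, rather than making the parser tolerate duplicates everywhere, keeps one code path working for both the 4-bit and 8-bit models.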
@myobie myobie changed the title Add more append* methods to UserInput and increase prefillStepSize Attempt to cleanup harmony tokens when the 4bit version isn’t precise Nov 6, 2025
@myobie myobie changed the title Attempt to cleanup harmony tokens when the 4bit version isn’t precise Add 4bit gpt oss 20b Nov 6, 2025
myobie (Member, Author) commented Nov 6, 2025

@atdrendel I am going to pivot this PR to just adding the 4-bit version of gpt-oss, plus the few changes that help when its output is less precise. I've removed the prefill stuff, and I'm going to remove most of the logging. I just don't think I'll be able to test this model in either 8-bit or 4-bit right now; I've tried changing things (some I understand, some I don't) and it never really works on my machine. Having the 4-bit version here is helpful, and I can run the tests for the 4-bit version; they pass 99% of the time.

myobie (Member, Author) left a review comment

Sometimes the 4-bit gpt-oss model will output multiple channel tokens.

myobie (Member, Author) left a review comment

Sometimes the tool call function names are not output 100% correctly by the 4-bit gpt-oss model.

@myobie myobie requested a review from atdrendel November 6, 2025 17:34
@atdrendel atdrendel merged commit 57c10e4 into main Nov 6, 2025
1 check passed
@atdrendel atdrendel deleted the harmonize branch November 6, 2025 20:56