Add 4bit gpt oss 20b #23
Conversation
* Add more `append*` methods to `UserInput` so we can build up an input from all types of messages, including harmony messages (see the sketch below)
* Increase `prefillStepSize` so we process the entire context in one chunk
* Add logging for debugging, sorry
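A minimal sketch of what such `append*` helpers could look like; the `Message` shape and the method names here are assumptions for illustration, not the real `UserInput` API:

```swift
import Foundation

// Hypothetical message type; roles cover system/user/assistant/tool,
// which is enough to represent harmony-style conversations.
struct Message {
    let role: String
    let content: String
}

struct UserInput {
    private(set) var messages: [Message] = []

    // Generic append so callers can build up an input from any mix of roles.
    mutating func append(role: String, content: String) {
        messages.append(Message(role: role, content: content))
    }

    mutating func appendUserMessage(_ content: String) {
        append(role: "user", content: content)
    }

    mutating func appendAssistantMessage(_ content: String) {
        append(role: "assistant", content: content)
    }

    mutating func appendToolMessage(_ content: String) {
        append(role: "tool", content: content)
    }
}
```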
@myobie Were there any tests that failed before these changes and passed after them?
@atdrendel I didn't run the tests locally; I wanted to let CI do it, since I don't have the models in the right place. I also just wanted you to see this and check whether it makes sense to you or not.
@atdrendel Sorry, in case it wasn't clear: this is just part of what I did to get gpt-oss working on my laptop. But I don't know why gpt-oss didn't work already, and I wanted you to see this in the hope that you'd understand why.
atdrendel left a comment
@myobie It was a good idea to add additional logging.
By the way, @myobie, the model tests don't run in CI because downloading gigantic AI models onto the CI machine would be expensive and slow. All the tests run, but the ones that require an AI model to be present just get skipped. So, if you want to test gpt-oss or whatever, you need to run it locally. Just uncomment the
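For context, the skip-when-missing behavior could look roughly like this in XCTest; the model path and test name are hypothetical, but `XCTSkipUnless` is the standard mechanism:

```swift
import XCTest

final class ModelTests: XCTestCase {
    func testGPTOSSGeneration() throws {
        // Hypothetical location; the real tests look wherever the project
        // expects downloaded models to live.
        let modelDir = FileManager.default.homeDirectoryForCurrentUser
            .appendingPathComponent("models/gpt-oss-20b")

        // Skip (rather than fail) when the model isn't downloaded,
        // which is why these tests pass trivially in CI.
        try XCTSkipUnless(
            FileManager.default.fileExists(atPath: modelDir.path),
            "gpt-oss model not present; run locally with the model downloaded"
        )

        // ... actual generation assertions would go here ...
    }
}
```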
@atdrendel I'll retest today. When I try to use gpt-oss, it just doesn't work most of the time. I tried everything, and when I added this setting it did work twice, but if the setting isn't actually used then those successes must have been random. I'm not worried about the tests; I'll try to run them today, though I expect them to hang. I'm curious how you are able to run gpt-oss at all.
@atdrendel When you say the prefill setting isn't used anywhere, do you mean inside MLX or in our code? Because here I've already changed the default to a higher value in the optional override. The normal default is much lower, I thought?
Oh, I didn't realize the value you set it to was higher than the default. Either way, I kept the prefill size at the value you set, so it should work for you. 32 GB of RAM might not be enough to run gpt-oss at 8bit quantization. On my Mac mini, even though I have 64 GB of RAM, only 48 GB is usable by the GPU. So I'm guessing your computer would only allow the GPU to use ~24 GB, which may be too little for gpt-oss.
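As a sanity check, Metal reports the GPU's memory budget directly, so you can confirm the ~48 GB / ~24 GB numbers on a given machine:

```swift
import Foundation
import Metal

// recommendedMaxWorkingSetSize is the amount of memory Metal suggests
// keeping resident on the GPU; on Apple silicon it is typically around
// 70-75% of total unified memory, matching the 48 GB / 64 GB observation.
if let device = MTLCreateSystemDefaultDevice() {
    let gigabytes = Double(device.recommendedMaxWorkingSetSize) / 1_073_741_824
    print(String(format: "GPU working set budget: %.1f GB", gigabytes))
}
```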
* The 4bit will produce duplicate channel tokens
* The function calls are sometimes a little different

These changes should support both the 4bit and 8bit versions.
@atdrendel I am going to pivot this PR to just adding the 4bit version of gpt-oss and the few changes that help when it's less precise with its output. I've removed the prefill stuff and I'm going to remove most of the logging. I just don't think I'll be able to test this model in either 8bit or 4bit right now; I've tried changing things (some I understand, some I don't) and it never really works on my machine. Having the 4bit version here is helpful, and I can run the tests for the 4bit version; they pass 99% of the time.
Sometimes the 4bit gpt-oss model will output multiple channel tokens.
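A minimal sketch of one way to guard against that before parsing, collapsing consecutive duplicates of the harmony channel marker; the function is hypothetical, not code from this PR:

```swift
import Foundation

// Collapse runs of the channel marker, e.g.
// "<|channel|><|channel|>analysis" becomes "<|channel|>analysis".
func collapseDuplicateChannelTokens(
    in text: String,
    marker: String = "<|channel|>"
) -> String {
    var result = text
    let doubled = marker + marker
    while result.contains(doubled) {
        result = result.replacingOccurrences(of: doubled, with: marker)
    }
    return result
}
```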
Sometimes the tool call function names are not output 100% correctly by the 4bit gpt-oss model.
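One way to tolerate that is to resolve the emitted name against the registered tools with a normalized comparison; a hypothetical sketch:

```swift
import Foundation

// Match a possibly-imprecise function name against known tool names,
// ignoring case and separator characters like "_" and "-".
func resolveToolName(_ emitted: String, against knownTools: [String]) -> String? {
    func normalize(_ s: String) -> String {
        s.lowercased().filter { $0.isLetter || $0.isNumber }
    }
    let target = normalize(emitted)
    return knownTools.first { normalize($0) == target }
}

// e.g. resolveToolName("Get-Weather", against: ["get_weather", "search"])
// returns "get_weather".
```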