
Context

#1
by Sakedo - opened

Getting great results with the model until it passes 4096 tokens, and then it promptly devolves into nonsense.
I'm guessing the model doesn't actually have the 32k context of the original Mixtral, but the 4k context of the llama 2 models it was trained on, and that its metadata tags are wrong.

MergeFuel org

It is indeed 4k context. The model was made with experimental tools and the config is subject to change later; you need to play with the RoPE values to get a bit more context. All my repos will be updated when we settle this.
Thank you for your feedback!

Thanks for this. With the right context settings, this dethrones Goliath and Venus as the best model for me, even ignoring the massive speed improvement.

Sakedo changed discussion status to closed

Did you find context settings that extend it past the base 4096?

Yeah, same as any llama2 model.
--rope-freq-base 10000 --rope-freq-scale 0.5
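
In case it helps anyone else: with linear RoPE scaling, a freq scale of 0.5 compresses positions by half, so a 4k model can attend across roughly 8k tokens. A full llama.cpp invocation might look like the sketch below; the model filename and prompt are placeholders, not this repo's actual files.

# Hypothetical example: -c 8192 matches the doubled context (4096 / 0.5);
# model path and prompt are placeholders.
./main -m ./your-model.Q4_K_M.gguf -c 8192 \
  --rope-freq-base 10000 --rope-freq-scale 0.5 \
  -p "Your prompt here" -n 256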
