Exl2 quants

#1 opened by bullerwins

Hi!

I just made 4.0, 5.0, and 6.0 bpw quants of this model to load with exllamav2, in case anyone is interested:

https://huggingface.co/bullerwins/Hermes-2-Theta-Llama-3-70B-exl2_4.0bpw
https://huggingface.co/bullerwins/Hermes-2-Theta-Llama-3-70B-exl2_5.0bpw
https://huggingface.co/bullerwins/Hermes-2-Theta-Llama-3-70B-exl2_6.0bpw
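
If it helps, here is roughly how I load them with the exllamav2 Python API (a minimal sketch; the model path is a placeholder for wherever you downloaded the quant, and the sampler settings are just examples):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Hermes-2-Theta-Llama-3-70B-exl2_4.0bpw"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache while layers load
model.load_autosplit(cache)               # auto-split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

print(generator.generate_simple("Write a haiku about llamas.", settings, 64))
```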

Hey, do you know if I can use these quants with the config from this repo to expand the context?
https://huggingface.co/OpenPipe/Hermes-2-Theta-Llama-3-70B-32k
Or may I ask if you could quantize that repo directly? :)
The original weights are a bit too heavy for my slow internet connection.

OpenPipe/Hermes-2-Theta-Llama-3-70B-32k

Are you sure, @ztsvvstz, that it is a legit fine-tune for a scaled context? The model card is just a plain copy-paste that doesn't explain anything about the context upgrade.

Honestly, I think they pretty much just changed some values in the config files and nothing else. Since the model card is also a plain copy, I assume the repo just exists for convenience, with the context-expansion parameters already set in the configs (I, for example, really don't know which parameters I'd have to change). You can check what they changed with the snippet below.
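
If anyone wants to see exactly what differs, you can diff the two configs straight from the Hub without downloading any weights. A minimal sketch (the base repo id is my guess for this model; only the config.json files are fetched):

```python
from transformers import AutoConfig

base = AutoConfig.from_pretrained("NousResearch/Hermes-2-Theta-Llama-3-70B")
ext = AutoConfig.from_pretrained("OpenPipe/Hermes-2-Theta-Llama-3-70B-32k")

# The usual context-extension suspects in a Llama config:
for key in ("max_position_embeddings", "rope_theta", "rope_scaling"):
    print(f"{key}: {getattr(base, key, None)} -> {getattr(ext, key, None)}")
```
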
Obviously that's not a fine-tune. Personally, I think the model performs quite well up to around 20k context (speaking of the 7B variant; I haven't tested the 70B that much), so I find it rather convenient to have it as an extra repo like that.
I've also quantized the 70B 32k repo to 4-bit AWQ myself and might upload it if anyone is interested (if it's okay with the original authors, of course; I don't think a reupload with a copy-pasted model card would be a problem, since all the original credits are still present). So yeah, if anyone needs the AWQ files of that 32k repo, let me know.
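
For anyone who wants to reproduce it, an AutoAWQ recipe looks roughly like this (a sketch with common defaults, not necessarily my exact settings; the output path is a placeholder):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "OpenPipe/Hermes-2-Theta-Llama-3-70B-32k"
quant_path = "Hermes-2-Theta-Llama-3-70B-32k-AWQ"  # local output dir (placeholder)

# Common AutoAWQ defaults for 4-bit quantization.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs the calibration pass
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```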

Honestly, big props to the authors for the 7B and 70B models!
They perform really well, even on tasks that need some reasoning.

Small edit: I'm not affiliated with the people who uploaded that 32k repo; I really don't know what they did, but it seems to work well :)

Many thanks for the 4.0 bpw quant. I really hope a 3.5 bpw quant is possible too, to get more tokens/sec.
