Upload un-quantized model

#1
by deleted - opened

Hey, this looks cool, but I want to quantize it so I can fit it on my smaller GPU at Q3_K_M. Can you please upload the un-quantized version? Thank you!

Owner

Sure, I'll upload it tonight (I need my bandwidth for now). In the meantime, I've added a Q3_K_M for you.

LMK what you think of the model.

Is the tokenizer the same? I'm using the HF samplers, which might cause a problem with the chat format if the tokens are different.

Yeah, using the same configs from Gemma doesn't work: the token IDs are different, so it outputs raw ChatML into the chat. At the very least, please upload the JSONs. With the pure llama.cpp loader in TGW it's even more broken. The replies are good, though.
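For anyone hitting the same symptom: a quick way to confirm the mismatch is to diff the special-token IDs the two configs declare. A minimal sketch with hypothetical IDs (not the actual values from either repo):

```python
# Hypothetical special-token IDs for illustration only; read the real ones
# from each repo's tokenizer_config.json / config.json.
base = {"bos_token_id": 2, "eos_token_id": 1}          # e.g. a Gemma-style config
finetune = {"bos_token_id": 1, "eos_token_id": 32000}  # e.g. a ChatML-style config

# Any key where the IDs disagree is a candidate cause of template tokens
# leaking into the chat output.
mismatches = {k: (base[k], finetune[k])
              for k in base.keys() & finetune.keys()
              if base[k] != finetune[k]}

for name, (a, b) in sorted(mismatches.items()):
    print(f"{name}: base={a} finetune={b}")
```

If that prints anything, the loader is decoding with one vocabulary's special tokens while the model was trained with another's, which is exactly the "ChatML in the output" failure mode.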

also: https://is2.4chan.org/g/1720420814460744.png

If it's not too difficult, could you also post a version like an (i1) Q4_K_S? Just something a bit smaller than Q4_K_M (it doesn't fit in 24 GB of VRAM with 16k context), or maybe even something like an IQ4_XS.
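For picking a quant, a rough rule of thumb is file size ≈ parameter count × bits-per-weight ÷ 8. A back-of-envelope sketch, using a hypothetical parameter count and approximate average bits-per-weight figures for these mixes (the real numbers vary per model):

```python
def est_gib(params: float, bpw: float) -> float:
    """Approximate quantized weight size in GiB: params * bits-per-weight / 8."""
    return params * bpw / 8 / 2**30

params = 27e9  # hypothetical parameter count; substitute the actual model's
for name, bpw in [("Q4_K_M", 4.85), ("Q4_K_S", 4.58), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{est_gib(params, bpw):.1f} GiB of weights")
```

Remember the KV cache for 16k context sits on top of the weights, so the file size alone understates the VRAM needed.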
