Upload un-quantized model

#1
by deleted - opened

Hey, this looks cool, but I want to quantize it so I can fit it on my smaller GPU at Q3_K_M. Can you please upload the un-quantized version? Thank you!

Owner

Sure, I'll upload it tonight (I need my bandwidth for now). In the meantime, I've added a Q3_K_M for you.

LMK what you think of the model.

Is the tokenizer the same? I'm using the HF samplers, which might cause a problem with the chat format if the tokens are different.

Yeah, using the same configs from Gemma doesn't work: the token IDs are different, so it outputs raw ChatML into the chat. At the very least, please upload the JSONs. With the pure llama.cpp loader in TGW it's even more broken. The replies are good, though.
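For anyone hitting the same symptom: a quick way to confirm the mismatch is to diff the special-token IDs the two configs declare. A minimal sketch with hypothetical IDs (not the actual values from either repo):

```python
# Hypothetical special-token IDs for illustration only; read the real ones
# from each repo's tokenizer_config.json / config.json.
base = {"bos_token_id": 2, "eos_token_id": 1}          # e.g. a Gemma-style config
finetune = {"bos_token_id": 1, "eos_token_id": 32000}  # e.g. a ChatML-style config

# Any key where the IDs disagree is a candidate cause of template tokens
# leaking into the chat output.
mismatches = {k: (base[k], finetune[k])
              for k in base.keys() & finetune.keys()
              if base[k] != finetune[k]}

for name, (a, b) in sorted(mismatches.items()):
    print(f"{name}: base={a} finetune={b}")
```

If that prints anything, the loader is decoding with one vocabulary's special tokens while the model was trained with another's, which is exactly the "ChatML in the output" failure mode.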

also: https://is2.4chan.org/g/1720420814460744.png

If it's not too difficult, could you also post a version like an (i1) Q4_K_S? Just something a bit smaller than Q4_K_M (it doesn't fit in 24 GB of VRAM with 16k context), or maybe even something like an IQ4_XS.
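For picking a quant, a rough rule of thumb is file size ≈ parameter count × bits-per-weight ÷ 8. A back-of-envelope sketch, using a hypothetical parameter count and approximate average bits-per-weight figures for these mixes (the real numbers vary per model):

```python
def est_gib(params: float, bpw: float) -> float:
    """Approximate quantized weight size in GiB: params * bits-per-weight / 8."""
    return params * bpw / 8 / 2**30

params = 27e9  # hypothetical parameter count; substitute the actual model's
for name, bpw in [("Q4_K_M", 4.85), ("Q4_K_S", 4.58), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{est_gib(params, bpw):.1f} GiB of weights")
```

Remember the KV cache for 16k context sits on top of the weights, so the file size alone understates the VRAM needed.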
