Can you post the tokenizers?

#3
by jackboot - opened

I'd like to use this with llama.cpp HF and currently I cannot. I can manually switch the config to chatml but I have no idea if you assigned those tokens a particular value or if they're being split apart. They aren't very big.

Owner

I've uploaded the tokenizer.json

Thanks, there's also configs that go with it. I suppose that at least I can grind out the jsons this way.

heh, looking at your GGUF, it has incorrect metadata and still uses as eos token. end_of_turn is also set as the EOT.

Owner

You're right, this one's pretty broken. I've created a V2 here (including tokenizer and tokenizer_config:

https://huggingface.co/gghfez/gemma-2-27b-rp-c2-v2-GGUF

This one was trained with the "gemma2 chatml" template
{{ bos_token }}{% for message in mess...

Still has the issue with tags not tokenized.

Working for me in SillyTavern with the gemma2 and chatml

Sign up or log in to comment