shisa-ai/shisa-v1-llama3-70b-gguf

See https://huggingface.co/shisa-ai/shisa-v1-llama3-70b for the original model.

I was seeing corruption issues at extended context length but this appears to be due to how llama.cpp's server behavior defaulting to a small context window.

See: https://github.com/ggerganov/llama.cpp/issues/7609

When using the server, you should explicitly set --ctx-size 0 or --ctx-size 8192 to support the native context size, eg:

./server -ngl 99 -m shisa-v1-llama3-70b.Q4_K_M.gguf --host 0.0.0.0 --port 8080 --chat-template llama3 --ctx-size 0

Model	Average	ELYZA-tasks-100	MT-Bench	Rakuda	Tengu-Bench
shisa-ai/shisa-v1-llama3-70b	7.30	7.34	7.67	8.15	6.04
shisa-ai/shisa-v1-llama3-70b.Q4_K_M	7.22	7.22	7.27	8.20	6.19

For additional quants, including lower-bit iMatrix quants, see: https://huggingface.co/mradermacher/shisa-v1-llama3-70b-GGUF

split big files:

split -b 40G -d --additional-suffix=.part shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.bf16.gguf

put it back together:

cat shisa-v1-llama3-70b.bf16.gguf*.part > shisa-v1-llama3-70b.bf16.gguf

ensure the order

cat $(ls -v shisa-v1-llama3-70b.bf16.gguf*.part) > shisa-v1-llama3-70b.bf16.gguf

Conversion script: https://github.com/shisa-ai/shisa-v2/blob/main/convert/gguf.sh

shisa-ai
/

shisa-v1-llama3-70b-gguf

Quantized from

Quantized from shisa-ai/shisa-v1-llama3-70b.2e5 shisa-ai/shisa-v1-llama3-70b

Quantized from