Reconvert GGUF for the MoE, due to llama.cpp update

#1
by CombinHorizon - opened

Would you please re-convert the GGUF using a newer version of llama.cpp (newer than 2024-04-03) for better performance?

See the hot topics section:
https://github.com/ggerganov/llama.cpp/#hot-topics
"MoE memory layout has been updated - reconvert models for mmap support and regenerate imatrix"

https://github.com/ggerganov/llama.cpp/pull/6387

Thanks!

I found a solution for anyone who already owns this GGUF file:
`./quantize --allow-requantize` can convert the old format to the new one.
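A minimal sketch of that invocation, assuming a llama.cpp build directory and hypothetical filenames (substitute your own model and quant type):

```shell
# quantize usage: ./quantize [--allow-requantize] <input.gguf> <output.gguf> <type>
# --allow-requantize permits re-quantizing an already-quantized model,
# which rewrites it with the updated MoE tensor layout.
./quantize --allow-requantize old-moe.Q4_K_M.gguf new-moe.Q4_K_M.gguf Q4_K_M
```

Note that requantizing a quantized model can lose a little quality compared to re-converting from the original weights, so prefer a fresh conversion if you have them.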

Due to an internet traffic limit, I cannot upload the new GGUF - sorry about that.
