Two questions

#3
by snuffcn - opened

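First, quantized GGUF models labeled "i1" have become common recently. What technique is this, and what are its characteristics?
Second, a model like Qwen/Qwen2.5-72B-Instruct-GGUF is made up of multiple model files. How do I create it with ollama?
How does an Ollama Modelfile FROM such a multi-file model? I first joined the files into a single large file with copy /B, and the create step succeeded, but the resulting model does not seem to be usable.
It returns EOF.
Thanks.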

Qwen org
  1. Do you mean IQ1?

IQ-type quants all differ in their implementation details. For simplicity, you can think of it like this: there is a fixed set of values and each value has an index; IQ-type quants mainly store the indices (and an offset in value) in an efficient way instead of the actual values. The mapping from indices to values is maintained in the source code.
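To make the index idea concrete, here is a minimal Python sketch of codebook-style quantization. It is not llama.cpp's actual IQ format: the `CODEBOOK` table, the block handling, and the function names are all hypothetical, and a per-block scale stands in for the extra per-block value mentioned above.

```python
# Illustrative codebook quantization; NOT the real llama.cpp IQ layout.
# CODEBOOK is a hypothetical fixed table: 8 entries, so each index fits in 3 bits.
CODEBOOK = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 2.0]


def quantize_block(values):
    """Return (scale, indices) for one block of floats."""
    max_abs = max((abs(v) for v in values), default=0.0)
    max_code = max(abs(c) for c in CODEBOOK)
    # Assumption: one scale per block maps the table onto the block's range.
    scale = max_abs / max_code if max_abs > 0 else 1.0
    # Store only the index of the nearest scaled table entry for each value.
    indices = [
        min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] * scale - v))
        for v in values
    ]
    return scale, indices


def dequantize_block(scale, indices):
    """Look each index up in the fixed table and undo the scaling."""
    return [CODEBOOK[i] * scale for i in indices]
```

Only the small integer indices and one scale per block are stored; the table itself lives in the code, which is where the size saving comes from.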

Those types of quants are for extreme conditions and you should not use them normally. Almost all IQ quants in llama.cpp require you to provide an importance matrix for the quantization to work.

  2. The readme/model card has instructions on how to merge them. READ THE MANUAL.
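For reference, a rough Python sketch of the merge-then-create flow, assuming the merge step in the readme is llama.cpp's `llama-gguf-split --merge` and that both `llama-gguf-split` and `ollama` are on your PATH. The file names and the model name `my-qwen` are hypothetical; check the model card for the exact shard names. A plain `copy /B` join likely fails because each shard is written with its own header, so the joined file is not a valid single GGUF.

```python
import subprocess
from pathlib import Path


def merge_command(first_shard, merged):
    """Build the merge invocation; only the FIRST shard is passed in."""
    return ["llama-gguf-split", "--merge", str(first_shard), str(merged)]


def modelfile_text(merged):
    """Minimal Ollama Modelfile pointing at the merged single-file GGUF."""
    return f"FROM ./{Path(merged).name}\n"


def merge_and_create(first_shard, merged, model_name):
    """Merge the shards, write a Modelfile next to the result, run ollama create."""
    subprocess.run(merge_command(first_shard, merged), check=True)
    workdir = Path(merged).resolve().parent
    (workdir / "Modelfile").write_text(modelfile_text(merged))
    subprocess.run(
        ["ollama", "create", model_name, "-f", "Modelfile"],
        check=True,
        cwd=workdir,  # so the relative FROM path resolves
    )
```

Usage would look like `merge_and_create("model-00001-of-00002.gguf", "model.gguf", "my-qwen")`, with the placeholder names replaced by the real shard names.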
