Unstable output quality.

#6
by william0014 - opened

I use the same prompt with temperature set to 0, so why are the model's replies different every time? Sometimes even the meaning of the content varies substantially. What settings do I need for stable output? The inference framework is vLLM. I tried Llama3.1-Chinese-8B the same way, and its replies are very stable.

Qwen org

Hi, please refer to vLLM's documentation on this matter: https://docs.vllm.ai/en/stable/serving/faq.html
In addition to that, IIRC, the GPTQ kernel implementation in vLLM is not deterministic, which can also contribute to output variations.
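For intuition on why a non-deterministic kernel changes greedy (temperature=0) output at all: floating-point addition is not associative, so a kernel that reduces partial sums in a varying order can produce slightly different logits from run to run, and when the top two tokens are nearly tied, that tiny discrepancy flips the argmax. A minimal pure-Python sketch of both effects (the logit values are illustrative, not taken from any model):

```python
# Floating-point addition is not associative: the same numbers summed in a
# different order give a different result.
left = (0.1 + 1e16) - 1e16   # 0.1 is absorbed into 1e16, so this is 0.0
right = 0.1 + (1e16 - 1e16)  # this is 0.1
print(left, right)           # 0.0 0.1

def argmax(xs):
    """Index of the largest element (the greedy token choice)."""
    return max(range(len(xs)), key=xs.__getitem__)

# Two runs whose logits differ only by a rounding-error-sized amount, e.g.
# from a different kernel reduction order. Because the top two tokens are
# nearly tied, greedy decoding picks a different token each run.
logits_run1 = [1.0, 1.0 - 1e-7]
logits_run2 = [1.0, (1.0 - 1e-7) + 3e-7]
print(argmax(logits_run1), argmax(logits_run2))  # 0 1
```

Note that a sampling seed cannot fix this: with temperature=0 there is no sampling randomness to seed, and the variation comes from the kernel-level arithmetic itself.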

jklj077 changed discussion status to closed
