vLLM support, GGUF

#1
by dpkirchner - opened

According to the model card:

You can also run this model with vLLM, by running the following in your terminal after pip install vllm

vllm serve NousResearch/Hermes-3-Llama-3.1-70B

I assume this is just carried over from the base model README. How do you load, say, the Q4_K_M GGUF with vLLM and then use it on the chat completions endpoint?
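For what it's worth, vLLM has experimental GGUF support: you point vllm serve at a local single-file .gguf and pass --tokenizer with the original HF repo so the right chat template is used. A rough sketch, assuming the quant has been downloaded locally (the file name and port below are placeholders, not something from the model card):

# Serve a local GGUF quant; --tokenizer points at the original repo for the chat template.
vllm serve ./Hermes-3-Llama-3.1-70B.Q4_K_M.gguf \
    --tokenizer NousResearch/Hermes-3-Llama-3.1-70B

# Then hit the OpenAI-compatible chat completions endpoint as usual
# (by default the served model name is the path you passed to vllm serve):
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "./Hermes-3-Llama-3.1-70B.Q4_K_M.gguf",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'

Two caveats, as far as I understand vLLM's GGUF loading: multi-part GGUF files need to be merged into a single file first (llama.cpp's gguf-split tool can do this), and you can pass --served-model-name if you want a friendlier model id than the file path.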
