vLLM

#2
by dbasu - opened

What is the right way to use this model with vLLM?

We don't have a vLLM-compatible implementation yet. That said, this model only needs the equivalent of a single generated token per conversation, so it should be feasible to run it with the HF approach described in the readme.
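A minimal sketch of that HF approach: load the checkpoint with `transformers` and call `generate` with `max_new_tokens=1`, so the model emits exactly one token per conversation. The model id and prompt below are placeholders for illustration (substitute the actual checkpoint and prompt format from the readme):

```python
# Sketch: generate a single token per conversation with plain transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; use the model id from the readme instead
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Example conversation text"  # placeholder; follow the readme's prompt format
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # max_new_tokens=1: the model's answer is a single generated token
    out = model.generate(**inputs, max_new_tokens=1, do_sample=False)

# Decode only the one newly generated token
new_token = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:])
print(repr(new_token))
```

Since decoding stops after one step, throughput is dominated by the prefill, so batching many conversations together with plain HF should already be reasonably efficient.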
