vLLM

#2
by dbasu - opened

What is the right way to use this model with vLLM?

We don't have a vLLM-compatible implementation yet. That said, this model only needs the equivalent of a single generated token per conversation, so it should be feasible to run it with the HF approach described in the readme.
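A minimal sketch of that HF approach: load the checkpoint with `transformers` and call `generate` with `max_new_tokens=1`, so the model emits exactly one token per conversation. The model id and prompt below are placeholders for illustration (substitute the actual checkpoint and prompt format from the readme):

```python
# Sketch: generate a single token per conversation with plain transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; use the model id from the readme instead
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Example conversation text"  # placeholder; follow the readme's prompt format
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # max_new_tokens=1: the model's answer is a single generated token
    out = model.generate(**inputs, max_new_tokens=1, do_sample=False)

# Decode only the one newly generated token
new_token = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:])
print(repr(new_token))
```

Since decoding stops after one step, throughput is dominated by the prefill, so batching many conversations together with plain HF should already be reasonably efficient.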
