How to run inference with this model?
#3 · by frankgu3528 · opened
Can we directly use vLLM to run inference with this model?
Hi!
Our model uses exactly the same architecture as Llama-3, so technically you should be able to use vLLM just as you would with Llama-3 (though we haven't tested this, and we're not sure whether vLLM affects precision in long-context applications).
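Since the architecture matches Llama-3, a standard vLLM offline-inference snippet should work as-is. A minimal sketch, assuming vLLM is installed and a GPU is available; `your-org/your-model` is a placeholder for the actual Hugging Face repo id:

```python
# Minimal vLLM offline inference sketch for a Llama-3-architecture model.
# NOTE: "your-org/your-model" is a placeholder; substitute the real repo id.
from vllm import LLM, SamplingParams

# Load the model; vLLM picks up the Llama architecture from the HF config.
llm = LLM(model="your-org/your-model")

# Basic sampling settings; tune temperature/max_tokens for your use case.
params = SamplingParams(temperature=0.7, max_tokens=256)

# Generate completions for one or more prompts.
outputs = llm.generate(["Explain what vLLM is in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

For long-context use, you may also want to pass `max_model_len` when constructing `LLM` (or `--max-model-len` to `vllm serve`) so the context window matches the model's trained length; as noted above, long-context precision under vLLM hasn't been verified for this model.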