OOM with 24GB VRAM

#1
by Klopez - opened

Anyone else experiencing this? I have a 3090 with 24GB of VRAM, and I tried loading this via vLLM and got an OOM error even with the max model length set to 1000. Is it possible to do int8 rather than fp8?

Neural Magic org

Try also setting --max-num-seqs=1. Unfortunately, the KV cache required to run this model is very large at the moment due to how vision models are profiled.
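
For illustration, a minimal sketch of those settings using vLLM's offline Python API; the model ID is a placeholder rather than a specific checkpoint, and the keyword arguments mirror the CLI flags:

```python
# Minimal sketch (not the exact setup from this thread): load a model with the
# memory-relevant knobs discussed above. The model ID is a placeholder.
from vllm import LLM

llm = LLM(
    model="<your-fp8-checkpoint>",  # placeholder for the checkpoint being loaded
    max_model_len=1000,             # cap the context length (CLI: --max-model-len)
    max_num_seqs=1,                 # one sequence at a time (CLI: --max-num-seqs)
    gpu_memory_utilization=0.90,    # fraction of the 24GB card vLLM may claim
)
```

The same settings apply when serving, e.g. `vllm serve <your-fp8-checkpoint> --max-model-len 1000 --max-num-seqs 1`.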

Thank you for that. It seems to have helped, but wow, I didn't expect that with such a small model. Could you link me to where I can read more about this?

Neural Magic org

We are tracking this here: https://github.com/vllm-project/vllm/issues/8826, so maybe you could leave your experience there?

mgoin changed discussion status to closed
