Inference freezes using the recommended vLLM approach
#5 opened 3 days ago by dhaneshsabane
[ERROR]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.88 GiB. GPU
2 comments · #4 opened 11 days ago by Axinx
Requirement for GPU
#3 opened 16 days ago by aman213
The model hallucinates after the first response
3 comments · #2 opened 17 days ago by LordFonDragon