Michael Goin
mgoin
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
mgoin's activity
Oom with 24g vram
3
#1 opened 9 days ago
by
Klopez
latest vllm docker (v0.6.2) fail to load
2
#1 opened 5 days ago
by
choronz333
Issue with loading model
1
#1 opened about 1 month ago
by
xSumukhax
Can it run on A100/A800 with VLLM?
3
#1 opened 2 months ago
by
Parkerlambert123
weights does not exist when trying to deploy in sagemaker endpoint
1
#1 opened about 2 months ago
by
LorenzoCevolaniAXA
8-kv-heads
4
#17 opened 2 months ago
by
ArthurZ
8-kv-heads
3
#21 opened 2 months ago
by
ArthurZ
run with vllm
8
#4 opened 2 months ago
by
kuliev-vitaly
Not able to run Model using VLLM
1
#3 opened 2 months ago
by
Pchaudhary
getting issue while loading in llm
1
#1 opened 2 months ago
by
Abhinav6310
How to fast inference with FP8
1
#2 opened 2 months ago
by
CCRss
Unable to load model onto multiple GPUs
2
#2 opened 2 months ago
by
bprice9
What are the differences between yours and meta's official one?
2
#2 opened 2 months ago
by
c6sneaky
OSError, is the config correct?
2
#1 opened 2 months ago
by
jackinthebox52
Thanks your great work!
2
#1 opened 2 months ago
by
bay-llm
Compression script limits context length to 4098?
1
#1 opened 2 months ago
by
Kayvane
Where is Minitron-4B-Instruct?
1
#2 opened 2 months ago
by
mgoin
Is this compatible with the KV_Cache_dtype being FP8?
2
#1 opened 2 months ago
by
nickandbro
Are these models limited to H100s?
7
#2 opened 2 months ago
by
RonanMcGovern
Replace kv_channels with head_dim
#1 opened 3 months ago
by
mgoin
Error serving model
3
#2 opened 3 months ago
by
EvGUT
How to load this model?
1
#1 opened 3 months ago
by
Frz614
How to run Meta-Llama-3-70B-Instruct-FP8 using several devices?
5
#3 opened 3 months ago
by
Fertel
Update model.safetensors.index.json
#2 opened 3 months ago
by
mgoin
Update model.safetensors.index.json
#4 opened 3 months ago
by
mgoin
`model.safetensors.index.json` still has the legacy name `act_scale` for activation scales
1
#3 opened 3 months ago
by
Alchan
Update README.md
#1 opened 3 months ago
by
alexmarques
Update README.md
#1 opened 3 months ago
by
alexmarques
Update README.md
#1 opened 4 months ago
by
abhinavnmagic
Update README.md
#1 opened 4 months ago
by
abhinavnmagic
Update README.md
#2 opened 4 months ago
by
abhinavnmagic
Create README.md
#1 opened 4 months ago
by
abhinavnmagic
Fails to run with nm-vllm
1
#1 opened 5 months ago
by
clintonruairi
Librarian Bot: Add language metadata for dataset
#2 opened 5 months ago
by
librarian-bot
Inference GPU Ram requirement >60GB
1
#1 opened 5 months ago
by
Ksgk-fy
What conversion process are you using?
2
#2 opened 5 months ago
by
matt-psaltis-devbricks
What is Marlin?
2
#1 opened 6 months ago
by
Samvanity
Inference Issues
7
#1 opened 6 months ago
by
qeternity
Update README.md
#2 opened 7 months ago
by
shubhrapandit
New activity in
neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_70-quantized-deepsparse
7 months ago
Update README.md
#1 opened 7 months ago
by
shubhrapandit
New activity in
neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_50-quantized-deepsparse
7 months ago
Update README.md
#1 opened 7 months ago
by
shubhrapandit
Update README.md
#1 opened 7 months ago
by
shubhrapandit
Update README.md
#1 opened 7 months ago
by
shubhrapandit
Update README.md
#1 opened 7 months ago
by
abhinavnmagic
Update README.md
#1 opened 7 months ago
by
abhinavnmagic
Update README.md
#1 opened 7 months ago
by
abhinavnmagic
Update README.md
#1 opened 7 months ago
by
abhinavnmagic
Update README.md
#1 opened 7 months ago
by
abhinavnmagic
Update README.md
#1 opened 7 months ago
by
alexmarques
Update README.md
#1 opened 7 months ago
by
alexmarques
Update README.md
#1 opened 7 months ago
by
alexmarques
Update README.md
#1 opened 7 months ago
by
alexmarques
Update README.md
#1 opened 7 months ago
by
alexmarques
Update README.md
#4 opened 9 months ago
by
chrisxx
Update README with model author names and speedup numbers.
#3 opened 9 months ago
by
jen
Update README.md
#1 opened 10 months ago
by
wendlerc
Adding `safetensors` variant of this model
#1 opened 10 months ago
by
mgoin
Create README.md
#2 opened about 1 year ago
by
mgoin