Michael Goin

#1 opened 9 days ago by

Klopez

New activity in neuralmagic/Phi-3.5-mini-instruct-FP8-KV 5 days ago

latest vllm docker (v0.6.2) fails to load

#1 opened 5 days ago by

choronz333

New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 about 1 month ago

Issue with loading model

#1 opened about 1 month ago by

xSumukhax

New activity in neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 about 1 month ago

Can it run on A100/A800 with VLLM?

#1 opened 2 months ago by

Parkerlambert123

New activity in neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16 about 2 months ago

weights do not exist when trying to deploy to a SageMaker endpoint

#1 opened about 2 months ago by

LorenzoCevolaniAXA

New activity in meta-llama/Llama-3.1-405B-Instruct about 2 months ago

8-kv-heads

#17 opened 2 months ago by

ArthurZ

New activity in meta-llama/Llama-3.1-405B about 2 months ago

8-kv-heads

#21 opened 2 months ago by

ArthurZ

New activity in neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a16 2 months ago

run with vllm

#4 opened 2 months ago by

kuliev-vitaly

New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 2 months ago

Not able to run Model using VLLM

#3 opened 2 months ago by

Pchaudhary

New activity in neuralmagic/gemma-2-9b-it-FP8 2 months ago

getting issue while loading in llm

#1 opened 2 months ago by

Abhinav6310

New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 2 months ago

How to run fast inference with FP8

#2 opened 2 months ago by

CCRss

New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 2 months ago

Unable to load model onto multiple GPUs

#2 opened 2 months ago by

bprice9

New activity in neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8 2 months ago

What are the differences between yours and Meta's official one?

#2 opened 2 months ago by

c6sneaky

New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 2 months ago

OSError, is the config correct?

#1 opened 2 months ago by

jackinthebox52

New activity in mgoin/Nemotron-4-340B-Instruct-hf-FP8 2 months ago

Thanks for your great work!

#1 opened 2 months ago by

bay-llm

New activity in neuralmagic/Mistral-7B-Instruct-v0.3-FP8 2 months ago

Compression script limits context length to 4098?

#1 opened 2 months ago by

Kayvane

New activity in nvidia/Minitron-4B-Base 2 months ago

Where is Minitron-4B-Instruct?

#2 opened 2 months ago by

mgoin

New activity in neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8 2 months ago

Is this compatible with the KV_Cache_dtype being FP8?

#1 opened 2 months ago by

nickandbro

New activity in neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 2 months ago

Are these models limited to H100s?

#2 opened 2 months ago by

RonanMcGovern

New activity in nvidia/Minitron-8B-Base 3 months ago

Replace kv_channels with head_dim

#1 opened 3 months ago by

mgoin

New activity in neuralmagic/Mistral-Nemo-Instruct-2407-FP8 3 months ago

Error serving model

#2 opened 3 months ago by

EvGUT

New activity in neuralmagic/Phi-3-mini-128k-instruct-quantized.w8a8 3 months ago

Model doesn't load in vllm version 0.5.0: ValueError: `rope_scaling`'s type field must be one of ['su', 'yarn'], got longrope

#1 opened 3 months ago by

sujjosep

New activity in neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV 3 months ago

How to load this model?

#1 opened 3 months ago by

Frz614

New activity in neuralmagic/Meta-Llama-3-70B-Instruct-FP8 3 months ago

How to run Meta-Llama-3-70B-Instruct-FP8 using several devices?

#3 opened 3 months ago by

Fertel

Update model.safetensors.index.json

#2 opened 3 months ago by

mgoin

New activity in neuralmagic/Meta-Llama-3-8B-Instruct-FP8 3 months ago

Update model.safetensors.index.json

#4 opened 3 months ago by

mgoin

`model.safetensors.index.json` still has the legacy name `act_scale` for activation scales

#3 opened 3 months ago by

Alchan

New activity in nm-testing/SparseLlama-3-8B-pruned_50.2of4-FP8 3 months ago

Update README.md

#1 opened 3 months ago by

New activity in neuralmagic/SparseLlama-3-8B-pruned_50.2of4 3 months ago

Update README.md

#1 opened 3 months ago by

New activity in neuralmagic/Qwen2-72B-Instruct-FP8 4 months ago

Update README.md

#1 opened 4 months ago by

New activity in neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 4 months ago

Update README.md

#1 opened 4 months ago by

New activity in neuralmagic/Meta-Llama-3-8B-Instruct-FP8 4 months ago

Update README.md

#2 opened 4 months ago by

New activity in neuralmagic/Meta-Llama-3-70B-Instruct-FP8 4 months ago

Create README.md

#1 opened 4 months ago by

New activity in neuralmagic/Meta-Llama-3-8B-Instruct-FP8 4 months ago

Fails to run with nm-vllm

#1 opened 5 months ago by

clintonruairi

New activity in mgoin/ultrachat_2k 5 months ago

Librarian Bot: Add language metadata for dataset

#2 opened 5 months ago by

librarian-bot

New activity in neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit 5 months ago

Inference GPU RAM requirement >60GB

#1 opened 5 months ago by

Ksgk-fy

New activity in mgoin/Meta-Llama-3-70B-Instruct-Marlin 5 months ago

What conversion process are you using?

#2 opened 5 months ago by

matt-psaltis-devbricks

New activity in mgoin/Meta-Llama-3-70B-Instruct-Marlin 6 months ago

What is Marlin?

#1 opened 6 months ago by

Samvanity

New activity in mgoin/Meta-Llama-3-8B-Instruct-Marlin 6 months ago

Inference Issues

#1 opened 6 months ago by

qeternity

New activity in neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_50 7 months ago

Update README.md

#2 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_70-quantized-deepsparse 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_50-quantized-deepsparse 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_70 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_50 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-evolcodealpaca 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-evol-code-alpaca-pruned_70-quantized-deepsparse 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-evol-code-alpaca-pruned_50-quantized-deepsparse 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-evol-code-alpaca-pruned_70 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-evol-code-alpaca-pruned_50 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-ultrachat200k-pruned_70-quantized-deepsparse 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-ultrachat200k-pruned_70 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-ultrachat200k-pruned_50 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-pruned70-retrained 7 months ago

Update README.md

#1 opened 7 months ago by

New activity in neuralmagic/Llama-2-7b-pruned50-retrained 7 months ago

Update README.md

#1 opened 7 months ago by