Please add a GGUF version!

#2
by Anderson452 - opened
Microsoft org

Work is currently under way on the llama.cpp side to support this (for example, this and this).

For this model specifically, llama.cpp needs to add support for LongRoPE (for the 128k context length) and for the heterogeneous block-sparse attention, which makes it a bit tricky, but hopefully it should be there soon :)

bapatra changed discussion status to closed
