Apple Silicon Mac support

#66
by quantoser

Does this model work on Macs with Apple Silicon chips? I'm running it on a Mac Pro M1 and it gets stuck with:

```
UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
```

The process just sits there consuming CPU and memory, but no output is ever produced.

Hi @quantoser!
What precision are you running the generation in? The 7B model needs ~30 GB of RAM just to be loaded on the CPU in float32. Could you try loading the model in bfloat16 instead?
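A minimal sketch of what that could look like with `transformers` (the repo id `google/gemma-7b` and the prompt are assumptions; you may also need to accept the model license and log in with `huggingface-cli login`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights in bfloat16 (~14 GB) instead of float32 (~28-30 GB)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt")

# Setting max_new_tokens explicitly also silences the max_length warning
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```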

Yes, it does!
For example, you can use Gemma.cpp (https://github.com/google/gemma.cpp) or Ollama; both run on Mac.
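For instance, a hypothetical session with Ollama installed (the exact model tag may differ):

```
ollama run gemma:7b "Why is the sky blue?"
```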
