Loading the model without any webUI

#5
by MrGobbs - opened

I want to use the model in Python code with PyTorch. I don't want a web UI, just the plain old command terminal. How can I do that?

Cognitive Computations org

llama.cpp or Python

Cognitive Computations org

I have a little guide for Vicuna here; you can do the same with my models, using the GGML files that TheBloke published.

https://erichartford.com/vicuna

Just import transformers, load the model with the settings you want (or just don't do sampling), and you're good to go.
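A minimal sketch of what that looks like with plain transformers and no web UI. The model name is a placeholder for whichever repo or local path you downloaded, and the Vicuna-style prompt format follows the guide linked above; heavy imports are kept inside the function so the sketch stays importable without them.

```python
def build_prompt(user_msg: str) -> str:
    """Vicuna-style prompt, per the guide linked above."""
    return f"USER: {user_msg}\nASSISTANT:"

def generate(prompt: str, model_name: str = "your-model-repo-or-path") -> str:
    """Load the model and run greedy (non-sampling) generation."""
    # Imported lazily so the file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # do_sample=False = greedy decoding, i.e. "just don't do sampling"
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Example (requires the model to be downloaded first):
# print(generate(build_prompt("Hello!")))
```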

Can I use this from Python with llama.cpp (llama-cpp-python)?
So would this be correct:
LLM = Llama(MODEL, verbose=False, n_ctx=2048)
with MODEL replaced by the quantized bin file,
and n_ctx=161984? Or is there anything else that needs to be done?
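For reference, a hedged sketch of that call with llama-cpp-python. The model path is a placeholder for your quantized bin file, and note that n_ctx is normally set to the model's trained context window (2048 for LLaMA-1-family models), not an arbitrarily large value like 161984.

```python
def llama_kwargs(model_path: str, n_ctx: int = 2048) -> dict:
    """Keyword arguments for llama_cpp.Llama.

    n_ctx should match the model's trained context window
    (2048 for LLaMA-1-family models).
    """
    return {"model_path": model_path, "verbose": False, "n_ctx": n_ctx}

def run(prompt: str, model_path: str) -> str:
    """Load the quantized model and return a completion."""
    # Imported lazily; install with: pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(**llama_kwargs(model_path))
    out = llm(prompt, max_tokens=64)
    return out["choices"][0]["text"]

# Example (model_path is a placeholder for your quantized file):
# print(run("USER: Hello!\nASSISTANT:", "path/to/quantized-model.bin"))
```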
