This model was fine-tuned in 4-bit precision using QLoRA on the timdettmers/openassistant-guanaco dataset. The checkpoint is sharded so that it can be loaded in 4-bit on a free Google Colab instance.

It can be easily imported using the AutoModelForCausalLM class from transformers:

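from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "guardrail/llama-2-7b-guanaco-instruct-sharded"

# Load the sharded checkpoint with 4-bit quantization
# (4-bit loading relies on the bitsandbytes package and a CUDA GPU).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)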
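
Once the model and tokenizer are loaded, text can be generated with the standard `generate` method. The sketch below is one possible way to do that, not the authors' reference usage. The helper names `build_prompt` and `generate_reply` are hypothetical, and the `### Human:` / `### Assistant:` prompt template is an assumption based on the format of the openassistant-guanaco dataset; adjust it if the model responds poorly.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in a Guanaco-style template (assumed, not confirmed by the model card)."""
    return f"### Human: {user_message}\n### Assistant:"


def generate_reply(model, tokenizer, user_message: str, max_new_tokens: int = 128) -> str:
    """Generate a reply with an already loaded causal LM and its tokenizer."""
    prompt = build_prompt(user_message)
    # Tokenize and move the input tensors to the device the model lives on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the newly generated text is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With `model` and `tokenizer` loaded as above, a call such as `generate_reply(model, tokenizer, "What is QLoRA?")` returns the decoded reply as a string.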

Safetensors model size: 6.74B params. Tensor type: F32.
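
The parameter count and tensor type above give a rough sense of why 4-bit loading matters on a small GPU. The following back-of-the-envelope sketch estimates weight storage only; the helper `weight_memory_gb` is hypothetical, and the estimate ignores activations, the KV cache, quantization metadata, and any layers that are kept in higher precision.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 6.74e9  # parameter count reported for this checkpoint

# Compare the stored F32 precision with half precision and 4-bit quantization.
for label, bits in [("F32", 32), ("FP16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

Under these assumptions the weights need roughly 27 GB in F32 but only about 3.4 GB in 4-bit, which is an eightfold reduction.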