---
base_model: unsloth/llama-3-8b-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
---

# Uploaded model

- **Developed by:** harithapliyal
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

# Usage

Import the libraries and, if the repository is private or gated, retrieve a Hugging Face token (here from Colab secrets):

```
from google.colab import userdata
HF_KEY = userdata.get('HF_KEY')  # Hugging Face token, only needed for private or gated repos

from unsloth import FastLanguageModel
import torch

# Load model directly with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
```

# Configure the quantization

```
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)
```

# Load the model with quantization

```
model1 = AutoModelForCausalLM.from_pretrained(
    "harithapliyal/llama-3-8b-bnb-4bit-finetuned-SentAnalysis",
    quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(
    "harithapliyal/llama-3-8b-bnb-4bit-finetuned-SentAnalysis"
)
FastLanguageModel.for_inference(model1)  # Enable native 2x faster inference

# Assumed Alpaca-style prompt template; replace with the exact template used during fine-tuning
fine_tuned_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        fine_tuned_prompt.format(
            "Classify the sentiment of the following text.",  # instruction
            "I like play yoga under the rain",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

outputs = model1.generate(**inputs, max_new_tokens=64, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)
```
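
As an alternative to the Transformers path above, the same checkpoint can be loaded through Unsloth's own loader, which returns the model and tokenizer in one call and applies its inference optimizations directly. This is a minimal sketch, assuming the repository holds a full model; the `max_seq_length` value is an assumption and should match whatever was used during fine-tuning.

```
from unsloth import FastLanguageModel

# Sketch: load model and tokenizer in 4-bit through Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="harithapliyal/llama-3-8b-bnb-4bit-finetuned-SentAnalysis",
    max_seq_length=2048,   # assumption: use the sequence length from fine-tuning
    dtype=None,            # auto-detect float16 / bfloat16
    load_in_4bit=True,
    token=HF_KEY,          # only needed for private or gated repos
)
FastLanguageModel.for_inference(model)  # enable native 2x faster inference
```

If the upload contains only LoRA adapters rather than merged weights, Unsloth can generally resolve and load the base model behind the adapters automatically.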