harithapliyal commited on
Commit
c811d3e
1 Parent(s): 5180e19

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +52 -0
README.md CHANGED
@@ -20,3 +20,55 @@ tags:
20
  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
21
 
22
  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
20
  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
21
 
22
  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
23
+
24
+ from google.colab import userdata
25
+ HF_KEY = userdata.get('HF_KEY')
26
+
27
+ from unsloth import FastLanguageModel
28
+ import torch
29
+
30
+ <!-- from transformers import TrainingArguments
31
+ from trl import SFTTrainer
32
+ from unsloth import is_bfloat16_supported
33
+
34
+ !pip uninstall -y xformers
35
+ !pip install xformers
36
+
37
+ !python -m xformers.info
38
+
39
+ !pip install triton -->
40
+
41
+ # Load model directly
42
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
43
+
44
+ # Configure the quantization
45
+ bnb_config = BitsAndBytesConfig(
46
+ load_in_4bit=True,
47
+ bnb_4bit_use_double_quant=True,
48
+ bnb_4bit_quant_type="nf4",
49
+ bnb_4bit_compute_dtype="float16"
50
+ )
51
+
52
+ # Load the model with quantization
53
+ model1 = AutoModelForCausalLM.from_pretrained(
54
+ "harithapliyal/llama-3-8b-bnb-4bit-finetuned-SentAnalysis",
55
+ quantization_config=bnb_config
56
+ )
57
+
58
+
59
+
60
+ FastLanguageModel.for_inference(model1) # Enable native 2x faster inference
61
+ inputs = tokenizer(
62
+ [
63
+ fine_tuned_prompt.format(
64
+ "Classify the sentiment of the following text.", # instruction
65
+ "I like play yoga under the rain", # input
66
+ "", # output - leave this blank for generation!
67
+ )
68
+ ], return_tensors = "pt").to("cuda")
69
+
70
+ outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
71
+ outputs = tokenizer.decode(outputs[0])
72
+ print(outputs)
73
+
74
+