DipeshChaudhary committed on
Commit 1e5e798
1 Parent(s): 88b16e1

Update README.md

Files changed (1)
  1. README.md +21 -3
README.md CHANGED
@@ -38,9 +38,9 @@ base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
      load_in_4bit=load_in_4bit,
  )
  ```
- ```
- #We now use the Llama-3 format for conversation style finetunes. We use Open Assistant conversations in ShareGPT style.
- **We use our get_chat_template function to get the correct chat template. They support zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old and their own optimized unsloth template**
+ # We now use the Llama-3 format for conversation style finetunes. We use Open Assistant conversations in ShareGPT style.
+ **We use our get_chat_template function to get the correct chat template. They support zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old and their own optimized unsloth template**
+ ```
  from unsloth.chat_templates import get_chat_template
  tokenizer = get_chat_template(
      tokenizer,
@@ -48,6 +48,24 @@ base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
      mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
  )
  ```
+ ## FOR ACTUAL INFERENCE
+ ```
+ FastLanguageModel.for_inference(model) # Enable native 2x faster inference
+
+ messages = [
+     {"from": "human", "value": "I'm worried about my exam."},
+ ]
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize = True,
+     add_generation_prompt = True, # Must add for generation
+     return_tensors = "pt",
+ ).to("cuda")
+
+ from transformers import TextStreamer
+ text_streamer = TextStreamer(tokenizer)
+ x = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128, use_cache = True)
+ ```
 
  # Uploaded model
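
The snippets added in this commit assume a model and tokenizer that were already loaded with Unsloth earlier in the README. A minimal end-to-end sketch of the same flow is shown below for context; it is illustrative only. The model id comes from this card's `base_model` field (not the fine-tuned repo), and `max_seq_length`, `dtype`, and `chat_template = "llama-3"` are assumed values that the diff does not show.

```
# Sketch only: load the 4-bit base model with Unsloth, apply a llama-3 chat
# template, and stream a reply. Repo id, max_seq_length, dtype and the
# chat_template name are assumptions, not taken from this commit.
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",  # base model from this card
    max_seq_length = 2048,  # assumed value
    dtype = None,           # auto-detect
    load_in_4bit = True,
)

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3",  # assumed; the diff omits the line that sets this
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},  # ShareGPT style
)

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

messages = [
    {"from": "human", "value": "I'm worried about my exam."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,  # Must add for generation
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(input_ids = inputs, streamer = text_streamer,
                   max_new_tokens = 128, use_cache = True)
```

On a CUDA machine this streams the generated reply token by token via TextStreamer instead of waiting for the full completion.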