---
library_name: transformers
license: mit
datasets:
- HuggingFaceH4/ultrachat_200k
- Open-Orca/OpenOrca
language:
- en
---

# Model Card for phi-2-chat-v05

Phi-2-chat-v05 is a finetuned version of Phi-2, aimed at improving the model's understanding of instructions and multi-turn conversations.
In essence: it now has a concept of shutting up after an answer is given, as opposed to just switching into random-generator mode.

Finetuning used 25k records from the `HuggingFaceH4/ultrachat_200k` dataset.
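
For reference, the sketch below shows how such a subset could be pulled with the `datasets` library. The exact split, filtering, and formatting used for this finetune aren't documented here, so treat the split name and record fields as assumptions.

```python
from datasets import load_dataset

# Hypothetical sketch: grab a 25k-record slice of the SFT split of ultrachat_200k.
# The actual split/filtering used to train phi-2-chat-v05 may differ.
subset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:25000]")

# Each record carries a "messages" list of {"role", "content"} turns,
# which maps onto the prompt format shown below.
print(subset[0]["messages"][:2])
```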


# Prompt format

```
<|system|>
You are a helpful assistant....
<|user|>
Why is the sky blue?
<|assistant|>
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere [...]
<|user|>
Who was the phenomenon named after?
<|assistant|>
```
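
If you prefer not to rely on the chat template, the prompt string can be assembled by hand. The snippet below is a minimal sketch; the exact whitespace and newline placement is assumed from the format above rather than taken from the tokenizer config.

```python
# Sketch of hand-building a prompt in the format above.
# Newline placement is an assumption; the tokenizer's chat_template is authoritative.
prompt = (
    "<|system|>\n"
    "You are a helpful assistant.\n"
    "<|user|>\n"
    "Why is the sky blue?\n"
    "<|assistant|>\n"  # the trailing assistant tag cues the model to answer
)
```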

Or you can use the tokenizer's chat_template, as shown below.

# How to use it?

Dependencies:
```
pip install -U torch transformers einops
```

Code for inference:


```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "WeeRobots/phi-2-chat-v05"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0}, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, trust_remote_code=True)

payload = tokenizer.apply_chat_template([
    { 'role': 'system', 'content': '''You are a state machine. The user will add state slot values and I'll keep track of them.''' },
    { 'role': 'user', 'content': '''Place 15 into slot apple''' },
    { 'role': 'assistant', 'content': '''Roger that.''' },
    { 'role': 'user', 'content': '''Bananas slot should be 20''' },
    { 'role': 'assistant', 'content': '''Certainly''' },
    { 'role': 'user', 'content': '''What is value of Apples + Bananas?''' },
], tokenize=False, add_generation_prompt=True)

model_input = tokenizer(payload, return_tensors="pt").to(model.device)
with torch.no_grad():
  # IMPORTANT: always set eos_token_id in this call. The model is trained to emit the eos_token
  # at the right time, but without it generation may continue with irrelevant text.
  # This way the model stops at the right place.
  model_response = model.generate(**model_input, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id)
  print(tokenizer.decode(model_response[0], skip_special_tokens=False))
```
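
If you only want the assistant's latest reply rather than the full transcript, you can decode just the newly generated tokens. This is a small sketch building on the variables defined above.

```python
# Sketch: strip the prompt tokens so only the newly generated reply is decoded.
prompt_length = model_input["input_ids"].shape[-1]
reply = tokenizer.decode(model_response[0][prompt_length:], skip_special_tokens=True)
print(reply)
```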

# Non-production quality
Be aware that this finetune wasn't thoroughly tested and isn't meant to be used in production; use it only for experimentation or hobby projects.