---
library_name: transformers
license: mit
datasets:
- HuggingFaceH4/ultrachat_200k
- Open-Orca/OpenOrca
language:
- en
---
# Model Card for Phi-2-chat-v05
Phi-2-chat-v05 is a fine-tuned version of Phi-2, trained to improve the model's understanding of instructions and multi-turn conversations.
In essence: it now has a concept of shutting up after an answer is given, as opposed to just switching into random-generator mode.
Fine-tuning used 25k records from the `HuggingFaceH4/ultrachat_200k` dataset.
# Prompt format
```
<|system|>
You are a helpful assistant....
<|user|>
Why is the sky blue?
<|assistant|>
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere [...]
<|user|>
Who was the phenomenon named after?
<|assistant|>
```
The model generates its output after the special token `<|assistant|>`; you need that token at the end of the prompt to get a reliable response.
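For example, a hand-built prompt might look like the sketch below (the exact whitespace around the special tokens is an assumption based on the format above):
```python
# Hand-built prompt; the trailing <|assistant|> token cues the model to answer.
# Exact whitespace around the tokens is assumed from the format shown above.
prompt = (
    "<|system|>\nYou are a helpful assistant.\n"
    "<|user|>\nWhy is the sky blue?\n"
    "<|assistant|>\n"
)
```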
Alternatively, you can use the tokenizer's chat template, as shown below.
# How to use it?
Dependencies
```
pip install -U torch transformers einops
```
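Since the inference example below places the model on GPU 0, you may want a quick sanity check that a CUDA-capable build of `torch` is installed (an optional sketch):
```python
import torch

# Optional sanity check: the inference example below places the model on GPU 0,
# so a CUDA-capable torch build is required.
assert torch.cuda.is_available(), "CUDA device required for device_map={'': 0}"
print(torch.cuda.get_device_name(0))
```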
Code for inference:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "WeeRobots/phi-2-chat-v05"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0}, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)

payload = tokenizer.apply_chat_template([
    { 'role': 'system', 'content': '''You are a state machine. The user will add state slot values and you'll keep track of them.''' },
    { 'role': 'user', 'content': '''Place 15 into slot apple''' },
    { 'role': 'assistant', 'content': '''Roger that.''' },
    { 'role': 'user', 'content': '''Bananas slot should be 20''' },
    { 'role': 'assistant', 'content': '''Certainly''' },
    { 'role': 'user', 'content': '''What is the value of Apple + Banana?''' },
], tokenize=False, add_generation_prompt=True)

device = "cuda"
model_input = tokenizer(payload, return_tensors="pt").to(device)
with torch.no_grad():
    # IMPORTANT: always set eos_token_id in this call. The model is trained to emit
    # the eos_token at the right time, but without it generation may continue with
    # irrelevant text; setting it makes generation stop at the right place.
    model_response = model.generate(**model_input, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(model_response[0], skip_special_tokens=False))
```
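Decoding the full sequence echoes the whole conversation back. If you want just the assistant's reply, you can slice off the prompt tokens before decoding; a minimal sketch using the variables above:
```python
# Sketch: decode only the newly generated tokens, dropping the echoed prompt.
prompt_len = model_input["input_ids"].shape[-1]
reply = tokenizer.decode(model_response[0][prompt_len:], skip_special_tokens=True)
print(reply)
```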
# Not production quality
Be aware that this fine-tune wasn't thoroughly tested and isn't meant to be used in production; use it only for experimentation or hobby projects.