---
library_name: transformers
license: mit
datasets:
- HuggingFaceH4/ultrachat_200k
- Open-Orca/OpenOrca
language:
- en
---
# Model Card for Phi-2-chat-v05
Phi-2-chat-v05 is a fine-tuned version of Phi-2 with a better understanding of instructions and multi-turn conversations.
In essence: it now knows to shut up after giving an answer, instead of switching into random-generator mode.
Fine-tuning used 25k records from the `HuggingFaceH4/ultrachat_200k` dataset.
# Prompt format
```
<|system|>
You are a helpful assistant....
<|user|>
Why is the sky blue?
<|assistant|>
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere [...]
<|user|>
Who was the phenomenon named after?
<|assistant|>
```
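If you want to construct the prompt by hand, here is a minimal sketch, assuming turns are separated by plain newlines exactly as in the format above:
```python
# Build a prompt in the documented format by plain string concatenation.
# (Assumption: newline-separated turns; the tokenizer's chat_template is
# the authoritative source for the exact formatting.)
prompt = (
    "<|system|>\n"
    "You are a helpful assistant.\n"
    "<|user|>\n"
    "Why is the sky blue?\n"
    "<|assistant|>\n"
)
```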
Or you can use the tokenizer's `chat_template`, as shown below.
# How to use it?
Dependencies:
```
pip install -U torch transformers einops
```
Code for inference:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "..."  # set to this model's Hub repo id

model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0}, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, trust_remote_code=True)

payload = tokenizer.apply_chat_template([
    { 'role': 'system', 'content': '''You are a state machine. The user will add state slot values and you will keep track of them.''' },
    { 'role': 'user', 'content': '''Place 15 into slot apple''' },
    { 'role': 'assistant', 'content': '''Roger that.''' },
    { 'role': 'user', 'content': '''Bananas slot should be 20''' },
    { 'role': 'assistant', 'content': '''Certainly''' },
    { 'role': 'user', 'content': '''What is value of Apples + Bananas?''' },
], tokenize=False, add_generation_prompt=True)

model_input = tokenizer(payload, return_tensors="pt").to(model.device)
with torch.no_grad():
    # IMPORTANT: always set eos_token_id in this call. The model is trained to emit
    # the eos_token at the right time, but without it generation may continue with
    # irrelevant text; this way the model stops at the right place.
    model_response = model.generate(**model_input, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(model_response[0], skip_special_tokens=False))
```
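The `decode` call above prints the whole sequence, prompt included. To get just the assistant's reply, slice off the prompt tokens first; a minimal sketch, reusing `model_input` and `model_response` from the snippet above:
```python
# Keep only the newly generated tokens, i.e. everything after the prompt.
prompt_len = model_input["input_ids"].shape[-1]
reply = tokenizer.decode(model_response[0][prompt_len:], skip_special_tokens=True)
print(reply)
```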