|
--- |
|
library_name: transformers |
|
license: mit |
|
datasets: |
|
- HuggingFaceH4/ultrachat_200k |
|
- Open-Orca/OpenOrca |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for phi-2-chat-v05
|
|
|
Phi-2-chat-v05 is a fine-tuned version of Phi-2, trained to improve the model's understanding of instructions and multi-turn conversations.

In essence: it now has a concept of shutting up after an answer is given, as opposed to just switching into random-generator mode.
|
|
|
Fine-tuning used 25k records from the `HuggingFaceH4/ultrachat_200k` dataset.
|
|
|
|
|
# Prompt format |
|
|
|
```
<|system|>
You are a helpful assistant....
<|user|>
Why is the sky blue?
<|assistant|>
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere [...]
<|user|>
Who was the phenomenon named after?
<|assistant|>
```
|
|
|
Or you can use the tokenizer's chat_template, as shown below. |
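
For example, this minimal sketch renders the conversation above through the tokenizer's built-in template (the message contents are taken from the prompt-format example; `tokenize=False` returns the formatted prompt string instead of token ids):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("WeeRobots/phi-2-chat-v05", trust_remote_code=True)

messages = [
    { 'role': 'system', 'content': 'You are a helpful assistant.' },
    { 'role': 'user', 'content': 'Why is the sky blue?' },
]

# add_generation_prompt=True appends the trailing <|assistant|> tag,
# signalling the model that it should produce the next turn.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```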
|
|
|
# How to use it? |
|
|
|
Dependencies |
|
``` |
|
pip install -U torch transformers einops
|
``` |
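
As an optional sanity check that the CUDA build of PyTorch is active (exact versions are up to you):

```python
import torch

# Should print the installed version and True if a CUDA-capable GPU is usable.
print(torch.__version__, torch.cuda.is_available())
```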
|
|
|
Code for inference:
|
|
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "WeeRobots/phi-2-chat-v05"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0}, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, trust_remote_code=True)

payload = tokenizer.apply_chat_template([
    { 'role': 'system', 'content': '''You are a state machine. The user will add state slot values and you'll keep track of them.''' },
    { 'role': 'user', 'content': '''Place 15 into slot apple''' },
    { 'role': 'assistant', 'content': '''Roger that.''' },
    { 'role': 'user', 'content': '''Bananas slot should be 20''' },
    { 'role': 'assistant', 'content': '''Certainly''' },
    { 'role': 'user', 'content': '''What is value of Apples + Bananas?''' },
], tokenize=False, add_generation_prompt=True)

device = "cuda"
model_input = tokenizer(payload, return_tensors="pt").to(device)

with torch.no_grad():
    # IMPORTANT: always set eos_token_id in this call. The model is trained to emit
    # the EOS token at the right time, but without it generation may continue with
    # irrelevant text; passing it makes generation stop at the right place.
    model_response = model.generate(**model_input, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(model_response[0], skip_special_tokens=False))
```
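
The decoded output above includes the prompt itself. To print only the newly generated reply, slice off the prompt tokens first (a small sketch reusing the variables from the example above):

```python
# Everything past the prompt tokens is the model's new output.
prompt_len = model_input["input_ids"].shape[-1]
reply = tokenizer.decode(model_response[0][prompt_len:], skip_special_tokens=True)
print(reply)
```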
|
|
|
# Not production quality
|
Be aware that this fine-tune wasn't thoroughly tested and isn't meant for production use; it's intended only for experimentation or hobby projects.
|
|