---
library_name: transformers
license: mit
base_model: microsoft/phi-1_5
datasets:
- teknium/OpenHermes-2.5
- HuggingFaceH4/ultrafeedback_binarized
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/py-dpo-v0.1
- argilla/distilabel-math-preference-dpo
pipeline_tag: text-generation
---

# Phi-1.5
The language model [phi-1.5](https://huggingface.co/microsoft/phi-1_5) is a Transformer with **1.3 billion** parameters. It was trained on the same data sources as [phi-1](https://huggingface.co/microsoft/phi-1), augmented with a new data source consisting of various synthetic NLP texts. On benchmarks testing common sense, language understanding, and logical reasoning, phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters.

# Phi-1_5-Instruct-v0.1
The model has undergone a post-training process that combines **supervised fine-tuning** (SFT) and **direct preference optimization** (DPO) for instruction following. I used the [trl](https://huggingface.co/docs/trl/en/index) library and a single **A100 40GB** GPU for both the SFT and DPO steps.

- Supervised Fine-Tuning
  - Used 128,000 (instruction, response) pairs from the [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset

- Direct Preference Optimization (DPO)
  - Used a combination of the following preference datasets
    - [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
    - [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
    - [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo)
    - [jondurbin/py-dpo-v0.1](https://huggingface.co/datasets/jondurbin/py-dpo-v0.1)
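
For readers unfamiliar with DPO, the per-pair objective it minimizes can be sketched in plain Python. This is the standard DPO loss as described in the literature, not the training script used for this model: the function name `dpo_loss`, the `beta` value, and the log-probabilities passed in are all illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of summed log-probabilities.

    beta=0.1 is an assumed value for illustration only.
    """
    # How much more the policy likes each response than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Made-up log-probabilities: the policy prefers the chosen response more than
# the reference does, so the loss falls below log(2).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

When the policy and the reference agree exactly, the loss equals `log(2)`; it shrinks as the policy shifts probability toward the chosen response relative to the reference.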

## How to use
### Chat Format

Given the nature of the training data, the Phi-1.5 Instruct model is best suited for prompts in the following chat format. 
You can provide the prompt as a question using this generic template:
```markdown
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Question?<|im_end|>
<|im_start|>assistant
```

For example:
```markdown
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
How to explain the Internet to a medieval knight?<|im_end|>
<|im_start|>assistant
```
where the model generates the text after `<|im_start|>assistant`.
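
The template above can be assembled programmatically. The following is a minimal sketch: `build_chatml_prompt` is a hypothetical helper, and the newline placement (including the newline after the final `<|im_start|>assistant`) is an assumption based on the common ChatML layout. In practice, `tokenizer.apply_chat_template` uses the template stored with the model's tokenizer and should be preferred.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the chat format shown above."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Question?"},
])
print(prompt)
```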

### Sample inference code

This code snippet shows how to quickly get started with running the model on a GPU:

```python
import torch 
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline 

torch.random.manual_seed(0) 

model_id = "rasyosef/Phi-1_5-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained( 
    model_id,  
    device_map="cuda",  
    torch_dtype="auto" 
) 

tokenizer = AutoTokenizer.from_pretrained(model_id) 

messages = [ 
    {"role": "system", "content": "You are a helpful AI assistant."}, 
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}, 
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."}, 
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"}, 
] 

pipe = pipeline( 
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
) 

generation_args = { 
    "max_new_tokens": 500, 
    "return_full_text": False, 
    "temperature": 0.0, 
    "do_sample": False, 
} 

output = pipe(messages, **generation_args) 
print(output[0]['generated_text'])  
```

Note: To use flash attention, call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="flash_attention_2"`.
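
A minimal sketch of how that option could be wired in is shown below. The helper names `build_load_kwargs` and `load_model` are hypothetical. The sketch also assumes that the `flash-attn` package is installed and switches to `bfloat16` when flash attention is enabled, since Flash Attention 2 only supports half-precision dtypes; the card does not say which dtype `torch_dtype="auto"` resolves to for this checkpoint.

```python
def build_load_kwargs(use_flash_attention=False):
    """Keyword arguments for AutoModelForCausalLM.from_pretrained()."""
    kwargs = {"device_map": "cuda", "torch_dtype": "auto"}
    if use_flash_attention:
        kwargs["attn_implementation"] = "flash_attention_2"
        # Assumption: force half precision, which Flash Attention 2 requires.
        kwargs["torch_dtype"] = "bfloat16"
    return kwargs

def load_model(model_id="rasyosef/Phi-1_5-Instruct-v0.1", use_flash_attention=False):
    # Imported lazily so the helper above can be inspected without a GPU setup.
    import torch
    from transformers import AutoModelForCausalLM

    kwargs = build_load_kwargs(use_flash_attention)
    if kwargs["torch_dtype"] != "auto":
        # Convert the dtype name into the actual torch dtype object.
        kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)

print(build_load_kwargs(use_flash_attention=True))
```

Calling `load_model(use_flash_attention=True)` would then replace the `from_pretrained` call in the sample above.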

## Benchmarks

|Model|Size (# params)|IFEval|GSM8K|
|:----|:--------------|:-----|:----|
|rasyosef/Phi-1_5-Instruct-v0.1|1.4B|**26.71**|**41.78**|
|HuggingFaceTB/SmolLM-1.7B-Instruct|1.7B|24.21|3.45|
|TinyLlama/TinyLlama-1.1B-Chat-v1.0|1.1B|21.23|0|
|microsoft/phi-1_5|1.4B|20.51|31.73|