rasyosef committed
Commit 45fb3d9
1 Parent(s): 961aa6b

Update README.md

Files changed (1): README.md (+68 -1)
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: text-generation
  The language model Phi-1.5 is a Transformer with **1.3 billion** parameters. It was trained using the same data sources as [phi-1](https://huggingface.co/microsoft/phi-1), augmented with a new data source that consists of various NLP synthetic texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates nearly state-of-the-art performance among models with less than 10 billion parameters.

  # Phi-1_5-Instruct-v0.1
- The model has underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following. I used the [trl](https://huggingface.co/docs/trl/en/index) library and a single **A100 40GB** GPU during both the SFT and DPO steps.
+ The model has undergone a post-training process that incorporates both **supervised fine-tuning** and **direct preference optimization** for instruction following. I used the [trl](https://huggingface.co/docs/trl/en/index) library and a single **A100 40GB** GPU during both the SFT and DPO steps.

  - Supervised Fine-Tuning
    - Used 128,000 instruction, response pairs from the [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset
@@ -26,3 +26,70 @@ The model has underwent a post-training process that incorporates both supervise
    - [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo)
    - [jondurbin/py-dpo-v0.1](https://huggingface.co/datasets/jondurbin/py-dpo-v0.1)

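The post-training recipe above maps onto trl roughly as follows. This is a schematic sketch rather than the actual training script: the dataset column names, the ChatML formatting helpers, the hyperparameters, and some argument names (which differ between trl releases) are assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base_model_id = "microsoft/phi-1_5"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Stage 1: supervised fine-tuning on instruction, response pairs.
# OpenHermes-2.5 stores each conversation as {"from", "value"} turns; render
# them into the ChatML-style text shown in the Chat Format section below.
role_map = {"system": "system", "human": "user", "gpt": "assistant"}

def to_chatml(example):
    turns = [
        f"<|im_start|>{role_map[t['from']]}\n{t['value']}<|im_end|>"
        for t in example["conversations"]
    ]
    return {"text": "\n".join(turns)}

sft_dataset = (
    load_dataset("teknium/OpenHermes-2.5", split="train")
    .shuffle(seed=0)
    .select(range(128_000))  # roughly the 128,000 pairs mentioned above
    .map(to_chatml)
)

sft_trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # newer trl releases call this processing_class
    train_dataset=sft_dataset,
    args=SFTConfig(output_dir="phi-1_5-sft", dataset_text_field="text", max_seq_length=1024),
)
sft_trainer.train()

# Stage 2: direct preference optimization on chosen/rejected pairs.
# DPOTrainer expects "prompt", "chosen" and "rejected" columns; the source
# column names used below are assumptions, so check the dataset cards.
def to_preference(example):
    return {
        "prompt": f"<|im_start|>user\n{example['instruction']}<|im_end|>\n<|im_start|>assistant\n",
        "chosen": example["chosen_response"] + "<|im_end|>",
        "rejected": example["rejected_response"] + "<|im_end|>",
    }

dpo_dataset = load_dataset(
    "argilla/distilabel-math-preference-dpo", split="train"
).map(to_preference)

dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    tokenizer=tokenizer,
    train_dataset=dpo_dataset,
    args=DPOConfig(output_dir="phi-1_5-dpo", beta=0.1),
)
dpo_trainer.train()
```

Running DPO on top of the SFT checkpoint, rather than on the base model, mirrors the order of the two steps described above.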
+ ## How to use
+ ### Chat Format
+
+ Given the nature of the training data, the Phi-1.5 Instruct model is best suited for prompts using the chat format.
+ You can provide the prompt as a question with a generic template as follows:
+ ```markdown
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ Question?<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ For example:
+ ```markdown
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ How to explain Internet for a medieval knight?<|im_end|>
+ <|im_start|>assistant
+ ```
+ where the model generates the text after `<|im_start|>assistant`. For a few-shot prompt, additional user and assistant turns can be inserted in the same format before the final `<|im_start|>assistant`.
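Assuming the tokenizer of this instruct checkpoint ships with a chat template in the format above, the prompt string does not have to be assembled by hand; a minimal sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rasyosef/Phi-1_5-Instruct-v0.1")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How to explain Internet for a medieval knight?"},
]

# Render the conversation with the chat template and append the generation
# prompt (<|im_start|>assistant) so the model continues from there.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```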
+
+ ### Sample inference code
+
+ This code snippet shows how to quickly get started with running the model on a GPU:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+ torch.random.manual_seed(0)
+
+ # Load the instruction-tuned model and its tokenizer
+ model_id = "rasyosef/Phi-1_5-Instruct-v0.1"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     device_map="cuda",
+     torch_dtype="auto"
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Conversation history in the chat format described above
+ messages = [
+     {"role": "system", "content": "You are a helpful AI assistant."},
+     {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
+     {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
+     {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
+ ]
+
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+ )
+
+ # Greedy decoding; return only the newly generated tokens
+ generation_args = {
+     "max_new_tokens": 500,
+     "return_full_text": False,
+     "temperature": 0.0,
+     "do_sample": False,
+ }
+
+ output = pipe(messages, **generation_args)
+ print(output[0]['generated_text'])
+ ```
+
+ Note: If you want to use flash attention, call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="flash_attention_2"`.
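For example, the loading call from the snippet above with Flash Attention 2 enabled (assuming the flash-attn package is installed and the GPU supports it):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "rasyosef/Phi-1_5-Instruct-v0.1",
    device_map="cuda",
    torch_dtype="auto",
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
```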