---
library_name: transformers
license: mit
datasets:
- teknium/OpenHermes-2.5
- HuggingFaceH4/ultrafeedback_binarized
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/py-dpo-v0.1
- argilla/distilabel-math-preference-dpo
pipeline_tag: text-generation
---

# Phi-1.5

The language model Phi-1.5 is a Transformer with **1.3 billion** parameters. It was trained using the same data sources as [phi-1](https://huggingface.co/microsoft/phi-1), augmented with a new data source consisting of various synthetic NLP texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters.
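
For reference, the base model can be run with the standard `transformers` generation API. The snippet below is a minimal sketch; the prompt, dtype, and generation settings are illustrative choices, not recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base Phi-1.5 checkpoint from the Hub
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")

prompt = "Explain why the sky is blue, in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```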

# Phi-1_5-Instruct-v0.1

The model underwent a post-training process that incorporates both supervised fine-tuning (SFT) and direct preference optimization (DPO) for instruction following. I used the [trl](https://huggingface.co/docs/trl/en/index) library and a single **A100 40GB** GPU for both the SFT and DPO steps.

- Supervised Fine-Tuning (SFT)
  - Used 128,000 instruction-response pairs from the [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset (a sketch of this step appears after the list)
- Direct Preference Optimization (DPO)
  - Used a combination of the following preference datasets (a sketch of this step appears after the list):
    - [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
    - [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
    - [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo)
    - [jondurbin/py-dpo-v0.1](https://huggingface.co/datasets/jondurbin/py-dpo-v0.1)
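
Below is a minimal sketch of the SFT step with trl's `SFTTrainer`. The conversion function assumes OpenHermes-2.5 stores ShareGPT-style `conversations` records with `from`/`value` keys, and the output path and hyperparameters are illustrative placeholders rather than the settings actually used.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumption: records are ShareGPT-style `conversations` lists with
# `from`/`value` keys; flatten each one into a plain-text transcript.
# The actual chat format used for training is not shown here.
def to_text(example):
    role_map = {"system": "System", "human": "User", "gpt": "Assistant"}
    turns = [f"{role_map[t['from']]}: {t['value']}" for t in example["conversations"]]
    return {"text": "\n".join(turns)}

dataset = load_dataset("teknium/OpenHermes-2.5", split="train")
dataset = dataset.shuffle(seed=42).select(range(128_000))  # the 128k pairs noted above
dataset = dataset.map(to_text, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model="microsoft/phi-1_5",
    train_dataset=dataset,  # SFTTrainer picks up the `text` column by default
    args=SFTConfig(
        output_dir="phi-1_5-sft",       # hypothetical output path
        per_device_train_batch_size=4,  # illustrative; sized for one A100 40GB
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    ),
)
trainer.train()
```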
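
And a similar sketch of the DPO step with trl's `DPOTrainer`, shown on a single preference dataset for brevity. In a real run the four datasets above would be standardized to `prompt`/`chosen`/`rejected` columns and concatenated; the checkpoint path and hyperparameters are placeholders.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint of the previous step (hypothetical path).
# Assumes a chat template is configured on the tokenizer, since the
# chosen/rejected columns below are message lists that recent trl
# releases format via the template.
model = AutoModelForCausalLM.from_pretrained("phi-1_5-sft")
tokenizer = AutoTokenizer.from_pretrained("phi-1_5-sft")

# One of the four preference datasets; it ships prompt/chosen/rejected columns
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = DPOTrainer(
    model=model,  # a frozen reference copy is created internally
    args=DPOConfig(
        output_dir="phi-1_5-dpo",       # hypothetical output path
        beta=0.1,                       # illustrative KL-penalty strength
        per_device_train_batch_size=2,  # illustrative; sized for one A100 40GB
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
    ),
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl releases
)
trainer.train()
```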