---
library_name: transformers
license: mit
datasets:
- teknium/OpenHermes-2.5
- HuggingFaceH4/ultrafeedback_binarized
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/py-dpo-v0.1
- argilla/distilabel-math-preference-dpo
pipeline_tag: text-generation
---
# Phi-1.5
The language model Phi-1.5 is a Transformer with **1.3 billion** parameters. It was trained on the same data sources as [phi-1](https://huggingface.co/microsoft/phi-1), augmented with a new data source consisting of various synthetic NLP texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters.
# Phi-1_5-Instruct-v0.1
The model underwent a post-training process that combines supervised fine-tuning (SFT) and direct preference optimization (DPO) for instruction following. I used the [trl](https://huggingface.co/docs/trl/en/index) library and a single **A100 40GB** GPU for both the SFT and DPO steps; minimal sketches of both steps, plus a usage example, follow the list below.
- Supervised Fine-Tuning (SFT)
  - Used 128,000 instruction-response pairs from the [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset
- Direct Preference Optimization (DPO)
  - Used a combination of the following preference datasets:
    - [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
    - [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
    - [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo)
    - [jondurbin/py-dpo-v0.1](https://huggingface.co/datasets/jondurbin/py-dpo-v0.1)
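Here is a minimal sketch of the SFT step with trl's `SFTTrainer`. The prompt template, sampling seed, sequence length, and other hyperparameters are illustrative assumptions, not the exact values used to train this model.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# OpenHermes-2.5 stores ShareGPT-style "conversations"; flatten each one
# into a single role-tagged training string (this template is an assumption).
def to_text(example):
    tags = {"system": "<|system|>", "human": "<|user|>", "gpt": "<|assistant|>"}
    turns = [f"{tags[t['from']]}\n{t['value']}" for t in example["conversations"]]
    return {"text": "\n".join(turns) + tokenizer.eos_token}

dataset = load_dataset("teknium/OpenHermes-2.5", split="train")
dataset = dataset.shuffle(seed=42).select(range(128_000))  # 128k pairs, per the card
dataset = dataset.map(to_text, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,  # illustrative; keeps memory within a single A100 40GB
)
trainer.train()
```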
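And a similar sketch of the DPO step with trl's `DPOTrainer`, shown for `ultrafeedback_binarized` only; the other three preference datasets would be normalized to the same `prompt`/`chosen`/`rejected` schema and concatenated. The checkpoint path, split choice, and `beta` value are assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer

model_id = "phi-1_5-sft"  # hypothetical path to the checkpoint from the SFT step
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# ultrafeedback_binarized stores chosen/rejected as message lists; reduce each
# example to the plain prompt/chosen/rejected strings that DPOTrainer expects.
def to_pairs(example):
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
dataset = dataset.map(to_pairs, remove_columns=dataset.column_names)

trainer = DPOTrainer(
    model=model,  # with no ref_model given, trl keeps a frozen copy as the reference
    tokenizer=tokenizer,
    train_dataset=dataset,
    beta=0.1,  # illustrative strength of the KL penalty toward the SFT model
)
trainer.train()
```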
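To try the resulting model, something like the following should work. The Hub repo id `rasyosef/Phi-1_5-Instruct-v0.1` and the availability of a chat template in the tokenizer are assumptions here, so adjust as needed.

```python
import torch
from transformers import pipeline

# Repo id assumed from this card's title; chat-style message input requires
# a recent transformers version with chat support in text-generation pipelines.
generator = pipeline(
    "text-generation",
    model="rasyosef/Phi-1_5-Instruct-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a haiku about gradient descent."}]
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])
```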