⚗️ Looking to get started with synthetic data and AI feedback?
I created this cool notebook for a workshop @davanstrien and I gave a couple of weeks back. It uses https://distilabel.argilla.io/dev/ and I think it is a good entry point for anyone with a practical interest in the topic.
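If you want a feel for what a pipeline looks like before opening the notebook, here is a minimal sketch of synthetic generation plus AI feedback. It assumes the distilabel 1.x API (where GroupColumns replaced the older CombineColumns) and Hugging Face Inference Endpoints; the model IDs and pipeline name are placeholders, so swap in whatever you have access to.

```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import GroupColumns, LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration, UltraFeedback

with Pipeline(name="synthetic-data-with-ai-feedback") as pipeline:
    # Seed prompts; in practice you would load a real dataset here.
    load = LoadDataFromDicts(
        data=[{"instruction": "Explain KTO in one paragraph."}]
    )
    # Two models answer the same instruction (the synthetic-data part).
    gen_a = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
    )
    gen_b = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="mistralai/Mistral-7B-Instruct-v0.2")
    )
    # Gather both answers into a single "generations" column per row.
    group = GroupColumns(columns=["generation"], output_columns=["generations"])
    # An LLM judge rates the candidates (the AI-feedback part).
    judge = UltraFeedback(
        aspect="overall-rating",
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3-70B-Instruct"),
    )
    load >> [gen_a, gen_b] >> group >> judge

if __name__ == "__main__":
    distiset = pipeline.run()  # needs an HF token set for Inference Endpoints
```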
Why KTO? A few highlights from the paper: KTO matches or exceeds DPO performance at scales from 1B to 30B parameters. That is, taking a preference dataset of n DPO pairs and breaking it up into 2n examples for KTO can yield better generations, despite the model ostensibly learning from a weaker signal (see the sketches after these findings).
KTO can handle extreme data imbalances, matching DPO performance while using up to 90% fewer desirable examples (i.e., examples of good generations). Its success thus cannot be ascribed to the alignment data being sourced from a preference dataset.
When the pretrained model is sufficiently good, one can skip supervised finetuning and go straight to KTO without a loss in generation quality. In contrast, we find that without doing SFT first, DPO-aligned models are significantly worse at all scales.
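The n-pairs-to-2n-examples point above is just a data transformation. Here is a minimal sketch of it; the column names follow the prompt/completion/label convention that TRL's KTOTrainer expects, and the helper name is mine:

```python
# Hypothetical helper: split n DPO preference pairs into 2n unpaired KTO
# examples (prompt/completion/label, the format TRL's KTOTrainer consumes).
def dpo_pairs_to_kto(pairs):
    kto_rows = []
    for row in pairs:
        # The chosen completion becomes a desirable example...
        kto_rows.append(
            {"prompt": row["prompt"], "completion": row["chosen"], "label": True}
        )
        # ...and the rejected completion becomes an undesirable one.
        kto_rows.append(
            {"prompt": row["prompt"], "completion": row["rejected"], "label": False}
        )
    return kto_rows

pairs = [{"prompt": "2+2=", "chosen": "4", "rejected": "5"}]
print(dpo_pairs_to_kto(pairs))  # 1 pair in, 2 KTO examples out
```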
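For the imbalance point, the paper handles unequal numbers of desirable and undesirable examples by reweighting the two sides of the loss. A sketch of how that surfaces in TRL's KTOConfig (argument names may differ across trl versions, and the weights below assume a roughly 10:1 imbalance):

```python
from trl import KTOConfig

# With ~10x fewer desirable examples, upweight them so that
# desirable_weight * n_desirable / (undesirable_weight * n_undesirable)
# stays in the ~[1, 4/3] range the KTO paper recommends.
config = KTOConfig(
    output_dir="kto-model",
    desirable_weight=10.0,   # the paper's lambda_D
    undesirable_weight=1.0,  # the paper's lambda_U
)
```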
Do you need something custom? Take a look at @davanstrien's guide on creating your own KTO dataset with Argilla and our community.
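As a starting point, a KTO annotation dataset only needs a prompt, a completion, and one binary question. A rough sketch, assuming the Argilla 2.x SDK (the dataset name, URL, and API key are placeholders):

```python
import argilla as rg

client = rg.Argilla(api_url="https://your-argilla.example", api_key="...")

settings = rg.Settings(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="completion"),
    ],
    questions=[
        # One binary question per completion: exactly the unpaired
        # desirable/undesirable signal KTO trains on.
        rg.LabelQuestion(name="label", labels=["👍 desirable", "👎 undesirable"]),
    ],
)

dataset = rg.Dataset(name="kto-preferences", settings=settings, client=client)
dataset.create()
```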