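A collection of chat models for exploring the differences between three alignment techniques: Direct Preference Optimization (DPO), Identity Preference Optimization (IPO), and Kahneman-Tversky Optimization (KTO).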
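The three techniques differ mainly in the loss applied to log-probabilities from the policy being trained and a frozen reference model. The following is a minimal, framework-free sketch of those per-example losses. It assumes precomputed sequence log-probabilities, and every function name in it is hypothetical; it is not TRL's implementation. The `sigmoid` and `beta` in the OpenHermes checkpoint names below are assumed to refer to the standard DPO sigmoid loss and its β coefficient. The KTO function is simplified: unit weights, and the reference point is passed in rather than estimated from a batch KL term.

```python
import math

# Minimal sketch of per-example preference losses. Inputs are assumed to be
# summed sequence log-probabilities computed elsewhere; all names are hypothetical.


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def margin(pi_chosen: float, pi_rejected: float,
           ref_chosen: float, ref_rejected: float) -> float:
    """h = [log pi(y_w|x) - log pi(y_l|x)] - [log ref(y_w|x) - log ref(y_l|x)]."""
    return (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)


def dpo_sigmoid_loss(h: float, beta: float) -> float:
    """DPO loss: -log sigmoid(beta * h), written as softplus(-beta * h)."""
    return softplus(-beta * h)


def ipo_loss(h: float, beta: float) -> float:
    """IPO loss: squared error between h and the target margin 1 / (2 * beta)."""
    return (h - 1.0 / (2.0 * beta)) ** 2


def kto_loss(reward: float, z_ref: float, beta: float, desirable: bool) -> float:
    """Simplified KTO loss for a single, unpaired completion (unit weights).

    reward = log pi(y|x) - log ref(y|x). z_ref is the reference point; here it is
    passed in directly instead of being estimated from a batch KL term.
    """
    if desirable:
        return 1.0 - sigmoid(beta * (reward - z_ref))
    return 1.0 - sigmoid(beta * (z_ref - reward))


if __name__ == "__main__":
    # Toy log-probabilities, made up for illustration: h = 2.0 - 0.5 = 1.5.
    h = margin(-10.0, -12.0, -11.0, -11.5)
    for beta in (0.3, 0.5, 0.9):
        print(f"beta={beta}: dpo={dpo_sigmoid_loss(h, beta):.4f} "
              f"ipo={ipo_loss(h, beta):.4f}")
```

In this sketch the DPO loss keeps decreasing as the margin `h` grows, whereas the IPO loss is minimized at `h = 1/(2β)` and penalizes overshooting it, so a larger β pulls the target margin toward zero. KTO, by contrast, scores single completions labeled desirable or undesirable instead of preference pairs.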
- trl-lib/qwen1.5-1.8b-dpo-cli
- trl-lib/qwen1.5-0.5b-sft (Text Generation)
- trl-lib/qwen1.5-1.8b-sft (Text Generation)
- trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.9-steps-800
- trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.8-steps-800
- trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.7-steps-800
- trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.6-steps-800
- trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.5-steps-800
- trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.4-steps-800
- trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.3-steps-800