We collect the open-source datasets and process them into the standard format.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models
6
RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification
•
Updated
•
10.3k
•
83
RLHFlow/LLaMA3-iterative-DPO-final
Text Generation
•
Updated
•
3.14k
•
38
RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation
•
Updated
•
5.35k
•
27
RLHFlow/LLaMA3-SFT
Text Generation
•
Updated
•
3.91k
•
5
RLHFlow/DPA-v1-Mistral-7B
Text Generation
•
Updated
•
22
•
2
RLHFlow/RewardModel-Mistral-7B-for-DPA-v1
Text Classification
•
Updated
•
32
datasets
29
RLHFlow/iterative-prompt-v1-iter9-20K
Viewer
•
Updated
•
19.9k
•
14
•
1
RLHFlow/iterative-prompt-v1-iter8-20K
Viewer
•
Updated
•
20k
•
12
RLHFlow/iterative-prompt-v1-iter7-20K
Viewer
•
Updated
•
20k
•
12
RLHFlow/iterative-prompt-v1-iter6-20K
Viewer
•
Updated
•
20k
•
15
RLHFlow/iterative-prompt-v1-iter5-20K
Viewer
•
Updated
•
20k
•
36
RLHFlow/iterative-prompt-v1-iter4-20K
Viewer
•
Updated
•
20k
•
41
RLHFlow/pair-preference-dataset-700K
Viewer
•
Updated
•
699k
•
821
•
2
RLHFlow/test_generation_2k
Viewer
•
Updated
•
2k
•
259
RLHFlow/SHP-standard
Viewer
•
Updated
•
93.3k
•
4.86k
RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard
Viewer
•
Updated
•
42.3k
•
4.61k
•
2