RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

datasets 29

RLHFlow/iterative-prompt-v1-iter9-20K

Updated 27 days ago • 19.9k rows • 14 downloads • 1 like

RLHFlow/iterative-prompt-v1-iter8-20K

Updated 27 days ago • 20k rows • 12 downloads

RLHFlow/iterative-prompt-v1-iter7-20K

Updated 27 days ago • 20k rows • 12 downloads

RLHFlow/iterative-prompt-v1-iter6-20K

Updated 27 days ago • 20k rows • 15 downloads

RLHFlow/iterative-prompt-v1-iter5-20K

Updated 27 days ago • 20k rows • 36 downloads

RLHFlow/iterative-prompt-v1-iter4-20K

Updated 27 days ago • 20k rows • 41 downloads

RLHFlow/pair-preference-dataset-700K

Updated May 26 • 699k rows • 821 downloads • 2 likes

RLHFlow/test_generation_2k

Updated May 12 • 2k rows • 259 downloads

RLHFlow/SHP-standard

Updated May 9 • 93.3k rows • 4.86k downloads

RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard

Updated May 8 • 42.3k rows • 4.61k downloads • 2 likes

Collections 5

RLHFlow/UltraFeedback-preference-standard

RLHFlow/Helpsteer-preference-standard

RLHFlow/HH-RLHF-Helpful-standard

RLHFlow/Orca-distibalel-standard

hendrydong/preference_700K

weqweasdas/preference_dataset_mixture2_and_safe_pku

models 6

RLHFlow/ArmoRM-Llama3-8B-v0.1

RLHFlow/LLaMA3-iterative-DPO-final

RLHFlow/pair-preference-model-LLaMA3-8B

RLHFlow/LLaMA3-SFT

RLHFlow/DPA-v1-Mistral-7B

RLHFlow/RewardModel-Mistral-7B-for-DPA-v1

Team members 3
