lm-human-preference-details
vwxyzjn/train_policy_accelerate__sentiment_offline_5k.json__seed1__1696447674 Text Generation • Updated Oct 4, 2023 • 2
lm-human-preference-details/train_policy_accelerate__sentiment_offline_5k.json__seed1 Text Generation • Updated Oct 4, 2023 • 2
RLOO / PPOv2 TL;DR summarize checkpoints
vwxyzjn/ppo_tldr Text Generation • Updated May 24 • 7
vwxyzjn/ppo_tldr_6.9b Text Generation • Updated 26 days ago • 6
vwxyzjn/rloo_tldr Text Generation • Updated 22 days ago • 7
vwxyzjn/rloo_tldr_6.9b Text Generation • Updated 26 days ago • 2
vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1711138793 Viewer • Updated Mar 22 • 130k • 1
vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1711138084 Viewer • Updated Mar 22 • 130k • 3