MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions • Paper • 2410.02743 • Published 13 days ago • 5
Self-Boosting Large Language Models with Synthetic Preference Data • Paper • 2410.06961 • Published 7 days ago • 14