wangclnlp/robust_visual_reward_model

preference model

gan-yang-zuzhu committed on Aug 23

Commit 7186f2d (1 parent: 8810cfa)

update README.md

Files changed (1)

README.md +2 -2

@@ -10,8 +10,8 @@ tags:
 - preference model
 ---
-#### RoVRM
-RoVRM is a robust visual reward model developed through a three-phase progressive training (i.e., pre-training with textual preference data→fine-tuning with image caption-based preference data→fine-tuning with visual preference data), and optimal transport-based selective preference data.
 These approaches effectively transfer preferences from auxiliary textual data to enhance the model's robustness.
 The repository hosts the RoVRM built on the LLaVA-1.5-7B model.
 We employed RoVRM for best-of-$n$ sampling and RL training, demonstrating its capability to significantly improve performance and reduce hallucination in large vision-language models.

 - preference model
 ---
+#### Robust Visual Reward Model
+Robust visual reward model (RoVRM) is developed through a three-phase progressive training (i.e., pre-training with textual preference data→fine-tuning with image caption-based preference data→fine-tuning with visual preference data), and optimal transport-based selective preference data.
 These approaches effectively transfer preferences from auxiliary textual data to enhance the model's robustness.
 The repository hosts the RoVRM built on the LLaVA-1.5-7B model.
 We employed RoVRM for best-of-$n$ sampling and RL training, demonstrating its capability to significantly improve performance and reduce hallucination in large vision-language models.
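
The best-of-$n$ sampling mentioned in the README can be illustrated with a minimal sketch. It is not taken from the RoVRM codebase: `best_of_n`, `toy_reward`, and the sample responses are hypothetical names and data, and the toy scorer merely stands in for a real reward model that would score each (image, prompt, response) triple.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate response with the highest reward score."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return max(candidates, key=reward_fn)


def toy_reward(response: str) -> float:
    # Hypothetical stand-in for a reward model: shorter responses score higher.
    return -float(len(response))


# n = 3 sampled responses for the same prompt (illustrative data only).
responses = [
    "a long hallucinated answer",
    "short answer",
    "medium-length answer",
]
best = best_of_n(responses, toy_reward)
print(best)
```

In practice `reward_fn` would wrap a forward pass of the reward model; the selection step itself stays the same regardless of how scores are produced.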