gan-yang-zuzhu committed on
Commit
7186f2d
1 Parent(s): 8810cfa

update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -10,8 +10,8 @@ tags:
  - preference model
  ---
 
- #### RoVRM
- RoVRM is a robust visual reward model developed through a three-phase progressive training (i.e., pre-training with textual preference data→fine-tuning with image caption-based preference data→fine-tuning with visual preference data), and optimal transport-based selective preference data.
+ #### Robust Visual Reward Model
+ Robust visual reward model (RoVRM) is developed through a three-phase progressive training (i.e., pre-training with textual preference data→fine-tuning with image caption-based preference data→fine-tuning with visual preference data), and optimal transport-based selective preference data.
  These approaches effectively transfer preferences from auxiliary textual data to enhance the model's robustness.
  The repository hosts the RoVRM built on the LLaVA-1.5-7B model.
  We employed RoVRM for best-of-$n$ sampling and RL training, demonstrating its capability to significantly improve performance and reduce hallucination in large vision-language models.
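The README text in this diff mentions using RoVRM for best-of-$n$ sampling. As a rough illustration of that general technique (not code from this repository), best-of-$n$ generates $n$ candidate responses and keeps the one the reward model scores highest. The sketch assumes a hypothetical `reward_fn(prompt, response) -> float` standing in for RoVRM; `toy_reward`, `GROUNDED`, and `HALLUCINATED` are made-up placeholders, not part of any real API.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical signature: a reward model maps (prompt, response) to a scalar score.
RewardFn = Callable[[str, str], float]


def best_of_n(prompt: str, candidates: Sequence[str], reward_fn: RewardFn) -> Tuple[str, float]:
    """Return the highest-scoring candidate and its score (earliest wins ties)."""
    if not candidates:
        raise ValueError("best-of-n needs at least one candidate")
    best_response, best_score = candidates[0], reward_fn(prompt, candidates[0])
    for response in candidates[1:]:
        score = reward_fn(prompt, response)
        if score > best_score:  # strict '>' keeps the earliest of equal scores
            best_response, best_score = response, score
    return best_response, best_score


# Hypothetical toy scorer standing in for a visual reward model:
# reward objects assumed present in the image, penalize objects assumed absent.
GROUNDED = {"dog", "ball", "grass"}
HALLUCINATED = {"cat", "frisbee"}


def toy_reward(prompt: str, response: str) -> float:
    words = [w.strip(".,").lower() for w in response.split()]
    return sum(w in GROUNDED for w in words) - 2.0 * sum(w in HALLUCINATED for w in words)


candidates = [
    "A dog chases a frisbee on the grass.",
    "A dog plays with a ball on the grass.",
    "A cat sleeps.",
]
best, score = best_of_n("Describe the image.", candidates, toy_reward)
print(best, score)  # -> A dog plays with a ball on the grass. 3.0
```

Best-of-$n$ only reranks outputs at inference time; RL training instead uses the reward as a training signal for the policy, which this sketch does not cover.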