LxzGordon committed
Commit 2bbe6bf
1 Parent(s): d80257d

Update README.md

Files changed (1):
1. README.md (+7, -4)
README.md CHANGED
@@ -4,11 +4,14 @@ datasets:
 pipeline_tag: text-classification
 ---
 
-**Paper:** Coming soon
+- **Paper:** Coming soon
 
+- **Model:** [URM-LLaMa-3-8B](https://huggingface.co/LxzGordon/URM-LLaMa-3-8B)
+
+- Fine-tuned from [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)
 # Brief
 
-[URM-llama3.1-8B](https://huggingface.co/LxzGordon/URM-llama3.1-8B) is an uncertain-aware reward model.
+[URM-LLaMa-3-8B](https://huggingface.co/LxzGordon/URM-LLaMa-3-8B) is an uncertainty-aware reward model.
 This RM consists of a base model and an uncertainty-aware, attribute-specific value head. The base model of this RM is from [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1).
 
 ## Attribute Regression
@@ -17,14 +20,14 @@ This RM consists of a base model and an uncertainty-aware, attribute-specific
 
 During training, instead of multi-attribute scores, the outputs of the uncertainty-aware value head are the parameters of a normal distribution, from which scores are sampled. We then run regression of the outputs against the labels to train the value head. To enable gradient back-propagation, the reparameterization technique is used.
 
-We use the five attributes from HelpSteer2: Helpfulness, Correctness, Coherence, Complexity and Verbosity. To combine these attributes, we use the weighted sum with prior weights ```[0.3, 0.74, 0.46, 0.47,-0.33]``` recommended by [Nemotron-4](https://huggingface.co/nvidia/Nemotron-4-340B-Reward).
+We use the five attributes from HelpSteer2: Helpfulness, Correctness, Coherence, Complexity and Verbosity. We combine these attributes using a weighted sum with the prior weights ```[0.3, 0.74, 0.46, 0.47, -0.33]``` recommended by [Nemotron-4](https://huggingface.co/nvidia/Nemotron-4-340B-Reward).
 
 # Usage
 ```python
 import torch
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 
-model_name = "./URM-llama3-8B"
+model_name = "LxzGordon/URM-LLaMa-3-8B"
 model = AutoModelForSequenceClassification.from_pretrained(
     model_name,
     device_map='auto',
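The Attribute Regression paragraph in the diff above describes a value head that emits the parameters of a normal distribution and is trained by regression through the reparameterization trick. A minimal PyTorch sketch of that idea; the class name, layer layout, and shapes are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

class UncertaintyAwareValueHead(nn.Module):
    # Hypothetical sketch: maps the base model's final hidden state to the
    # mean and log-variance of a normal distribution over attribute scores.
    def __init__(self, hidden_size: int, num_attributes: int = 5):
        super().__init__()
        self.mean_head = nn.Linear(hidden_size, num_attributes)
        self.log_var_head = nn.Linear(hidden_size, num_attributes)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        mu = self.mean_head(hidden_state)
        sigma = torch.exp(0.5 * self.log_var_head(hidden_state))
        # Reparameterization trick: score = mu + sigma * eps keeps the sample
        # differentiable w.r.t. mu and sigma, so regression gradients flow.
        eps = torch.randn_like(sigma)
        return mu + sigma * eps

# Regression against HelpSteer2 attribute labels (tensors are hypothetical):
# scores = head(last_hidden_state)               # (batch, 5)
# loss = nn.functional.mse_loss(scores, labels)
```

Equivalently, `torch.distributions.Normal(mu, sigma).rsample()` implements the same differentiable sampling.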
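The prior-weight combination is just a dot product over the five attribute scores. A one-line sketch, assuming the weights follow the attribute order listed in the README:

```python
import torch

# HelpSteer2 attributes, in order:
# Helpfulness, Correctness, Coherence, Complexity, Verbosity
PRIOR_WEIGHTS = torch.tensor([0.3, 0.74, 0.46, 0.47, -0.33])

def combined_reward(attribute_scores: torch.Tensor) -> torch.Tensor:
    # attribute_scores: (..., 5) -> one scalar reward per example
    return attribute_scores @ PRIOR_WEIGHTS
```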
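The `# Usage` snippet in the diff is cut off at the hunk boundary mid-call. A plausible continuation, assuming the standard `transformers` sequence-classification API; the dtype, the `trust_remote_code` flag, the example messages, and the meaning of the returned logits are assumptions, not taken from the model card:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "LxzGordon/URM-LLaMa-3-8B"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16,  # assumption: reduced precision for an 8B model
    trust_remote_code=True,      # assumption: the custom value head may need it
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical prompt/response pair to score.
messages = [
    {"role": "user", "content": "Explain the reparameterization trick in one sentence."},
    {"role": "assistant", "content": "It rewrites a random sample as a deterministic function of the parameters plus independent noise, so gradients can pass through."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
with torch.no_grad():
    reward = model(input_ids).logits  # assumption: logits hold the (combined) score
print(reward)
```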