pipeline_tag: text-classification
---

- **Paper:** Coming soon
- **Model:** [URM-LLaMa-3-8B](https://huggingface.co/LxzGordon/URM-LLaMa-3-8B)
- Fine-tuned from [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)

# Brief

[URM-LLaMa-3-8B](https://huggingface.co/LxzGordon/URM-LLaMa-3-8B) is an uncertainty-aware reward model. It consists of a base model and an uncertainty-aware, attribute-specific value head; the base model comes from [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1).

## Attribute Regression

During training, the uncertainty-aware value head does not output multi-attribute scores directly; instead, it outputs the parameters of a normal distribution, from which scores are sampled. We then regress the sampled scores against the labels to train the value head, using the reparameterization trick to allow gradients to back-propagate through the sampling step.
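The sampling-and-regression step described above can be sketched in PyTorch as follows. This is a minimal toy illustration, not the released implementation: the class name, layer layout, and sizes are our assumptions.

```python
import torch
import torch.nn as nn

class ToyUncertaintyHead(nn.Module):
    """Toy value head mapping a hidden state to per-attribute Normal parameters."""
    def __init__(self, hidden_size: int, num_attributes: int = 5):
        super().__init__()
        self.mean = nn.Linear(hidden_size, num_attributes)
        self.log_std = nn.Linear(hidden_size, num_attributes)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        mu = self.mean(hidden)
        sigma = self.log_std(hidden).exp()
        # Reparameterization trick: sample = mu + sigma * eps, so the sample
        # stays differentiable with respect to mu and sigma.
        eps = torch.randn_like(mu)
        return mu + sigma * eps

head = ToyUncertaintyHead(hidden_size=16)
hidden = torch.randn(2, 16)        # stand-in for base-model hidden states
labels = torch.rand(2, 5)          # stand-in for attribute labels
scores = head(hidden)              # sampled attribute scores
loss = nn.functional.mse_loss(scores, labels)  # regression on the samples
loss.backward()                    # gradients flow through the sampling step
```

Sampling from `torch.distributions.Normal(mu, sigma).rsample()` would be equivalent; the explicit `mu + sigma * eps` form makes the reparameterization visible.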

We use the five attributes from HelpSteer2: Helpfulness, Correctness, Coherence, Complexity and Verbosity. These attributes are combined with a weighted sum using the prior weights `[0.3, 0.74, 0.46, 0.47, -0.33]` recommended by [Nemotron-4](https://huggingface.co/nvidia/Nemotron-4-340B-Reward).
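Concretely, the weighted sum is a dot product between the per-attribute scores and the prior weights; the attribute scores below are made-up numbers for illustration only.

```python
# Prior weights for Helpfulness, Correctness, Coherence, Complexity, Verbosity
weights = [0.3, 0.74, 0.46, 0.47, -0.33]

# Hypothetical per-attribute scores for one response
attribute_scores = [3.5, 3.0, 3.8, 1.2, 2.0]

# Scalar reward is the weighted sum of the attribute scores
reward = sum(w * s for w, s in zip(weights, attribute_scores))
print(round(reward, 3))  # 4.922
```

Note the negative Verbosity weight: longer responses are penalized rather than rewarded.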
# Usage
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "LxzGordon/URM-LLaMa-3-8B"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16,  # assumed dtype; the snippet was truncated here
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```