This is the SFT checkpoint used for the project Online-RLHF. Also check our technical report here.

The model is trained from meta-llama/Meta-Llama-3-8B on a mixture of diverse open-source high-quality data for 1 epoch with detailed parameters in the report. It has not been trained by RLHF and can serve as a good starting point for the RLHF research.

Downloads last month: 4,868

Safetensors

Model size

8.03B params

Tensor type

BF16

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Space using RLHFlow/LLaMA3-SFT 1

Collection including RLHFlow/LLaMA3-SFT

Online RLHF

Collection

Datasets, code, and models for online RLHF (i.e., iterative DPO) • 19 items • Updated Jun 12 • 4