metadata
license: apache-2.0
datasets:
  - lvwerra/stack-exchange-paired
language:
  - en
library_name: adapter-transformers
pipeline_tag: text-generation
tags:
  - reward_model

Reward Model GPT2

GPT2 fine-tuned as a reward model.

The model is designed to score responses to questions in Stack Exchange domains such as programming, mathematics, and physics, so that more human-like, preferred answers receive a higher reward.

For the training code, see the GitHub example.

info:

  • epoch: 1.0
  • train_loss: 0.641692199903866
  • eval_loss: 0.6299035549163818
  • eval_accuracy: 0.729
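
To help read the loss and accuracy figures above, here is a minimal sketch of how such metrics are typically computed for a reward model trained on paired preference data. It assumes the common pairwise logistic loss, `-log(sigmoid(score_chosen - score_rejected))`; this card does not state which loss was actually used, and the function names `pairwise_loss` and `pairwise_accuracy` are hypothetical.

```python
import math


def pairwise_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over all pairs.

    Assumption: the standard pairwise logistic loss for reward models;
    the model card itself does not confirm the exact loss.
    """
    losses = []
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow.
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses)


def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of pairs where the preferred answer gets the higher score."""
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return correct / len(chosen_scores)


# Made-up scores for four (preferred, rejected) answer pairs.
chosen = [1.0, 0.0, 2.0, 3.0]
rejected = [0.0, 1.0, 1.0, 1.0]
print(pairwise_loss(chosen, rejected))
print(pairwise_accuracy(chosen, rejected))
```

Under this assumption, a model that gives both answers in every pair the same score has a loss of ln 2 ≈ 0.693, so the reported eval_loss of about 0.630 would sit below that baseline, and the eval_accuracy of 0.729 would mean the preferred answer is ranked higher in roughly 73% of evaluation pairs.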