Running inference to judge a pair of assistant responses
This model was trained with a specific evaluation prompt. Our code example shows how to wrap your data in that prompt so that the model processes the input as expected.
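As a rough illustration of what "wrapping" means, the sketch below inserts one query and two candidate responses into a single prompt string. The template text and the names `JUDGE_TEMPLATE` and `wrap_pair` are hypothetical placeholders, not the prompt the model was trained with. Take the actual evaluation prompt from run_model.py.

```python
# Hypothetical placeholder template; the real evaluation prompt is defined in run_model.py.
JUDGE_TEMPLATE = (
    "[User Question]\n{query}\n\n"
    "[Response A]\n{response_a}\n\n"
    "[Response B]\n{response_b}\n\n"
    "Which response is better?"
)


def wrap_pair(query: str, response_a: str, response_b: str) -> str:
    """Insert one query and two candidate responses into the judge template.

    str.format only interprets braces in the template itself, so braces
    inside the query or responses are passed through unchanged.
    """
    return JUDGE_TEMPLATE.format(
        query=query,
        response_a=response_a,
        response_b=response_b,
    )


if __name__ == "__main__":
    print(wrap_pair("What is 2 + 2?", "4", "5"))
```

Keeping the template in one constant means that swapping in the real prompt from run_model.py only touches a single place.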
run_model.py walks through the steps needed to prepare the input data (lists of input queries and the assistant responses to judge) and to run model inference with vllm.
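The general shape of this flow can be sketched as follows. Parallel lists of queries and response pairs are zipped into prompts, and those prompts are passed to vllm's offline `LLM` / `SamplingParams` interface. The helpers `build_prompts` and `run_judge`, the `wrap` callable (any function that builds the evaluation prompt), greedy decoding, and `max_tokens=512` are all assumptions for illustration. run_model.py remains the reference for how the model should actually be called. vllm is imported lazily so the data-preparation part can be tried without a GPU.

```python
from typing import Callable, List, Sequence


def build_prompts(
    queries: Sequence[str],
    responses_a: Sequence[str],
    responses_b: Sequence[str],
    wrap: Callable[[str, str, str], str],
) -> List[str]:
    """Zip parallel lists of queries and response pairs into judge prompts."""
    if not (len(queries) == len(responses_a) == len(responses_b)):
        raise ValueError("queries and both response lists must have the same length")
    return [wrap(q, a, b) for q, a, b in zip(queries, responses_a, responses_b)]


def run_judge(prompts: List[str], model_path: str, max_tokens: int = 512) -> List[str]:
    """Generate one judgement per prompt with vllm (needs vllm installed and a GPU)."""
    from vllm import LLM, SamplingParams  # imported lazily; only needed for inference

    llm = LLM(model=model_path)
    # Greedy decoding (temperature=0.0) is an assumption, chosen for repeatable judgements.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; keep the first one.
    return [out.outputs[0].text for out in outputs]
```

A typical call would be `run_judge(build_prompts(queries, responses_a, responses_b, wrap=my_wrap), "path/to/model")`, where `my_wrap` and the model path are placeholders for your own prompt function and checkpoint location.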
Run run_example.sh, which uses a pair of example inputs from the helpsteer2 validation set, to verify your setup. You can then compare your judgements with those in example_outputs.jsonl.
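One way to do this comparison is to load both JSONL files and compare them record by record. The file name `my_outputs.jsonl` and the `"judgement"` key are hypothetical; check example_outputs.jsonl for the actual field names. Because records are compared by position, the sketch also assumes both files list the examples in the same order.

```python
import json
from typing import List


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def compare_judgements(mine_path: str, ref_path: str, key: str = "judgement") -> List[int]:
    """Return the indices of records whose `key` field differs between two JSONL files.

    `key` is a hypothetical field name; replace it with the one used in
    example_outputs.jsonl.
    """
    mine, ref = load_jsonl(mine_path), load_jsonl(ref_path)
    if len(mine) != len(ref):
        raise ValueError(f"record count differs: {len(mine)} vs {len(ref)}")
    return [i for i, (m, r) in enumerate(zip(mine, ref)) if m.get(key) != r.get(key)]
```

For example, `compare_judgements("my_outputs.jsonl", "example_outputs.jsonl")` returns an empty list when every judgement matches, and otherwise returns the positions of the records to inspect.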
This code has been tested with vllm==0.6.1 and torch==2.4.0.
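To check whether your environment matches these tested versions before running, you can use a small script based on the standard-library `importlib.metadata`. Reporting a different version as a warning rather than an error is a choice made for this sketch. The script also drops a local build tag such as `+cu121`, which torch wheels often carry, before comparing. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional

# Versions listed as tested in this README.
TESTED = {"vllm": "0.6.1", "torch": "2.4.0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Return warnings for packages that are missing or differ from the tested versions."""
    warnings = []
    for pkg, want in expected.items():
        have = lookup(pkg)
        if have is None:
            warnings.append(f"{pkg} is not installed (tested with {want})")
            continue
        # Ignore a local build tag such as "+cu121" when comparing.
        if have.split("+")[0] != want:
            warnings.append(f"{pkg}=={have} installed, tested with {want}")
    return warnings


if __name__ == "__main__":
    for line in version_mismatches(TESTED):
        print("warning:", line)
```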