rasyosef/Phi-1_5-Instruct-v0.1


rasyosef committed on Jul 31

Commit 0c6293f • 1 Parent(s): 8a32a4e

Update README.md

Files changed (1)

README.md +1 -1

@@ -99,7 +99,7 @@ Note: If you want to use flash attention, call _AutoModelForCausalLM.from_pretra
 ## Benchmarks
-This model outperforms HuggingFace's SmolLM-1.7B-Instruct and the TinyLlama-1.1B-Chat-v1.0 models on IFEval and GSM8K benchmarks.
+This model outperforms HuggingFace's SmolLM-1.7B-Instruct and the TinyLlama-1.1B-Chat-v1.0 models on IFEval and GSM8K benchmarks. These benchmarks were run using EleutherAI's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
 - **IFEval (Instruction Following Evaluation)**: IFEval is a fairly interesting dataset that tests the capability of models to clearly follow explicit instructions, such as “include keyword x” or “use format y”. The models are tested on their ability to strictly follow formatting instructions rather than the actual contents generated, allowing strict and rigorous metrics to be used.
 - **GSM8k (5-shot)**: diverse grade school math word problems to measure a model's ability to solve multi-step mathematical reasoning problems.
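
The IFEval description in this hunk says models are scored on whether they strictly follow formatting instructions, not on what they actually say. A minimal sketch of what such a check might look like is below. It is an illustration only, not code from lm-evaluation-harness or from IFEval itself; the function names `includes_keyword`, `is_bulleted_list`, and `strict_accuracy` are hypothetical.

```python
import re


def includes_keyword(response: str, keyword: str) -> bool:
    """Hypothetical check for an "include keyword x" instruction.

    Passes only if the keyword appears as a whole word (case-insensitive).
    """
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


def is_bulleted_list(response: str, num_bullets: int) -> bool:
    """Hypothetical check for a "use format y" instruction.

    Passes only if exactly `num_bullets` lines start with '-' or '*'.
    """
    bullets = [
        line for line in response.splitlines()
        if re.match(r"^\s*[-*]\s+\S", line)
    ]
    return len(bullets) == num_bullets


def strict_accuracy(results: list[bool]) -> float:
    """Fraction of instructions that were followed exactly."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    response = "- The harness is open source.\n- It runs many tasks."
    checks = [
        includes_keyword(response, "harness"),
        is_bulleted_list(response, num_bullets=2),
    ]
    print(strict_accuracy(checks))
```

Because each check is a deterministic pass/fail on the output's form, the metric needs no human or model judge, which is what makes the scoring strict.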