cmarkea/idefics2-8b-ft-docvqa-lora

Visual Question Answering


SOKOUDJOU committed 26 days ago

Commit cd4e113 • 1 Parent(s): fbd3223

Update README.md

Files changed (1)

README.md +1 -1


@@ -87,7 +87,7 @@ with torch.inference_mode():
 ### Results
-By following the **LLM-as-Juries** evaluation method, the following results were obtained using three judge models (GPT-4o, Gemini1.5 Pro and Claude 3.5-Sonnet). These models were evaluated based on the average of two criteria: response accuracy and completeness, similar to what the [SSA metric](https://arxiv.org/abs/2001.09977) aims to capture. This metric was adapted to the VQA context, with clear criteria for each score (0 to 5) to ensure the highest possible precision in meeting expectations.
+By following the **[LLM-as-Juries](https://arxiv.org/abs/2404.18796)** evaluation method, the following results were obtained using three judge models (GPT-4o, Gemini1.5 Pro and Claude 3.5-Sonnet). These models were evaluated based on the average of two criteria: response accuracy and completeness, similar to what the [SSA metric](https://arxiv.org/abs/2001.09977) aims to capture. This metric was adapted to the VQA context, with clear criteria for each score (0 to 5) to ensure the highest possible precision in meeting expectations.
 ![constellation](https://i.postimg.cc/kMRmcBpQ/constellation-0.png)
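
The scoring described in the changed paragraph (three judge models, each rating accuracy and completeness on a 0 to 5 scale, with the two criteria averaged) can be sketched as follows. This is only an illustration under stated assumptions: the judge labels, the function names, and the choice to pool the judges with a plain mean are hypothetical, since the README does not say how the individual judge scores are combined.

```python
from statistics import mean

# Hypothetical labels for the three judge models named in the README.
JUDGES = ("gpt-4o", "gemini-1.5-pro", "claude-3.5-sonnet")


def judge_score(accuracy: float, completeness: float) -> float:
    """Average one judge's two criteria, each expected on the 0-5 scale."""
    for value in (accuracy, completeness):
        if not 0 <= value <= 5:
            raise ValueError(f"score {value} is outside the 0-5 range")
    return (accuracy + completeness) / 2


def jury_score(ratings: dict[str, tuple[float, float]]) -> float:
    """Pool per-judge scores with a plain mean (an assumption, not the README's stated rule)."""
    if not ratings:
        raise ValueError("at least one judge rating is required")
    return mean(judge_score(acc, comp) for acc, comp in ratings.values())


# Made-up ratings for a single answer: (accuracy, completeness) per judge.
example = {
    "gpt-4o": (4, 5),
    "gemini-1.5-pro": (3, 4),
    "claude-3.5-sonnet": (5, 5),
}
print(f"jury score: {jury_score(example):.2f}")
```

Averaging within each judge first keeps every judge's contribution on the same 0 to 5 scale, so the pooled value can be read against the same scoring criteria.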