PyTorch
English
Tevatron
phi3_v
vidore
custom_code
MrLight commited on
Commit
cda3e23
1 Parent(s): 3f8b454

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ library_name: Tevatron
14
 
15
  DSE-Phi3-Docmatix-V1.0 is a bi-encoder model designed to encode document screenshots into dense vectors for document retrieval. The Document Screenshot Embedding ([DSE](https://arxiv.org/abs/2406.11251)) approach captures documents in their original visual format, preserving all information such as text, images, and layout, thus avoiding tedious parsing and potential information loss.
16
 
17
- The model, `Tevatron/dse-phi3-docmatix-v1.0`, is trained using the `Tevatron/docmatix-ir` dataset, a variant of `HuggingFaceM4/Docmatix` specifically adapted for training PDF retrievers with Vision Language Models in open-domain question answering scenarios. For more information on dataset filtering and hard negative mining, refer to the [docmatix-ir dataset page](https://huggingface.co/datasets/Tevatron/docmatix-ir/blob/main/README.md).
18
 
19
  ## How to Use the Model
20
 
 
14
 
15
  DSE-Phi3-Docmatix-V1.0 is a bi-encoder model designed to encode document screenshots into dense vectors for document retrieval. The Document Screenshot Embedding ([DSE](https://arxiv.org/abs/2406.11251)) approach captures documents in their original visual format, preserving all information such as text, images, and layout, thus avoiding tedious parsing and potential information loss.
16
 
17
+ The model, `Tevatron/dse-phi3-docmatix-v1.0`, is trained using the `Tevatron/docmatix-ir` dataset, a variant of `HuggingFaceM4/Docmatix` specifically adapted for training PDF retrievers with Vision Language Models in open-domain question answering scenarios. For more information on dataset filtering and hard negative mining, refer to the [docmatix-ir](https://huggingface.co/datasets/Tevatron/docmatix-ir/blob/main/README.md) dataset page.
18
 
19
  ## How to Use the Model
20