davanstrien HF staff commited on
Commit
6c3e99a
β€’
1 Parent(s): e3dcfdd

formatting

Browse files
Files changed (1) hide show
  1. app.py +3 -1
app.py CHANGED
@@ -135,8 +135,10 @@ def generate_response(image):
135
 
136
  title = "ColPali fine-tuning Query Generator"
137
  description = """[ColPali](https://huggingface.co/papers/2407.01449) is a very exciting new approach to multimodal document retrieval which aims to replace existing document retrievers which often rely on an OCR step with an end-to-end multimodal approach.
138
- To train ColPali models, we need a dataset of image-text pairs which represent the document images and the relevant text queries which those documents should match.
 
139
  To make the ColPali models work even better we might want a dataset of query/image document pairs related to our domain or task.
 
140
  One way in which we might go about generating such a dataset is to use an VLM to generate synthetic queries for us.
141
  This space uses the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) to generate queries for a document, based on an input document image.
142
 
 
135
 
136
  title = "ColPali fine-tuning Query Generator"
137
  description = """[ColPali](https://huggingface.co/papers/2407.01449) is a very exciting new approach to multimodal document retrieval which aims to replace existing document retrievers which often rely on an OCR step with an end-to-end multimodal approach.
138
+
139
+ To train or fine-tune a ColPali model, we need a dataset of image-text pairs which represent the document images and the relevant text queries which those documents should match.
140
  To make the ColPali models work even better we might want a dataset of query/image document pairs related to our domain or task.
141
+
142
  One way in which we might go about generating such a dataset is to use an VLM to generate synthetic queries for us.
143
  This space uses the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) to generate queries for a document, based on an input document image.
144