davanstrien HF staff committed on
Commit
ecb8a41
•
1 Parent(s): ff86a3f
Files changed (1)
  1. app.py +1 -0
app.py CHANGED
@@ -142,6 +142,7 @@ To make the ColPali models work even better we might want a dataset of query/ima
  One way in which we might go about generating such a dataset is to use an VLM to generate synthetic queries for us.
  This space uses the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) VLM model to generate queries for a document, based on an input document image.
 
+ **Note** there is a lot of scope for improving to prompts and the quality of the generated queries! If you have any suggestions for improvements please open a Discussion!
 
  This [blog post](https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html) gives an overview of how you can use this kind of approach to generate a full dataset for fine-tuning ColPali models.
 