Florence-2-DocVQA

Paper is great but this model failed at every single task I threw at it

by rmeireles - opened 13 days ago

13 days ago

I uploaded my own CV to this Space and asked basic questions like "What companies did this candidate work for?" or "What courses did this candidate take?", and it always answered with a single line. Is it configured to stop at line breaks?

Every answer it gave to my questions about my CV was wrong. Is this to be expected?

rmeireles

13 days ago

Adding to the discussion: even with the sample images, questions whose answers need more than a few words fail every single time.

andito

Owner 13 days ago

Hi Rodrigo! We fine-tuned the model on DocVQA. In that dataset, all the answers are very short, so it makes complete sense that the model's answers are very short too. DocVQA is also a small dataset, so it may be that CVs are out of domain for this model. We are working on training it on a larger dataset and getting a better VQA model out of Florence-2. Our goal here was to quickly show how to fine-tune the model so that, while we work on a good VQA version, the community can also work on whatever application they want. Check out our blog: https://huggingface.co/blog/finetune-florence2
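
For anyone who wants to check how short a dataset's reference answers are before fine-tuning on it, here is a minimal sketch. It assumes each record is a dict with an `answers` list; that field name is an assumption and may differ in the dataset you load. The toy records are invented for illustration and are not real DocVQA samples.

```python
from statistics import mean


def answer_word_counts(records):
    """Return the word count of every reference answer in `records`.

    Assumes each record has an "answers" key holding a list of strings
    (hypothetical schema, adjust to the dataset you actually use).
    """
    return [
        len(answer.split())
        for record in records
        for answer in record["answers"]
    ]


# Toy records, made up for illustration only (not taken from DocVQA).
toy_records = [
    {"question": "What is the invoice date?", "answers": ["12 March 1998"]},
    {"question": "Who signed the letter?", "answers": ["J. Smith", "John Smith"]},
]

counts = answer_word_counts(toy_records)
print("mean words per answer:", mean(counts))
print("longest answer (words):", max(counts))
```

If the longest reference answer is only a few words, a model fine-tuned on that data will likely learn to stop after a few words as well, whatever the question asks for.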

rmeireles

13 days ago

@andito I read the blog, thank you for writing it. The Cauldron appears to have longer answers, which might suit my needs better, but it also appears to contain very few text-rich documents, so CVs may be OOD there too. Do you happen to know a dataset more suitable for this task?

Also, reading the DocVQA description:
"Similar to typical VQA task, task is to answer questions asked on a given document image. Similar to extractive QA framework popular in NLP, here the answer for the question is always a single span of text extracted from the given document image."

So I'll start to be more skeptical towards DocVQA benchmarks from now on haha
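
To make the single-span constraint quoted above concrete, here is a small sketch. The helper `is_single_span` and the CV text are hypothetical, written only for illustration; this is not how DocVQA itself validates answers. It checks whether an answer can be read off the document as one contiguous piece of text.

```python
def is_single_span(answer, document_text):
    """Return True if `answer` occurs as one contiguous span of the text.

    Case and whitespace (including line breaks) are normalized first,
    so a span may run across a line break in the original document.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    return normalize(answer) in normalize(document_text)


# Made-up CV text, for illustration only.
cv_text = """Experience
Acme Corp, Data Engineer, 2019-2021
Globex, ML Engineer, 2021-2024"""

# One company name is a contiguous span of the document.
print(is_single_span("Acme Corp", cv_text))

# A list of all employers is not: the names are separated by other text.
print(is_single_span("Acme Corp, Globex", cv_text))
```

A question like "What companies did this candidate work for?" needs the second kind of answer, which an extractive single-span setup cannot express.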

andito

Owner 13 days ago

This dataset might be a good starting point: https://huggingface.co/datasets/pixparse/pdfa-eng-wds

Honestly, I also see the lack of a good dataset for this in the community and I'm working on building one. I should be able to release it in the next month or so :)

rmeireles

13 days ago

Please let me know if I can join you in contributing. I'd be more than happy to oblige.

Also, thank you very much for this discussion.

rmeireles changed discussion status to closed 13 days ago
