Paper is great but this model failed at every single task I threw at it

#2
by rmeireles - opened

I uploaded my own CV to this Space and asked basic questions like "What companies did this candidate work for?" or "What courses did this candidate take?", and it always answers with a single line. Is it configured to stop at line breaks?

Every single question I asked it about my CV was wrong. Is this to be expected?

Adding to the discussion: even with the sample images, any question that requires more than a few words fails every single time.

Hi Rodrigo! We fine-tuned the model on DocVQA. In that dataset all the answers are very short, so it makes sense that the model's answers are very short too. DocVQA is also a small dataset, so CVs may well be out of domain for this model. We are working on training it on a larger dataset to get a better VQA model out of Florence-2. Our goal here was to quickly show how to fine-tune the model so that, while we work on a good VQA version, the community can also build whatever applications they want. Check out our blog: https://huggingface.co/blog/finetune-florence2

@andito I read the blog, thank you for writing it. The Cauldron appears to have longer answers, which might suit my needs better, but it also seems to contain very few text-rich documents, so CVs may be out of domain (OOD) there as well. Do you happen to know a dataset more suitable for this task?
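One rough way to check whether a candidate dataset has the longer answers needed here is to compare answer-length statistics before fine-tuning on it. A minimal sketch, assuming the answers have already been pulled into plain Python lists of strings; the function name `answer_length_stats`, the three-word "short" threshold, and the sample answers are all hypothetical and not taken from DocVQA or The Cauldron.

```python
from statistics import mean, median


def answer_length_stats(answers: list[str]) -> dict[str, float]:
    """Summarize answer lengths, measured in whitespace-separated words."""
    # Word count per answer (an empty string counts as zero words).
    lengths = [len(a.split()) for a in answers]
    return {
        "mean_words": mean(lengths),
        "median_words": median(lengths),
        "max_words": max(lengths),
        # Share of answers with at most three words (a hypothetical cutoff
        # for short, DocVQA-style extractive answers).
        "short_fraction": sum(n <= 3 for n in lengths) / len(lengths),
    }


# Hypothetical toy answers, for illustration only.
short_style = ["1987", "Acme Corp", "$4,200", "John Smith"]
long_style = [
    "The candidate worked at Acme Corp, Globex and Initech.",
    "Courses include linear algebra, statistics and machine learning.",
]

print(answer_length_stats(short_style))
print(answer_length_stats(long_style))
```

A dataset whose `short_fraction` is close to 1.0 is unlikely to teach the model to produce multi-line answers such as a list of employers.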

Also, reading the DocVQA description:
"Similar to typical VQA task, task is to answer questions asked on a given document image. Similar to extractive QA framework popular in NLP, here the answer for the question is always a single span of text extracted from the given document image."
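To make the extractive constraint concrete: under this framing, a valid answer must appear verbatim as one contiguous span of the document text. A minimal sketch, assuming the document's text is already available as a plain string (for example, from OCR); `normalize`, `is_single_span`, and the CV text below are hypothetical. A question like "What companies did this candidate work for?" usually has an answer scattered across several lines of a CV, so it cannot be expressed as a single span.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_single_span(answer: str, document_text: str) -> bool:
    """Return True if the answer occurs as one contiguous span of the document."""
    return normalize(answer) in normalize(document_text)


# Hypothetical OCR output of a CV, for illustration only.
cv_text = """Experience
Data Scientist, Acme Corp, 2019-2022
Engineer, Globex, 2016-2019
Education
MSc Computer Science"""

print(is_single_span("Acme Corp", cv_text))             # True: short extractive answer
print(is_single_span("Acme Corp and Globex", cv_text))  # False: list assembled from several lines
```

Collapsing newlines is a deliberate choice here, so a span that merely wraps across a line break still counts as one span; an answer stitched together from separate entries does not.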

So I'll be more skeptical of DocVQA benchmarks from now on haha

This dataset might be a good starting point: https://huggingface.co/datasets/pixparse/pdfa-eng-wds

Honestly, I also see the lack of a good dataset for this in the community and I'm working on building one. I should be able to release it in the next month or so :)

Please let me know if I can contribute. I'd be more than happy to help.

Also, thank you very much for this discussion.

rmeireles changed discussion status to closed
