alibaba-pai/LVDR · The Iteration Process

3 days ago

Dear authors, I wonder whether the candidates for LVDR are generated in the same way as those for HDR during training.
After you extract components with spaCy, what principles guide the word substitution? Do you use a fixed vocabulary, or the help of an LLM? Do the candidates need to be filtered again to rule out unreasonable cases?
Will the code for producing the candidates (LVDR/DDR/HDR) be released in the future? Thanks for your attention; that would help me a lot!

jpWang

Alibaba-PAI org about 10 hours ago

Sorry for the late reply.
Candidate generation for LVDR is similar to that for HDR during training. We use multiple word sets, where the words in each set are basically of the same type (for example, all colors) but have completely different semantics, and we randomly replace words based on these sets. We also tried using an LLM, but the results were not ideal because the generated substitutions are uncontrollable. We also manually check the LVDR replacements so that, as far as possible, the replacement words are "basically of the same type but with completely different semantics". We currently have no plans to release the candidate generation code.
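
For readers following the thread, the set-based replacement described above might look roughly like the sketch below. This is not the authors' code, which has not been released. The word sets, the function name `replace_one_word`, and the whitespace tokenization (standing in for the spaCy extraction mentioned in the question) are all assumptions made for illustration.

```python
import random

# Hypothetical word sets: words in a set share a type but differ in meaning.
WORD_SETS = {
    "color": ["red", "blue", "green", "yellow"],
    "animal": ["dog", "cat", "horse", "bird"],
}


def replace_one_word(caption, rng):
    """Return a copy of `caption` with one word swapped for another word
    from the same set, or None if no word belongs to any set."""
    tokens = caption.split()  # assumption: simple whitespace tokenization
    # Collect (position, set name) for every token found in some word set.
    candidates = [
        (i, name)
        for i, tok in enumerate(tokens)
        for name, words in WORD_SETS.items()
        if tok.lower() in words
    ]
    if not candidates:
        return None
    i, name = rng.choice(candidates)
    # Exclude the original word so the substitution always changes the caption.
    alternatives = [w for w in WORD_SETS[name] if w != tokens[i].lower()]
    tokens[i] = rng.choice(alternatives)
    return " ".join(tokens)


rng = random.Random(0)  # seeded only to make the example repeatable
print(replace_one_word("a red dog runs on the grass", rng))
```

Calling `replace_one_word` repeatedly on its own output would change one word per step. The manual check mentioned in the reply is not modeled here.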

Arsenever

about 4 hours ago

Thank you very much for your answer. By the way, I'm looking for the meta JSON for Shot2Story retrieval. Is this the correct file? https://huggingface.co/datasets/mhan/shot2story/blob/main/20k_test.json

Arsenever

about 2 hours ago

In LVDR, only one word is replaced in each iteration. Is there a fixed number for HDR training? I noticed that multiple words are replaced in each iteration.