---
license: cc-by-nc-sa-4.0
language:
- en
---
# The LVDR Benchmark (Long Video Description Ranking)
This benchmark was introduced in [VideoCLIP-XL](https://arxiv.org/abs/2410.00741).
For each video and its ground-truth description, a synthesis process runs for p − 1 iterations and alters q words per iteration to introduce hallucinations. This yields p descriptions in total (the original plus p − 1 altered versions) with gradually increasing degrees of hallucination. Such a subset is denoted p × q, and the benchmark comprises five subsets: {4 × 1, 4 × 2, 4 × 3, 4 × 4, 4 × 5}. Given the video, a video CLIP model is expected to rank these descriptions in descending order of similarity, from the ground truth to the most hallucinated.
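The ranking requirement can be illustrated with a minimal sketch. It assumes that the similarity scores for one video are listed in order of increasing hallucination (ground truth first). The function names `is_correctly_ranked` and `pairwise_agreement` are hypothetical and are not part of the benchmark; the pairwise score is just one plausible way to give partial credit and is not necessarily the metric reported in the paper.

```python
from typing import Sequence


def is_correctly_ranked(similarities: Sequence[float]) -> bool:
    """Return True if every description scores strictly higher than the
    next, more hallucinated one (assumed order: ground truth first)."""
    return all(a > b for a, b in zip(similarities, similarities[1:]))


def pairwise_agreement(similarities: Sequence[float]) -> float:
    """Fraction of description pairs (i < j) where the less hallucinated
    description i receives the higher similarity. Illustrative only."""
    p = len(similarities)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    if not pairs:
        return 1.0  # a single description is trivially ordered
    correct = sum(similarities[i] > similarities[j] for i, j in pairs)
    return correct / len(pairs)


# Example with p = 4 made-up similarity scores for one video.
scores = [0.9, 0.5, 0.7, 0.2]
print(is_correctly_ranked(scores))  # the second and third scores are swapped
print(pairwise_agreement(scores))
```

The strict check treats the whole list as right or wrong, while the pairwise variant shows how close a model came when only some descriptions are out of order.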
# Format
```json
{
"long_captions": [
"...",
],
"video_id": "..."
}
{
.....
},
.....
```
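As a rough sketch, a record shaped like the one above could be sanity-checked in Python as follows. The helper `validate_record` and its `expected_p` parameter are hypothetical; the only things taken from the format above are the two keys `long_captions` and `video_id`. How the records are stored on disk (file name, single JSON array versus one object per line) is not specified here, so the sketch works on an already-parsed dictionary.

```python
from typing import Any, Optional


def validate_record(record: Any, expected_p: Optional[int] = None) -> bool:
    """Check that a parsed record has the two documented keys with the
    expected types. `expected_p` (e.g. 4) optionally enforces the number
    of captions; it is an assumption, not a documented constraint."""
    if not isinstance(record, dict):
        return False
    captions = record.get("long_captions")
    video_id = record.get("video_id")
    if not isinstance(video_id, str):
        return False
    if not isinstance(captions, list) or not captions:
        return False
    if not all(isinstance(c, str) for c in captions):
        return False
    if expected_p is not None and len(captions) != expected_p:
        return False
    return True


# Made-up record for illustration; real captions and IDs will differ.
example = {
    "long_captions": ["caption a", "caption b", "caption c", "caption d"],
    "video_id": "example_id",
}
print(validate_record(example, expected_p=4))
```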
# Source
~~~
@misc{wang2024videoclipxladvancinglongdescription,
title={VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models},
author={Jiapeng Wang and Chengyu Wang and Kunzhe Huang and Jun Huang and Lianwen Jin},
year={2024},
eprint={2410.00741},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.00741},
}
~~~