---
license: cc-by-nc-sa-4.0
language:
- en
---
# The VILD Dataset (VIdeo and Long-Description)

This dataset was introduced in [VideoCLIP-XL](https://arxiv.org/abs/2410.00741).
We built an automatic data collection system that aggregates sufficient, high-quality pairs from multiple data sources.
With it, we collected over 2M (VIdeo, Long Description) pairs, which form our VILD dataset.

# Format
```json
{
  "short_captions": [
    "...",
  ],
  "long_captions": [
    "...",
  ],
  "video_id": "..."
}
{
  .....
},
.....
```
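The records above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes the annotations are stored either as one JSON array or as JSON Lines (one object per line), which this card does not specify. The helper names `parse_records`, `check_record`, and `long_caption_pairs` are hypothetical, and the sample record values are made up for illustration.

```python
from __future__ import annotations

import json

# Keys taken from the format shown above.
REQUIRED_KEYS = ("short_captions", "long_captions", "video_id")


def parse_records(text: str) -> list[dict]:
    """Parse annotation text (assumed: a JSON array or JSON Lines)."""
    stripped = text.strip()
    if not stripped:
        return []
    try:
        data = json.loads(stripped)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one object per non-empty line.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    # Wrap a single object so callers always receive a list.
    return data if isinstance(data, list) else [data]


def check_record(record: dict) -> None:
    """Raise ValueError if a record does not match the documented keys."""
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"record is missing keys: {missing}")
    for key in ("short_captions", "long_captions"):
        if not isinstance(record[key], list):
            raise ValueError(f"{key} should be a list of strings")


def long_caption_pairs(records: list[dict]) -> list[tuple[str, str]]:
    """Flatten records into (video_id, long_caption) pairs."""
    pairs = []
    for record in records:
        check_record(record)
        for caption in record["long_captions"]:
            pairs.append((record["video_id"], caption))
    return pairs


# Made-up sample in JSON Lines form; real IDs and captions will differ.
sample = (
    '{"short_captions": ["a dog runs"], '
    '"long_captions": ["A brown dog runs across a grassy field toward the camera."], '
    '"video_id": "example_0"}\n'
    '{"short_captions": [], "long_captions": [], "video_id": "example_1"}'
)
records = parse_records(sample)
pairs = long_caption_pairs(records)
print(len(records), len(pairs))
```

To use it on the actual annotation file, read the file contents into a string and pass it to `parse_records`. If the file turns out to use a different layout, only that function should need to change.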


# Source
~~~
@misc{wang2024videoclipxladvancinglongdescription,
      title={VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models}, 
      author={Jiapeng Wang and Chengyu Wang and Kunzhe Huang and Jun Huang and Lianwen Jin},
      year={2024},
      eprint={2410.00741},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.00741}, 
}
~~~