# The VILD Dataset (VIdeo and Long-Description)
This dataset was introduced in [VideoCLIP-XL](https://arxiv.org/abs/2410.00741).
We built an automatic data collection system that aggregates a large number of high-quality pairs from multiple data sources.
With it, we collected over 2M (VIdeo, Long Description) pairs, which make up the VILD dataset.
# Format
The annotations consist of many records of the following form:
```json
{
  "short_captions": [
    "..."
  ],
  "long_captions": [
    "..."
  ],
  "video_id": "..."
}
```
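The record layout above can be read with a few lines of standard-library Python. This is a minimal sketch, not an official loader. Only the field names `short_captions`, `long_captions`, and `video_id` come from this card. The card does not say whether the annotation file is a single JSON array or JSON Lines, so the hypothetical `parse_records` helper accepts both. The sample values are invented placeholders, not real VILD data.

```python
import json
from dataclasses import dataclass


@dataclass
class VildRecord:
    """One entry, using the field names from the format above."""
    video_id: str
    short_captions: list[str]
    long_captions: list[str]


def _to_record(item: dict) -> VildRecord:
    return VildRecord(
        video_id=item["video_id"],
        short_captions=list(item["short_captions"]),
        long_captions=list(item["long_captions"]),
    )


def parse_records(text: str) -> list[VildRecord]:
    """Hypothetical helper: parse either a JSON array of records or
    JSON Lines (one record per line). The real file layout is an assumption."""
    stripped = text.strip()
    if stripped.startswith("["):
        items = json.loads(stripped)
    else:
        items = [json.loads(line) for line in stripped.splitlines() if line.strip()]
    return [_to_record(item) for item in items]


def caption_pairs(records: list[VildRecord]):
    """Yield one (video_id, long_caption) tuple per long caption."""
    for rec in records:
        for caption in rec.long_captions:
            yield rec.video_id, caption


# Toy input with made-up placeholder values, not real VILD data.
sample = """
[
  {"short_captions": ["A dog runs."],
   "long_captions": ["A brown dog runs across a grassy park while chasing a red ball."],
   "video_id": "vid_0001"}
]
"""

records = parse_records(sample)
for video_id, caption in caption_pairs(records):
    print(video_id, caption)
```

Flattening each record into (video, long description) pairs mirrors how the dataset is described above; if a record holds several long captions, each one becomes its own pair.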
# Source
~~~
@misc{wang2024videoclipxladvancinglongdescription,
title={VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models},
author={Jiapeng Wang and Chengyu Wang and Kunzhe Huang and Jun Huang and Lianwen Jin},
year={2024},
eprint={2410.00741},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.00741},
}
~~~