Upload 6 files

Files changed (7)

.gitattributes CHANGED

@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text

 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+LVDR-Benchmark-4x1.jsonl filter=lfs diff=lfs merge=lfs -text
+LVDR-Benchmark-4x2.jsonl filter=lfs diff=lfs merge=lfs -text
+LVDR-Benchmark-4x3.jsonl filter=lfs diff=lfs merge=lfs -text
+LVDR-Benchmark-4x4.jsonl filter=lfs diff=lfs merge=lfs -text
+LVDR-Benchmark-4x5.jsonl filter=lfs diff=lfs merge=lfs -text

LVDR-Benchmark-4x1.jsonl ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:c6d6ded11e07d8984dc2114970a3e1c32577bbc077fd2fbc1c45c6e9dde9dd40
+size 10643868

LVDR-Benchmark-4x2.jsonl ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:52c974ebabb22fdbccd1c6237658aa12533c6aa69cd901c4181502e3cde528d6
+size 10651568

LVDR-Benchmark-4x3.jsonl ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:c59b6064566d37dad162205602a211173e9f00a8ac276d2f3adcd662856ed84e
+size 10659244

LVDR-Benchmark-4x4.jsonl ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:d577501d8c5c971bc31768fa576b30dcef734971e808ad955e5623e93f518679
+size 10665474

LVDR-Benchmark-4x5.jsonl ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:d68d3322edbb1e6b55ebb034f092e608877013437947c444dcf77ce95ba5a239
+size 10673358
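
The five `.jsonl` entries above are Git LFS pointers: each records a `sha256` object ID and a byte size instead of the data itself. As a minimal sketch, a downloaded file could be checked against its pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and belong to no library; the parser also tolerates the leading `+` that this diff view puts in front of each pointer line.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the text of a Git LFS pointer into a small dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip().lstrip("+")  # tolerate diff-style '+' prefixes
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field looks like "sha256:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest named by the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Comparing the size first is a cheap early exit; the digest comparison is the check that actually matters.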

README.md ADDED Viewed

+# The LVDR Benchmark (Long Video Description Ranking)
+This benchmark was introduced in [VideoCLIP-XL](https://arxiv.org/abs/2410.00741).
+For each video and its ground-truth description, a synthesis process runs for p − 1 iterations, altering q words in each iteration to inject hallucinations. This yields p descriptions in total, with gradually increasing degrees of hallucination. Each such subset is denoted p × q, and the benchmark provides five subsets: {4 × 1, 4 × 2, 4 × 3, 4 × 4, 4 × 5}. Given the video, a video CLIP model should rank these descriptions correctly, in descending order of similarity.
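
The ranking requirement described above can be illustrated with a small sketch. It assumes the p descriptions of a video are ordered from the ground truth to the most hallucinated one, and that some model has already produced one video-text similarity score per description. Both function names are hypothetical, and the pairwise score is an illustrative measure, not necessarily the metric reported in the paper.

```python
from itertools import combinations


def is_correctly_ranked(similarities):
    """True if the scores strictly decrease from the first description to the last.

    Assumes index 0 is the ground truth and later indices are more hallucinated.
    """
    return all(a > b for a, b in zip(similarities, similarities[1:]))


def pairwise_order_accuracy(similarities):
    """Fraction of description pairs (i < j) where description i scores higher than j."""
    pairs = list(combinations(range(len(similarities)), 2))
    if not pairs:
        return 1.0  # zero or one description: nothing can be out of order
    correct = sum(similarities[i] > similarities[j] for i, j in pairs)
    return correct / len(pairs)


# Made-up scores for one video of a 4 x q subset (p = 4 descriptions).
scores = [0.9, 0.5, 0.7, 0.2]
print(is_correctly_ranked(scores), pairwise_order_accuracy(scores))
```

The strict check is all-or-nothing per video, while the pairwise fraction gives partial credit when only some descriptions are swapped.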
+# Format
+```json
+{
+  "long_captions": [
+    "...",
+  ],
+  "video_id": "..."
+}
+{
+  .....
+}
+.....
+```
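
A minimal sketch of reading one subset, assuming each `.jsonl` file follows the usual JSON Lines convention of one complete JSON object per line (the README shows the record pretty-printed). The field names `long_captions` and `video_id` come from the format above; `iter_records` and `load_subset` are hypothetical helper names.

```python
import json


def iter_records(lines):
    """Yield one dict per non-empty line, checking for the two documented fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)
        if "long_captions" not in record or "video_id" not in record:
            raise ValueError(f"line {line_no}: missing 'long_captions' or 'video_id'")
        yield record


def load_subset(path):
    """Load every record of one subset file into a list."""
    with open(path, encoding="utf-8") as f:
        return list(iter_records(f))
```

For example, `load_subset("LVDR-Benchmark-4x1.jsonl")` would return a list of records, each with a `video_id` and its list of `long_captions`, provided the file has been fetched through Git LFS rather than left as a pointer.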
+# Source
+~~~
+@misc{wang2024videoclipxladvancinglongdescription,
+      title={VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models},
+      author={Jiapeng Wang and Chengyu Wang and Kunzhe Huang and Jun Huang and Lianwen Jin},
+      year={2024},
+      eprint={2410.00741},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2410.00741},
+}
+~~~