thanks to DAMO-NLP-SG ❤


Files changed (11)

README.md +33 -0
finetune-billa7b-zh.pth +3 -0
finetune-vicuna13b-v2.pth +3 -0
finetune-vicuna7b-v2.pth +3 -0
finetune-ziya13b-zh.pth +3 -0
finetune_vicuna7b_audiobranch.pth +3 -0
pretrain-billa7b-zh.pth +3 -0
pretrain-vicuna13b.pth +3 -0
pretrain-ziya13b-zh.pth +3 -0
pretrain_vicuna7b-v2.pth +3 -0
pretrain_vicuna7b_audiobranch.pth +3 -0

README.md ADDED

	@@ -0,0 +1,33 @@

+---
+license: bsd-3-clause
+language:
+- en
+- zh
+pipeline_tag: visual-question-answering
+---
+# Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
+This Hugging Face repo stores the pre-trained and fine-tuned checkpoints of [Video-LLaMA](https://arxiv.org/abs/2306.02858), a multi-modal conversational large language model with video understanding capability.
+## Vision-Language Branch
+| Checkpoint       | Link | Note |
+|:------------|-------------|-------------|
+| pretrain-vicuna7b    | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain_vicuna7b-v2.pth)       | Pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
+| finetune-vicuna7b-v2 | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-vicuna7b-v2.pth) | Fine-tuned on the instruction-tuning data from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4), [LLaVA](https://github.com/haotian-liu/LLaVA) and [VideoChat](https://github.com/OpenGVLab/Ask-Anything)|
+| pretrain-vicuna13b    | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-vicuna13b.pth)       | Pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
+| finetune-vicuna13b-v2 | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-vicuna13b-v2.pth) | Fine-tuned on the instruction-tuning data from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4), [LLaVA](https://github.com/haotian-liu/LLaVA) and [VideoChat](https://github.com/OpenGVLab/Ask-Anything)|
+| pretrain-ziya13b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-ziya13b-zh.pth) | Pre-trained with Chinese LLM [Ziya-13B](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) |
+| finetune-ziya13b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-ziya13b-zh.pth) | Fine-tuned on machine-translated [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset (in Chinese)|
+| pretrain-billa7b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-billa7b-zh.pth) | Pre-trained with Chinese LLM [BiLLA-7B](https://huggingface.co/Neutralzz/BiLLa-7B-SFT) |
+| finetune-billa7b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-billa7b-zh.pth) | Fine-tuned on machine-translated [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset (in Chinese) |
+## Audio-Language Branch
+| Checkpoint       | Link | Note |
+|:------------|-------------|-------------|
+| pretrain-vicuna7b    | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain_vicuna7b_audiobranch.pth)       | Pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
+| finetune-vicuna7b-v2 | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune_vicuna7b_audiobranch.pth) | Fine-tuned on the instruction-tuning data from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4), [LLaVA](https://github.com/haotian-liu/LLaVA) and [VideoChat](https://github.com/OpenGVLab/Ask-Anything)|
+## Usage
+To launch the pre-trained Video-LLaMA on your own machine, please refer to our [GitHub repo](https://github.com/DAMO-NLP-SG/Video-LLaMA).
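
Every link in the tables above follows the same `resolve/main/<filename>` pattern, so a checkpoint can be fetched with nothing but the Python standard library. The following is a minimal sketch, not part of the README: the helper names `checkpoint_url` and `download_checkpoint` and the `ckpt` target directory are hypothetical, and the code assumes the files stay publicly downloadable at those URLs.

```python
import shutil
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

# Repository id taken from the links in the README tables.
REPO_ID = "DAMO-NLP-SG/Video-LLaMA-Series"


def checkpoint_url(filename: str, revision: str = "main") -> str:
    """Build the direct-download URL used by the README tables."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{quote(filename)}"


def download_checkpoint(filename: str, dest_dir: str = "ckpt") -> Path:
    """Download one checkpoint into dest_dir (hypothetical default) and return its path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / filename
    if target.exists():
        return target  # already downloaded
    # Stream to a temporary name first so an interrupted download
    # is never mistaken for a complete file.
    partial = target.with_name(target.name + ".part")
    with urlopen(checkpoint_url(filename)) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

For example, `download_checkpoint("finetune-vicuna7b-v2.pth")` would fetch the fine-tuned 7B vision-language checkpoint (about 265 MB according to its LFS pointer). How the file is then wired into the model is described in the GitHub repo, not here.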

finetune-billa7b-zh.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:91f1047d8e1d6970680db961ab9057fdf78919069cfc4c164e08023b66ff6e5d
+size 265435817

finetune-vicuna13b-v2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2ebf848c8affaaa00194ffd6d3e1f5148ebd64bff08050fc12523a28d0023285
+size 274898177

finetune-vicuna7b-v2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0680ad8eb14c2a3273b7be71309ab6b06c9f426e87ad4675a903371fe0fa8162
+size 265436777

finetune-ziya13b-zh.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a773de8e84dec9d980d4f040b522d0dc9d600161bc8ebe13ebb149bf1dfa3fc2
+size 274897409

finetune_vicuna7b_audiobranch.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:72877c69ae31ea436507af14ac9f1f5275feed98955e2271f4e79294b994c404
+size 274578593

pretrain-billa7b-zh.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f50a51db3055e1be6461f6dec833fbbbba28650287d26c8787664c8ee31dcf0f
+size 265435689

pretrain-vicuna13b.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6bc8fafd174e08e076b0b46b02330376a4813bf61d230eaea46a8e919721931c
+size 274897345

pretrain-ziya13b-zh.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2db583659e4b6d9bfb24f765f077c9ae3c0810618d2cf769b21bdde92e7c9d24
+size 274897281

pretrain_vicuna7b-v2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ab4d69838d4281eb62d0da8a26c15cbd4e46f9e6168fb89919199da9899de089
+size 265435753

pretrain_vicuna7b_audiobranch.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:85cf6cf68906042f107928ffa635ed539ed104ae1fecacd22bb488ce80131e5a
+size 274577569
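
Each `.pth` entry in this commit is a Git LFS pointer that records the real file's SHA-256 digest (`oid`) and byte length (`size`), so a downloaded checkpoint can be checked against it. The sketch below is one way to do that with the standard library; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the parser assumes the three-line `version` / `oid` / `size` layout shown above (a leading diff `+` is tolerated).

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into its hash algorithm, digest, and size."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("+")  # drop the diff "+" prefix if present
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at path has the size and digest the pointer records."""
    file_path = Path(path)
    # Cheap check first: a size mismatch means the file cannot match.
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(file_path, "rb") as handle:
        # Hash in chunks so a multi-hundred-megabyte checkpoint
        # is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Passing the three pointer lines of, say, `finetune-vicuna7b-v2.pth` to `parse_lfs_pointer` and the downloaded file to `matches_pointer` gives a quick integrity check before loading the weights.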