---
license: mit
library_name: peft
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- chenjoya/videollm-online-chat-ego4d-134k
language:
- en
tags:
- llama
- llama-3
- multimodal
- llm
- video stream
- online video understanding
- video understanding
pipeline_tag: video-text-to-text
---

# Model Card for chenjoya/videollm-online-8b-v1plus

Project page: https://showlab.github.io/videollm-online/

## Model Details

* LLM: meta-llama/Meta-Llama-3-8B-Instruct
* Vision Strategy:
    * Frame Encoder: google/siglip-large-patch16-384
    * Frame Tokens: CLS Token + Avg Pooled 3x3 Tokens
    * Frame FPS: 2 for training, 2~10 for inference
    * Frame Resolution: max resolution 384, with zero-padding to keep aspect ratio
    * Video Length: 10 minutes
* Training Data: Ego4D Narration Stream 113K + Ego4D GoalStep Stream 21K
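
The frame settings above imply a visual token budget that can be estimated with simple arithmetic. The following is a rough sketch, assuming each frame contributes exactly 1 CLS token plus 3x3 pooled tokens and that a full 10-minute video is sampled at the training rate of 2 FPS. The variable names are illustrative only, and the estimate ignores text tokens and any separators the model may add.

```sh
# Values taken from the Model Details list above.
cls_tokens=1                 # one CLS token per frame
pooled_tokens=$((3 * 3))     # 3x3 average-pooled grid
fps=2                        # training frame rate
video_seconds=$((10 * 60))   # 10-minute video

tokens_per_frame=$((cls_tokens + pooled_tokens))
frames=$((fps * video_seconds))
total_tokens=$((tokens_per_frame * frames))

echo "tokens per frame: $tokens_per_frame"
echo "frames: $frames"
echo "visual tokens: $total_tokens"
```

Under these assumptions, a 10-minute video at 2 FPS yields 1200 frames and 12000 visual tokens. Higher inference frame rates scale this number linearly.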

### Model Sources

- **Repository:** https://github.com/showlab/videollm-online
- **Paper:** https://arxiv.org/abs/2406.11816

## Uses

- First, clone the GitHub repository and follow the installation instructions:

```sh
git clone https://github.com/showlab/videollm-online
```

Ensure you have Miniconda and Python version >= 3.10 installed, then run:
```sh
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers accelerate deepspeed peft editdistance Levenshtein tensorboard gradio moviepy submitit
pip install flash-attn --no-build-isolation
```

The PyTorch conda installation also installs ffmpeg, but it is an old version that usually produces very low-quality preprocessing results. Please install the latest ffmpeg as follows:
```sh
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
tar xvf ffmpeg-release-amd64-static.tar.xz
rm ffmpeg-release-amd64-static.tar.xz
# The extracted directory name includes the release version (e.g. ffmpeg-7.0.1-amd64-static)
mv ffmpeg-*-amd64-static ffmpeg
```
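
To confirm which build was installed, a quick check like the one below may help. It is a minimal sketch that assumes the static build was moved to `./ffmpeg` as in the commands above; `FFMPEG_BIN` and `FFMPEG_STATUS` are hypothetical variable names used only for illustration.

```sh
# Assumption: the static build lives in ./ffmpeg, as set up above.
FFMPEG_BIN="./ffmpeg/ffmpeg"

if [ -x "$FFMPEG_BIN" ]; then
    # Keep only the first line, which contains the version string.
    FFMPEG_STATUS="$("$FFMPEG_BIN" -version | head -n 1)"
else
    FFMPEG_STATUS="not found at $FFMPEG_BIN"
fi

echo "ffmpeg: $FFMPEG_STATUS"
```

If the reported version looks old, or the binary is not found, the extraction and `mv` steps are worth rechecking.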

If you want to try our model with real-time streaming audio, please also clone ChatTTS:

```sh
pip install omegaconf vocos vector_quantize_pytorch cython
git clone https://github.com/2noise/ChatTTS
mv ChatTTS demo/rendering/
```

- Launch the Gradio demo locally with:
```sh
python -m demo.app --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus
```

- Or launch the CLI locally with:
```sh
python -m demo.cli --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus
```

## Citation 

```
@inproceedings{videollm-online,
  author       = {Joya Chen and Zhaoyang Lv and Shiwei Wu and Kevin Qinghong Lin and Chenan Song and Difei Gao and Jia-Wei Liu and Ziteng Gao and Dongxing Mao and Mike Zheng Shou},
  title        = {VideoLLM-online: Online Video Large Language Model for Streaming Video},
  booktitle    = {CVPR},
  year         = {2024},
}
```