---
license: mit
library_name: peft
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- chenjoya/videollm-online-chat-ego4d-134k
language:
- en
tags:
- llama
- llama-3
- multimodal
- llm
- video stream
- online video understanding
- video understanding
pipeline_tag: video-text-to-text
---

# Model Card for VideoLLM-online

Project page: https://showlab.github.io/videollm-online/

## Model Details

* LLM: meta-llama/Meta-Llama-3-8B-Instruct
* Vision Strategy:
  * Frame Encoder: google/siglip-large-patch16-384
  * Frame Tokens: CLS Token + Avg Pooled 3x3 Tokens
  * Frame FPS: 2 for training, 2~10 for inference
  * Frame Resolution: max resolution 384, with zero-padding to keep the aspect ratio
  * Video Length: 10 minutes
* Training Data: Ego4D Narration Stream 113K + Ego4D GoalStep Stream 21K

### Model Sources

- **Repository:** https://github.com/showlab/videollm-online
- **Paper:** https://arxiv.org/abs/2406.11816

## Uses

- First, clone the GitHub repository and follow the installation instructions:

```sh
git clone https://github.com/showlab/videollm-online
# run all of the following commands from the repository root
cd videollm-online
```

Ensure you have Miniconda and Python >= 3.10 installed, then run:

```sh
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers accelerate deepspeed peft editdistance Levenshtein tensorboard gradio moviepy submitit
pip install flash-attn --no-build-isolation
```

Installing PyTorch this way also installs ffmpeg, but that build is old and often produces very low-quality preprocessing. Please install the latest static ffmpeg build instead:

```sh
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
tar xvf ffmpeg-release-amd64-static.tar.xz
rm ffmpeg-release-amd64-static.tar.xz
# the extracted directory name contains the release version (e.g. ffmpeg-7.0.1-amd64-static)
mv ffmpeg-*-amd64-static ffmpeg
```

If you want to try the model with real-time streaming audio output, please also install ChatTTS:
```sh
# Python dependencies needed by ChatTTS
pip install omegaconf vocos vector_quantize_pytorch cython
git clone https://github.com/2noise/ChatTTS
mv ChatTTS demo/rendering/
```

- Launch the Gradio demo locally with:

```sh
python -m demo.app --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus
```

- Or launch the CLI locally with:

```sh
python -m demo.cli --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus
```

## Citation

```bibtex
@inproceedings{videollm-online,
  author    = {Joya Chen and Zhaoyang Lv and Shiwei Wu and Kevin Qinghong Lin and Chenan Song and Difei Gao and Jia-Wei Liu and Ziteng Gao and Dongxing Mao and Mike Zheng Shou},
  title     = {VideoLLM-online: Online Video Large Language Model for Streaming Video},
  booktitle = {CVPR},
  year      = {2024},
}
```
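
## Visual Token Budget

The frame settings listed under Model Details determine how many visual tokens a video stream contributes to the LLM input. The following is a rough worked estimate, not a figure from the paper. It assumes that each frame contributes exactly one CLS token plus 3x3 = 9 average-pooled tokens, takes the 10-minute video length listed above as the stream duration, and ignores any text or special tokens interleaved with the frames:

$$
\begin{aligned}
N_{\text{frame}} &= \underbrace{1}_{\text{CLS}} + \underbrace{3 \times 3}_{\text{avg pooled}} = 10 \\
N_{\text{video}} &= N_{\text{frame}} \cdot f \cdot T = 10 \times 2 \times 600 = 12{,}000
\end{aligned}
$$

Here $f$ is the frame rate in frames per second and $T$ is the duration in seconds. The count scales linearly with frame rate. Under the same assumptions, 10 minutes at the 10 fps inference upper bound would give $10 \times 10 \times 600 = 60{,}000$ visual tokens.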