Xenova/nanoLLaVA · ONNX Conversion Tutorial

May 19

Hi @Xenova, thank you for your awesome work.
I recently fine-tuned this model to extract information from images using JSON Schema, and I intend to embed it in a web application. Could you recommend any existing tutorials that walk through converting the model to the ONNX format? That would let me perform the conversion on my own in the future. Thanks again!

Xenova

Owner May 19

I must admit, the current process to export the model is a bit complicated, and is very manual/hacky at the moment... I'll eventually turn it into a script, but in the meantime, just ping me and I'd be happy to help out with it.

Xenova

Owner May 19

There are 3 components:

1. Vision model + multimodal projection (vision_encoder.onnx)
2. Embedding layer (embed_tokens.onnx)
3. Language model without embedding layer (decoder_model_merged.onnx)
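
As a rough illustration of how these three components fit together at inference time, here is a minimal NumPy sketch of the data flow. It does not load the real ONNX files: embed_tokens and vision_encoder below are stand-in functions, and VOCAB_SIZE, NUM_IMAGE_TOKENS, IMAGE_TOKEN_ID and the image size are assumptions chosen for illustration. Only the hidden width of 1024 comes from this thread. The point is that the text embeddings and the projected image features are joined into a single inputs_embeds tensor, which is why the decoder is exported without its embedding layer.

```python
import numpy as np

# All sizes and ids below are illustrative assumptions, not values read from the model.
VOCAB_SIZE = 100
HIDDEN_SIZE = 1024       # inputs_embeds width mentioned in this thread
NUM_IMAGE_TOKENS = 4     # hypothetical; a real vision encoder emits many more
IMAGE_TOKEN_ID = -200    # hypothetical placeholder id marking where the image goes

rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((VOCAB_SIZE, HIDDEN_SIZE)).astype(np.float32)


def embed_tokens(input_ids):
    """Stand-in for embed_tokens.onnx: (batch, seq) -> (batch, seq, hidden)."""
    return embedding_table[input_ids]


def vision_encoder(pixel_values):
    """Stand-in for vision_encoder.onnx: image -> (batch, image_tokens, hidden)."""
    batch = pixel_values.shape[0]
    return np.zeros((batch, NUM_IMAGE_TOKENS, HIDDEN_SIZE), dtype=np.float32)


def merge(input_ids, pixel_values):
    """Splice image features in where the placeholder token sits (batch size 1)."""
    ids = input_ids[0]
    pos = int(np.where(ids == IMAGE_TOKEN_ID)[0][0])
    before = embed_tokens(ids[None, :pos])       # text before the image
    after = embed_tokens(ids[None, pos + 1:])    # text after the image
    image = vision_encoder(pixel_values)
    return np.concatenate([before, image, after], axis=1)


input_ids = np.array([[1, 5, IMAGE_TOKEN_ID, 7, 9]], dtype=np.int64)
pixel_values = np.zeros((1, 3, 8, 8), dtype=np.float32)

# This tensor, not input_ids, is what the decoder would receive.
inputs_embeds = merge(input_ids, pixel_values)
print(inputs_embeds.shape)  # (1, 8, 1024)
```

The sequence length grows from 5 to 8 because the single placeholder token is replaced by 4 image feature vectors, so the decoder input has shape (batch_size, sequence_length, 1024) rather than (batch_size, sequence_length).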

jasonwang110

11 days ago

Hi @Xenova,
Thanks for sharing. Based on your tip above, I have successfully converted the models from steps 1 and 2 to ONNX. I have one question about the language model. When I ran "python convert.py --model_id ./language-model --tokenizer_id ./tokenizer --task text-generation-with-past --skip_validation --trust_remote_code", the exported model's input was named input_ids, with dimensions (batch_size, sequence_length). The correct input should be inputs_embeds, with dimensions (batch_size, sequence_length, 1024). Could you suggest how to solve this problem?
Best Regards