use in web browser

#3
by ciekawy - opened

I tried the latest @xenova/transformers to use this bge-m3 model, and apparently it gets stuck on creating the pipeline. The same code works with Node.js.
I also tried using this ONNX model directly with onnxruntime-web (just missing the tokenizer), and it was actually able to compute embeddings.
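For reference, a minimal sketch of the browser setup that hangs, assuming the v2 @xenova/transformers API and the Xenova/bge-m3 checkpoint (the pooling options are my assumption for a bge-style model):

```js
// Minimal sketch, assuming @xenova/transformers v2; under Node.js
// the same code reportedly completes.
import { pipeline } from '@xenova/transformers';

// Reportedly gets stuck here in the browser:
const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3');

const output = await extractor('Hello world', { pooling: 'cls', normalize: true });
console.log(output.dims); // e.g. [1, 1024] for bge-m3
```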

OK, I managed to run everything with the transformers.js v3 branch and the latest onnxruntime-web, thanks to https://github.com/microsoft/onnxruntime/issues/20876.

However, I now noticed that WASM is up to 2x faster than WebGPU on an Apple M3 with enough RAM (with the quantized model, measuring just single extractor calls).
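A rough sketch of that per-call measurement, assuming the v3 `device` option (the `@huggingface/transformers` package name and exact option values are assumptions based on the v3 branch):

```js
// Rough single-call timing for the WASM vs. WebGPU backends; the
// package name and `device` values are assumptions for the v3 API.
import { pipeline } from '@huggingface/transformers';

async function timeSingleCall(device) {
  const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3', { device });
  const start = performance.now();
  await extractor('The quick brown fox jumps over the lazy dog.', {
    pooling: 'cls',
    normalize: true,
  });
  return performance.now() - start; // one extractor call, as measured above
}

console.log('wasm  :', await timeSingleCall('wasm'), 'ms');
console.log('webgpu:', await timeSingleCall('webgpu'), 'ms');
```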

Owner

> However, I now noticed that WASM is up to 2x faster than WebGPU on an Apple M3 with enough RAM (with the quantized model, measuring just single extractor calls).

I would recommend setting the dtype to fp16 or q4, e.g. with `await pipeline('feature-extraction', 'Xenova/bge-m3', { dtype: 'fp16' })`.
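A fuller sketch of that call; the `device` line is an assumption added here, since fp16 mainly pays off on the WebGPU backend:

```js
// Expanded version of the suggestion above; `dtype` also accepts
// quantized variants such as 'q8' and 'q4'. The `device` option is
// an assumption for the v3 API.
import { pipeline } from '@huggingface/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3', {
  dtype: 'fp16',    // or 'q4' for a smaller quantized variant
  device: 'webgpu', // assumption: pair fp16 with the WebGPU backend
});

const embeddings = await extractor(['hello', 'world'], {
  pooling: 'cls',
  normalize: true,
});
console.log(embeddings.dims); // e.g. [2, 1024]
```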
