use in web browser

#3
by ciekawy - opened

I tried the latest @xenova/transformers to use this bge-m3 model, and apparently it gets stuck on creating the pipeline. The same code works with Node.js.
I also tried using this ONNX model directly with onnxruntime-web (just missing the tokenizer), and it was actually able to compute embeddings.
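For reference, a minimal sketch of the browser setup that hangs, assuming the v2 @xenova/transformers API and the Xenova/bge-m3 checkpoint (the pooling options are my assumption for a bge-style model):

```js
// Minimal sketch, assuming @xenova/transformers v2; under Node.js
// the same code reportedly completes.
import { pipeline } from '@xenova/transformers';

// Reportedly gets stuck here in the browser:
const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3');

const output = await extractor('Hello world', { pooling: 'cls', normalize: true });
console.log(output.dims); // e.g. [1, 1024] for bge-m3
```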

OK, I managed to run everything with the transformers.js v3 branch and the latest onnxruntime-web, thanks to https://github.com/microsoft/onnxruntime/issues/20876.

However, I now noticed that WASM is up to 2x faster than WebGPU on an Apple M3 with enough RAM (with the quantized model, measuring just single extractor calls).
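A rough sketch of that per-call measurement, assuming the v3 `device` option (the `@huggingface/transformers` package name and exact option values are assumptions based on the v3 branch):

```js
// Rough single-call timing for the WASM vs. WebGPU backends; the
// package name and `device` values are assumptions for the v3 API.
import { pipeline } from '@huggingface/transformers';

async function timeSingleCall(device) {
  const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3', { device });
  const start = performance.now();
  await extractor('The quick brown fox jumps over the lazy dog.', {
    pooling: 'cls',
    normalize: true,
  });
  return performance.now() - start; // one extractor call, as measured above
}

console.log('wasm  :', await timeSingleCall('wasm'), 'ms');
console.log('webgpu:', await timeSingleCall('webgpu'), 'ms');
```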

Owner

> However, I now noticed that WASM is up to 2x faster than WebGPU on an Apple M3 with enough RAM (with the quantized model, measuring just single extractor calls).

I would recommend setting the dtype to fp16 or q4, e.g. with `await pipeline('feature-extraction', 'Xenova/bge-m3', { dtype: 'fp16' })`.
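A fuller sketch of that call; the `device` line is an assumption added here, since fp16 mainly pays off on the WebGPU backend:

```js
// Expanded version of the suggestion above; `dtype` also accepts
// quantized variants such as 'q8' and 'q4'. The `device` option is
// an assumption for the v3 API.
import { pipeline } from '@huggingface/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3', {
  dtype: 'fp16',    // or 'q4' for a smaller quantized variant
  device: 'webgpu', // assumption: pair fp16 with the WebGPU backend
});

const embeddings = await extractor(['hello', 'world'], {
  pooling: 'cls',
  normalize: true,
});
console.log(embeddings.dims); // e.g. [2, 1024]
```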
