
The intfloat/multilingual-e5-small model converted to ONNX format for use with Vespa's Hugging Face embedder.

  • intfloat-multilingual-e5-small.onnx
  • intfloat-multilingual-e5-small_fp16.onnx
  • intfloat-multilingual-e5-small_quantized.onnx
    • (int8-quantized; running it in Python produces different results)
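
The conversion scripts are not included in this card. As a rough sketch (assuming optimum, onnx, and onnxconverter-common; the output directory name me5_onnx and the quantization settings are illustrative assumptions, not necessarily what was actually used), the three files could be produced along these lines:

from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig
import onnx
from onnxconverter_common import float16

# export intfloat/multilingual-e5-small to ONNX (fp32)
ort_model = ORTModelForFeatureExtraction.from_pretrained(
    "intfloat/multilingual-e5-small", export=True
)
ort_model.save_pretrained("me5_onnx")  # writes model.onnx and config.json

# fp16 variant: cast the exported graph's weights to float16
fp32_graph = onnx.load("me5_onnx/model.onnx")
fp16_graph = float16.convert_float_to_float16(fp32_graph)
onnx.save(fp16_graph, "me5_onnx/intfloat-multilingual-e5-small_fp16.onnx")

# int8 variant: dynamic quantization via optimum/onnxruntime
quantizer = ORTQuantizer.from_pretrained("me5_onnx", file_name="model.onnx")
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
quantizer.quantize(save_dir="me5_onnx", quantization_config=qconfig)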

The Python code below produces the same vectors as Vespa's embeddings.

Note: normalize must be set to true in the embedder configuration in Vespa's services.xml for the embedding output to match the Python output.

<component id="me5_small_q" type="hugging-face-embedder">
    <transformer-model path="me5/intfloat-multilingual-e5-small_quantized.onnx" />
    <tokenizer-model path="me5/tokenizer.json" />
    <normalize>true</normalize>
    <pooling-strategy>mean</pooling-strategy>
</component>

<component id="me5_small" type="hugging-face-embedder">
    <transformer-model path="me5/intfloat-multilingual-e5-small.onnx" />
    <tokenizer-model path="me5/tokenizer.json" />
    <normalize>true</normalize>
    <pooling-strategy>mean</pooling-strategy>
</component>

Or, referencing the model files by URL:

        <component id="me5_small_fp16" type="hugging-face-embedder">
            <transformer-model
                url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/intfloat-multilingual-e5-small_fp16.onnx" />
            <tokenizer-model
                url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-small/resolve/main/tokenizer.json" />
            <normalize>true</normalize>
            <pooling-strategy>mean</pooling-strategy>
        </component>
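
To verify that Vespa and Python produce the same embeddings, run the ONNX model directly in Python:
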
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
from torch import Tensor
import torch
import torch.nn.functional as F

model_name = "hotchpotch/vespa-onnx-intfloat-multilingual-e5-small"
onnx_file_name = "intfloat-multilingual-e5-small.onnx"

model = ORTModelForSequenceClassification.from_pretrained(
    model_name, file_name=onnx_file_name
)
# the ONNX model's first output is last_hidden_state; map it to .logits so it can be read below
model.output_names["logits"] = 0
tokenizer = AutoTokenizer.from_pretrained(model_name)


def average_pool(last_hidden_state: Tensor, attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


input_texts = [
    "query: What is the capital of Japan?",
    "query: 日本の首都は?",  # "What is the capital of Japan?" in Japanese
    "passage: ニューヨークは大きな都市です年エネ年エネ",  # "New York is a big city" in Japanese
    "passage: 東京は良い場所です",  # "Tokyo is a good place" in Japanese, Tokyo is the capital of Japan.
]

batch_dict = tokenizer(
    input_texts, max_length=512, padding=True, truncation=True, return_tensors="pt"
)

if "token_type_ids" not in batch_dict:
    batch_dict["token_type_ids"] = torch.zeros_like(batch_dict["input_ids"])

# logits is last_hidden_state
last_hidden_states = model(**batch_dict).logits
embeddings = average_pool(last_hidden_states, batch_dict["attention_mask"])

# normalize so the embeddings match Vespa's output (requires normalize=true in the component config)
embeddings = F.normalize(embeddings, p=2, dim=1)

# similarity score
print(embeddings[:2] @ embeddings[2:].T)
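
To see how the int8 model's output differs from the fp32 embeddings computed above, the script can be extended like this (a sketch that reuses model_name, batch_dict, average_pool, and embeddings from the code above; only the quantized file name comes from the file list at the top of this card):

quant_model = ORTModelForSequenceClassification.from_pretrained(
    model_name, file_name="intfloat-multilingual-e5-small_quantized.onnx"
)
# same output override as above: expose last_hidden_state as .logits
quant_model.output_names["logits"] = 0

quant_hidden = quant_model(**batch_dict).logits
quant_embeddings = F.normalize(
    average_pool(quant_hidden, batch_dict["attention_mask"]), p=2, dim=1
)

# cosine similarity between the fp32 and int8 embedding of each input text
print((embeddings * quant_embeddings).sum(dim=1))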

License

Same as the original E5 models (MIT).

Attribution

All credit for this model goes to the authors of the Multilingual-E5 models and the associated researchers and organizations. When using this model, please be sure to attribute the original authors.
