How can I obtain token-level text features?

#13
by Huangyanhao - opened

How can I obtain token-level text features?

Jina AI org

@Huangyanhao Here is a quick hack (we didn't offer an easy interface for this):

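import torch
from transformers import AutoModel, AutoTokenizer

# Load the full CLIP model; trust_remote_code is needed for the custom model class.
model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)
# Reach into the text tower to get the underlying transformer.
text_tower = model.text_model.transformer

print(text_tower)

tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-clip-v1')

encoded_text = tokenizer(["it's a nice weather today"], padding=True, truncation=True, return_tensors="pt")

print(encoded_text)

# Run the transformer without gradients and keep the per-token hidden states.
with torch.no_grad():
    token_embeddings = text_tower(**encoded_text).last_hidden_state

print(token_embeddings.shape)  # (batch_size, sequence_length, hidden_size)
print(token_embeddings)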
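
If you encode several sentences at once, `padding=True` pads the shorter ones, so some rows of `token_embeddings` belong to padding tokens rather than real text. A minimal sketch of dropping those rows is below. It assumes the tokenizer output contains an `attention_mask` (1 for real tokens, 0 for padding), as Hugging Face tokenizers usually return. The helper name `split_token_embeddings` is made up for illustration, and dummy tensors stand in for the model output so the snippet runs without downloading anything.

```python
import torch


def split_token_embeddings(token_embeddings, attention_mask):
    """Return one (num_real_tokens, hidden_size) tensor per input sentence.

    token_embeddings: (batch_size, sequence_length, hidden_size) float tensor
    attention_mask:   (batch_size, sequence_length) tensor, 1 = real token, 0 = padding
    """
    per_sentence = []
    for embeddings, mask in zip(token_embeddings, attention_mask):
        # Boolean indexing keeps only the rows where the mask is 1.
        per_sentence.append(embeddings[mask.bool()])
    return per_sentence


# Dummy stand-ins (assumption): 2 sentences, 4 token positions, hidden size 3.
dummy_embeddings = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)
# The second sentence has two real tokens followed by two padding positions.
dummy_mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])

features = split_token_embeddings(dummy_embeddings, dummy_mask)
print([f.shape for f in features])
```

With the real model you would call it as `split_token_embeddings(token_embeddings, encoded_text["attention_mask"])`. Special tokens added by the tokenizer are still included in each tensor; whether to keep them depends on your use case.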

Thank you for your timely assistance; your response has effectively addressed my concern.

Huangyanhao changed discussion status to closed
