<|begin_of_text|> is added twice by the preprocessor
#44 by kz919 - opened
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Load the model in bfloat16 and let transformers place it on the available devices.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]

# Render the chat template to text, then tokenize the text together with the image.
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, return_tensors="pt").to(model.device)

# Inspect the token ids and decode the first six to see the start of the prompt.
print(inputs.input_ids)
print(processor.tokenizer.decode(inputs.input_ids[0, :6]))

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))
tensor([[128000, 128000, 128006,    882, 128007,    271, 128256,   2746,    358,
           1047,    311,   3350,    264,   6520,  39342,    369,    420,    832,
             11,    433,   1053,    387,     25,    220, 128009, 128006,  78191,
         128007,    271]], device='cuda:0')
'<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n'
Token 128000 (<|begin_of_text|>) appears twice at the start of the input when running the given example. Does this impact the quality of the model's output?
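For anyone who wants to check their own inputs for this, here is a minimal sketch in plain Python that counts the leading <|begin_of_text|> ids and trims any duplicates. The helper names `count_leading_bos` and `drop_duplicate_bos` are hypothetical (they are not part of transformers), and the id 128000 is taken from the output printed above. It only demonstrates the check on a list of ids; it does not answer whether the duplicate affects quality.

```python
BOS_ID = 128000  # id of <|begin_of_text|>, as seen in the printed input_ids


def count_leading_bos(ids, bos_id=BOS_ID):
    """Return how many consecutive bos_id tokens start the sequence."""
    count = 0
    for tok in ids:
        if tok != bos_id:
            break
        count += 1
    return count


def drop_duplicate_bos(ids, bos_id=BOS_ID):
    """Keep at most one leading bos_id token; leave everything else untouched."""
    extra = max(count_leading_bos(ids, bos_id) - 1, 0)
    return ids[extra:]


# First six ids from the tensor in the post (hypothetical usage).
prefix = [128000, 128000, 128006, 882, 128007, 271]
deduped = drop_duplicate_bos(prefix)

print(count_leading_bos(prefix))
print(deduped)
```

On the real `inputs`, anything aligned with `input_ids` (such as the attention masks) would need the same trimming, so it is probably simpler to avoid the second token at tokenization time. One option I believe works, but treat it as an assumption to verify, is passing `add_special_tokens=False` in the `processor(...)` call, since the chat template already emits <|begin_of_text|>.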