a bug

#6
by zhangbo2008 - opened

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Some weights of VisionEncoderDecoderModel were not initialized from the model checkpoint at /mnt/e/trocr-base_printed and are newly initialized: ['encoder.pooler.dense.bias', 'encoder.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
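For TrOCR this warning is expected and harmless: the encoder's pooler weights are not shipped in the checkpoint, and generation never uses the pooler output (the decoder cross-attends to the encoder's last hidden states), so inference works fine. If you want to hide such load-time warnings, a minimal sketch using the standard library (assuming the library logs under the "transformers" logger namespace, which it does by default):

```python
import logging

# Suppress transformers' load-time warnings (e.g. "newly initialized" weights);
# only errors from the "transformers" logger namespace will still be printed.
logging.getLogger("transformers").setLevel(logging.ERROR)
```

This only changes what gets printed; it does not change how the model loads or behaves.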

Hello, I have faced the same problem. Did you find a solution?

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load an image from a URL (this model is meant to be used on printed text)
url = "https://eprocure.gov.in/cppp/image-captcha-generate/277578502/1720353899"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Initialize the processor and model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

# Preprocess the image into pixel values
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate text from the image
generated_ids = model.generate(pixel_values, max_new_tokens=50)  # adjust max_new_tokens as needed
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("Generated Text:", generated_text)

Use the code above; it works.
