Running inference across multiple GPUs raises a "tensors on different devices" error, so I solved the problem by moving `images_features` to `inputs_embeds.device` at line 855, like this:
```python
new_input_embeds.append(torch.cat((
    inputs_embeds[i, :boi_token_pos],
    images_features[i].to(inputs_embeds.device),
    inputs_embeds[i, eoi_token_pos + 1:],
)))
```
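For context, here is a minimal sketch of how that line fits into the surrounding logic. The loop structure and the helper name `splice_image_features` are assumptions for illustration, not the model file's actual code; only the `torch.cat` line and the variable names come from the snippet above. The point is that under multi-GPU dispatch (e.g. `device_map="auto"`), `images_features` can land on a different GPU than `inputs_embeds`, so the explicit `.to(...)` is required before concatenating:

```python
import torch

# Hypothetical helper sketching the surrounding loop (structure assumed):
# each batch element splices its image features into the text embeddings
# between the BOI and EOI token positions.
def splice_image_features(inputs_embeds: torch.Tensor,
                          images_features: torch.Tensor,
                          boi_positions, eoi_positions) -> torch.Tensor:
    new_input_embeds = []
    for i in range(inputs_embeds.size(0)):
        boi_token_pos = boi_positions[i]
        eoi_token_pos = eoi_positions[i]
        new_input_embeds.append(torch.cat((
            inputs_embeds[i, :boi_token_pos],
            # Move image features onto the embeddings' device so torch.cat
            # does not raise "expected all tensors to be on the same device".
            images_features[i].to(inputs_embeds.device),
            inputs_embeds[i, eoi_token_pos + 1:],
        )))
    return torch.stack(new_input_embeds)
```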

zRzRzRzRzRzRzR changed pull request status to merged
