I am not aware of any architecture that uses different values for max_sequence_length and max_position_embeddings, so I think this is a typo.
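For anyone who wants to check a config quickly, here is a minimal sketch. It assumes the config is plain JSON with both keys at the top level. The helper name `length_fields_disagree` and the example values are made up for illustration and are not taken from this repo.

```python
import json


def length_fields_disagree(config: dict) -> bool:
    """Return True only when both fields are present and hold different values."""
    seq_len = config.get("max_sequence_length")
    pos_emb = config.get("max_position_embeddings")
    if seq_len is None or pos_emb is None:
        # One of the fields is missing, so there is nothing to compare.
        return False
    return seq_len != pos_emb


# Hypothetical config contents, not the values from this PR.
example = json.loads(
    '{"max_sequence_length": 2048, "max_position_embeddings": 4096}'
)
print(length_fields_disagree(example))
```

To run it against a real file, load it with `json.load(open("config.json"))` and pass the resulting dict to the helper.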

