Fix `model_max_length` in `tokenizer_config.json`

#7
by bryant1410 - opened

The current value of `model_max_length` in `tokenizer_config.json` (basically infinity) is inconsistent with `max_position_embeddings` in `config.json`. It's also inconsistent with that of `bge-base-en`.
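For illustration, here is a minimal sketch of the kind of fix the PR makes: patching `model_max_length` in a `tokenizer_config.json` down to the model's real position limit. The sentinel value and the target of 512 match BERT-style models like this one; the file here is a hypothetical stand-in, not the actual repo file.

```python
import json
import os
import tempfile

# Hypothetical tokenizer_config.json with the problematic value.
# transformers writes a huge sentinel (~1e30) when no limit was recorded.
cfg = {"model_max_length": 1000000000000000019884624838656, "do_lower_case": True}

path = os.path.join(tempfile.mkdtemp(), "tokenizer_config.json")
with open(path, "w") as f:
    json.dump(cfg, f)

# Patch model_max_length to match max_position_embeddings (512 here).
with open(path) as f:
    cfg = json.load(f)
cfg["model_max_length"] = 512
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

with open(path) as f:
    print(json.load(f)["model_max_length"])  # prints 512
```

With the corrected value, `tokenizer(text, truncation=True)` will actually cut inputs at 512 tokens instead of the sentinel.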

This also happens with `bge-small-en`, but I thought it would be good to have any potential discussion here first before sending a PR for that one as well.

Beijing Academy of Artificial Intelligence org

Thanks!
The `tokenizer_config.json` is generated automatically by the Hugging Face `transformers` package. To avoid confusion for users, it's better to fix this.

Shitao changed pull request status to merged

Yeah, it's not only the confusion: I also forgot to mention that tokenization wouldn't respect the max length otherwise.
