Fix `model_max_length` in `tokenizer_config.json`

#7
by bryant1410 - opened

The current value of `model_max_length` in `tokenizer_config.json` (basically infinity) is inconsistent with `max_position_embeddings` in `config.json`. It's also inconsistent with that of `bge-base-en`.
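For illustration, here is a minimal sketch of the kind of fix the PR makes: patching `model_max_length` in a `tokenizer_config.json` down to the model's real position limit. The sentinel value and the target of 512 match BERT-style models like this one; the file here is a hypothetical stand-in, not the actual repo file.

```python
import json
import os
import tempfile

# Hypothetical tokenizer_config.json with the problematic value.
# transformers writes a huge sentinel (~1e30) when no limit was recorded.
cfg = {"model_max_length": 1000000000000000019884624838656, "do_lower_case": True}

path = os.path.join(tempfile.mkdtemp(), "tokenizer_config.json")
with open(path, "w") as f:
    json.dump(cfg, f)

# Patch model_max_length to match max_position_embeddings (512 here).
with open(path) as f:
    cfg = json.load(f)
cfg["model_max_length"] = 512
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

with open(path) as f:
    print(json.load(f)["model_max_length"])  # prints 512
```

With the corrected value, `tokenizer(text, truncation=True)` will actually cut inputs at 512 tokens instead of the sentinel.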

This also happens with `bge-small-en`, but I thought it would be good to have any potential discussion here first before sending a PR for that one as well.

Beijing Academy of Artificial Intelligence org

Thanks!
The `tokenizer_config.json` is generated automatically by the Hugging Face `transformers` package. To avoid confusion for users, it's better to fix this.

Shitao changed pull request status to merged

Yeah, it's not only the confusion: I also forgot to mention that tokenization wouldn't respect the max length otherwise.
