How much of the pre-training data is at the 32k context length?

#5 by wnma3mz

https://arxiv.org/abs/2407.10671

The paper mentions 7T pre-training tokens. However, the model is first trained with a 4,096-token context length, so I am curious: how many of those tokens are trained at the 32k length?

To enhance the long-context capability of Qwen2, we augmented the context length from 4,096 tokens to 32,768 tokens during the concluding phase of pre-training. This expansion was complemented by the introduction of a significantly increased volume of high-quality, lengthy data.
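As far as I can tell, the paper does not give that breakdown. If you want a rough estimate for your own corpus, here is a minimal sketch of counting what fraction of tokens sits in documents long enough to fill a 32k window (assuming a Hugging Face tokenizer and an iterable of document strings; `long_token_fraction` is just a name I made up, and the 32,768 threshold mirrors the paper's stated context length, not any published data statistic):

```python
from transformers import AutoTokenizer

def long_token_fraction(docs, model_name="Qwen/Qwen2-7B", threshold=32_768):
    """Return the fraction of corpus tokens coming from documents
    that are at least `threshold` tokens long."""
    tok = AutoTokenizer.from_pretrained(model_name)
    total_tokens = long_tokens = 0
    for text in docs:
        n = len(tok(text).input_ids)  # tokenize the full document
        total_tokens += n
        if n >= threshold:  # document can fill a 32k context window
            long_tokens += n
    return long_tokens / total_tokens if total_tokens else 0.0
```

This only measures document lengths in a corpus; how the Qwen team actually mixed the "significantly increased volume of high-quality, lengthy data" into the concluding phase is not specified in the report.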
