Question about model architecture

by sh0416 - opened


I'm wondering whether this model's architecture is different from StarCoder's.

StarCoder uses GPTBigCode, while this model uses a custom architecture.

If it differs, could you elaborate on the details?


BigCode org

AFAIK SantaCoder was an early experiment. Please use the StarCoder series models.
