What exactly is downcycling?

#1
by appvoid - opened

I'm asking because it seems similar to what I did with mergekit's passthrough method. Are you just slicing layers from a language model, or are you doing more than that?

You can learn more about it here:
https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=VRuzlso0dPVAny6Q

At a high level, you take the weights of the first N layers of a reference model that has M layers (N < M).

For instance, llama-3-8B has 32 layers in total, of which llama-3-6B took the first 24.
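
To make the slicing concrete, here is a minimal sketch of the "keep the first N layers" step on a toy dictionary of weights. It is not the actual script used for llama-3-6B. The helper name `keep_first_n_layers` is hypothetical, and the `model.layers.<i>.` key pattern is an assumption based on how Llama checkpoints are usually named in Hugging Face transformers. With a real checkpoint you would presumably also have to lower the layer count in the model config to match.

```python
import re

# Assumed key pattern for decoder layers, e.g. "model.layers.12.self_attn.q_proj.weight".
LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")


def keep_first_n_layers(state_dict, n_keep):
    """Hypothetical helper: keep decoder layers 0..n_keep-1 plus all non-layer weights."""
    kept = {}
    for name, tensor in state_dict.items():
        match = LAYER_KEY.match(name)
        # Non-layer weights (embeddings, final norm, LM head) are copied unchanged.
        if match is None or int(match.group(1)) < n_keep:
            kept[name] = tensor
    return kept


# Toy stand-in for a 32-layer checkpoint; strings take the place of real tensors.
toy = {
    "model.embed_tokens.weight": "emb",
    "model.norm.weight": "norm",
    "lm_head.weight": "head",
}
for i in range(32):
    toy[f"model.layers.{i}.self_attn.q_proj.weight"] = f"q{i}"

small = keep_first_n_layers(toy, 24)
kept_layers = sorted(
    {int(LAYER_KEY.match(k).group(1)) for k in small if LAYER_KEY.match(k)}
)
print(len(kept_layers), kept_layers[0], kept_layers[-1])  # prints: 24 0 23
```

The embeddings, final norm and LM head are kept as they are, so only the depth of the decoder stack changes.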

prince-canuma changed discussion status to closed
