Can I merge models using architectures other than Llama, such as Mamba?

#10
by JL-er - opened

Is it effective to merge models of different architectures using mergekit?

AFAIK mergekit only supports transformer models, since it imports from HF transformers and manually specifies the layer names for each supported architecture in its architecture.py. You could try inspecting all the module names for Mamba, mapping them appropriately in mergekit, and having it load from the mamba_ssm package when it encounters a Mamba model. I grabbed the module names yesterday:

# Candidate entry for mergekit's architecture.py, based on the module names
# in mamba_ssm's MambaLMHeadModel.
MAMBA_INFO = StaticTensorNames(
    name="MambaLMHeadModel",
    # weights that come before the stack of layers
    pre_weight_names=["backbone.embedding.weight"],
    # weights that come after the stack of layers
    post_weight_names=["backbone.norm_f.weight", "lm_head.weight"],
    embed_weight_names=["backbone.embedding.weight", "lm_head.weight"],
    layer_prefix_format="backbone.layers.{idx}",
    # per-layer tensors, relative to backbone.layers.{idx}; note that conv1d,
    # x_proj, dt_proj, and out_proj all live inside the mixer submodule
    layer_weight_suffixes=[
        "mixer.A_log",
        "mixer.D",
        "mixer.in_proj.weight",
        "mixer.conv1d.weight",
        "mixer.conv1d.bias",
        "mixer.x_proj.weight",
        "mixer.dt_proj.weight",
        "mixer.dt_proj.bias",
        "mixer.out_proj.weight",
        "norm.weight",
    ],
)
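If you want to verify or extend that mapping against the checkpoint you actually plan to merge, here is a minimal sketch for dumping the tensor names straight from a Mamba checkpoint on the Hub. It assumes state-spaces/mamba-130m as an example repo and that it ships a pytorch_model.bin file; swap in whichever model and filename applies to you.

# Minimal sketch: list the tensor names (and shapes) in a Mamba checkpoint so
# they can be checked against the suffixes in MAMBA_INFO above.
# Assumes the repo ships a pytorch_model.bin file (true for
# state-spaces/mamba-130m at the time of writing).
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download("state-spaces/mamba-130m", "pytorch_model.bin")
state_dict = torch.load(ckpt_path, map_location="cpu")

for name, tensor in state_dict.items():
    print(f"{name}\t{tuple(tensor.shape)}")

The printed names are what the entries above need to match: everything under backbone.layers.{idx} goes into layer_weight_suffixes, and the rest into the pre/post weight lists.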
