Attention mask for the mamba-370m-hf model

#3
by atharvganla - opened

I am creating a Mamba inference pipeline by fetching the state-spaces Mamba models (the 130m and 370m variants).
The model is loaded using one of the following snippets:
either - llm = MambaForCausalLM.from_pretrained(model_name)
or - llm = AutoModelForCausalLM.from_pretrained(model_name)
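For context, the full load looks roughly like this (a minimal sketch, assuming the state-spaces/mamba-370m-hf checkpoint on the Hub and its matching tokenizer):

from transformers import AutoTokenizer, MambaForCausalLM

model_name = "state-spaces/mamba-370m-hf"  # or "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = MambaForCausalLM.from_pretrained(model_name)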

With both of these methods, when I try to generate outputs I receive a warning stating:
"The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results."

So I created the attention mask for the input tokens:
tokens = llm_model.tokenizer(prompt, return_tensors="pt")
attn_mask = tokens.attention_mask.to(device=device)

and now I am trying to pass it to the model as a parameter of the generate() call, as follows:
llm.generate(input_ids=input_ids, max_new_tokens=50, eos_token_id=llm_model.tokenizer.eos_token_id, attention_mask=attn_mask)

but I receive this error:
ValueError: The following model_kwargs are not used by the model: ['attention_mask']

Is there something I am missing? How can I solve this?
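For completeness, here is a stripped-down, self-contained version of the same flow, using the tokenizer and model directly rather than going through my pipeline wrapper (a minimal sketch; the prompt is just a placeholder):

import torch
from transformers import AutoTokenizer, MambaForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "state-spaces/mamba-370m-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = MambaForCausalLM.from_pretrained(model_name).to(device)

prompt = "Mamba is a state-space model that"  # placeholder prompt
tokens = tokenizer(prompt, return_tensors="pt")
input_ids = tokens.input_ids.to(device)
attn_mask = tokens.attention_mask.to(device)

# passing attention_mask here is what raises the ValueError above
output = llm.generate(
    input_ids=input_ids,
    attention_mask=attn_mask,
    max_new_tokens=50,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))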

I got the same error and think it was the result of a recent update to the transformers library.
I pinned transformers to version 4.41.0 (instead of the most recent release) and that solved it.

When I set this up, I was also using the newer versions of mamba-ssm and causal-conv1d, which didn't seem to be an issue:
mamba-ssm==2.2.2 and causal-conv1d==1.4.0
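In case it helps, the exact pins I used were roughly the following (assuming a pip-based environment):
pip install transformers==4.41.0 mamba-ssm==2.2.2 causal-conv1d==1.4.0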

Hope this helps!
