attention mask for mamba-370m-hf model
#3 opened 7 days ago by atharvganla
[AUTOMATED] Model Memory Requirements
#2 opened about 2 months ago by muellerzr
Can `MambaForCausalLM` be used directly for training instead of `AutoModelForCausalLM`?
#1 opened 3 months ago by TimeSpeaker