Is it valid to call a CausalLM with zeros in the attention mask?

#178
by soumyasanyal08

Hi,

I'm trying to understand what happens when LlamaForCausalLM is called with attention_mask = [1, 1, 1, 1, 0, 0, 0, 0]. At the 5th index (0-indexed), would teacher forcing still be applied internally? More concretely, what is the internal difference between using the mask above and attention_mask_2 = [1, 1, 1, 1, 1, 0, 0, 0]?
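For concreteness, here is a minimal sketch of the comparison I have in mind. I'm using gpt2 here only as a stand-in for the Llama checkpoint (any Hugging Face causal LM accepts attention_mask the same way), and the input text is just an arbitrary 8-token example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint; swap for a Llama checkpoint you have access to.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog and runs away"
input_ids = tokenizer(text, return_tensors="pt")["input_ids"][:, :8]  # keep 8 tokens

# Two masks that differ only at position 4 (0-indexed).
mask_a = torch.tensor([[1, 1, 1, 1, 0, 0, 0, 0]])
mask_b = torch.tensor([[1, 1, 1, 1, 1, 0, 0, 0]])

with torch.no_grad():
    out_a = model(input_ids=input_ids, attention_mask=mask_a)
    out_b = model(input_ids=input_ids, attention_mask=mask_b)

# Positions 0-3: causal masking already prevents them from attending to
# positions >= 4, so the extra 1 in mask_b should not change their logits.
print(torch.allclose(out_a.logits[:, :4], out_b.logits[:, :4], atol=1e-5))

# Positions 4-7: under mask_b, the key at position 4 is visible (position 4
# attends to itself and later positions see it as context); under mask_a it
# is masked out, so the logits from position 4 onward differ.
print((out_a.logits[:, 4:] - out_b.logits[:, 4:]).abs().max())
```

My expectation is that the first four positions are unaffected and the difference only shows up from position 4 onward, but I'd like to confirm that this is the intended behaviour rather than a misuse of the API.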

More generally, does it make sense to call a causal LM with zeros in the attention mask at all?
