Possible issue with context window

#23
by ieman - opened

I was running inference with Gemma 2 9B instruct when I received the following error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
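As the error message suggests, enabling synchronous kernel launches makes the stack trace point at the actual failing CUDA op instead of a later API call. A minimal sketch (the script name is a placeholder):

```shell
# Force CUDA kernels to launch synchronously so the Python traceback
# identifies the real failing operation:
export CUDA_LAUNCH_BLOCKING=1
# then re-run your inference, e.g.: python your_inference_script.py
echo "$CUDA_LAUNCH_BLOCKING"
```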

I managed to solve it by setting max_length to 4096. I assume the issue is that the model's effective context window is limited to 4096 tokens by the sliding-window attention layers, so requesting a longer sequence triggers the assert.
transformers version: 4.42.3
PyTorch version: 2.3.1
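A minimal sketch of the workaround described above: clamp the requested generation length to the 4096-token window before calling `model.generate()`. The model id, prompt, and helper name are illustrative assumptions, not the original poster's exact code.

```python
# Cap the requested max_length at the assumed 4096-token sliding-window
# context size before generating.
CONTEXT_WINDOW = 4096  # effective context size per the fix above

def clamp_max_length(requested: int) -> int:
    """Clamp a requested generation length to the model's context window."""
    return min(requested, CONTEXT_WINDOW)

# Usage with transformers (requires GPU and the model weights):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
#   model = AutoModelForCausalLM.from_pretrained(
#       "google/gemma-2-9b-it", device_map="auto"
#   )
#   inputs = tok(prompt, return_tensors="pt").to(model.device)
#   out = model.generate(**inputs, max_length=clamp_max_length(8192))

print(clamp_max_length(8192))  # -> 4096
```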