[Cache Request] meta-llama/Meta-Llama-3-70B-Instruct-8192tokens

#169 by harmeet03

I need to deploy the Llama 3 model on an Inferentia2 machine that can take inputs up to 8192 tokens. The existing cached model has a max sequence length of 4096.
Could you please compile and upload a model with:

{
    'task': 'text-generation',
    'batch_size': 4,
    'num_cores': 24,
    'auto_cast_type': 'fp16',
    'sequence_length': 8192,
    'compiler_type': 'neuronx-cc'
}
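
In case it helps while waiting for the cached artifact, here is a minimal sketch of compiling the model yourself with optimum-neuron's export path, assuming access to the gated meta-llama weights and an inf2 instance with 24 Neuron cores (the save path below is hypothetical):

```python
from optimum.neuron import NeuronModelForCausalLM

# Compile Llama 3 70B for Inferentia2 with the requested settings.
# Compiling a 70B model is slow and memory-hungry; run it on the
# target inf2 instance (e.g. inf2.48xlarge with 24 Neuron cores).
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    export=True,              # compile instead of loading a cached artifact
    batch_size=4,
    num_cores=24,
    auto_cast_type="fp16",
    sequence_length=8192,     # raises the max length from 4096 to 8192
)

# Save the compiled artifacts locally so they can be reloaded
# (or pushed to the Hub with push_to_hub). Path is illustrative.
model.save_pretrained("llama3-70b-instruct-neuron-8192/")
```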

Any update on this?
