[Cache Request] meta-llama/Meta-Llama-3-70B-Instruct-8192tokens

#169 by harmeet03

I need to deploy the Llama 3 model on an Inferentia2 machine that can take inputs up to 8192 tokens. The existing cached model has a max sequence length of 4096.
Could you please compile and upload a model with:

{
    'task': 'text-generation',
    'batch_size': 4,
    'num_cores': 24,
    'auto_cast_type': 'fp16',
    'sequence_length': 8192,
    'compiler_type': 'neuronx-cc'
}
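
In case it helps while waiting for the cached artifact, here is a minimal sketch of compiling the model yourself with optimum-neuron's export path, assuming access to the gated meta-llama weights and an inf2 instance with 24 Neuron cores (the save path below is hypothetical):

```python
from optimum.neuron import NeuronModelForCausalLM

# Compile Llama 3 70B for Inferentia2 with the requested settings.
# Compiling a 70B model is slow and memory-hungry; run it on the
# target inf2 instance (e.g. inf2.48xlarge with 24 Neuron cores).
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    export=True,              # compile instead of loading a cached artifact
    batch_size=4,
    num_cores=24,
    auto_cast_type="fp16",
    sequence_length=8192,     # raises the max length from 4096 to 8192
)

# Save the compiled artifacts locally so they can be reloaded
# (or pushed to the Hub with push_to_hub). Path is illustrative.
model.save_pretrained("llama3-70b-instruct-neuron-8192/")
```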

Any update on this?
