Limited (truncated) response with inference API

#23
by RobertTaylor - opened

I am getting a truncated output when using the Inference API. The same thing happens with the on-page example widget. Is HF's rate limiting per-token?

Hi there, did you set `max_new_tokens`?
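
For anyone hitting the same thing later, here is a minimal sketch of passing `max_new_tokens` through the Inference API with `requests`. The model name, token, and the value 250 below are just placeholders:

```python
import requests

# Placeholder model and token -- substitute your own.
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer hf_xxx"}

payload = {
    "inputs": "Once upon a time",
    # Without this, generation stops after the default (short) length.
    "parameters": {"max_new_tokens": 250},
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```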

Gosh, thanks for that. Sorry, I'm an idiot.

What are the other parameters?

Hi, maybe this can help: https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task
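
For quick reference, the other text-generation parameters documented on that page go in the same `parameters` dict. The values below are only illustrative, not recommendations:

```python
payload = {
    "inputs": "Once upon a time",
    "parameters": {
        "max_new_tokens": 250,      # how many new tokens to generate
        "temperature": 0.7,         # sampling temperature
        "top_k": 50,                # top-k sampling
        "top_p": 0.95,              # nucleus sampling
        "repetition_penalty": 1.2,  # penalize repeated tokens
        "do_sample": True,          # sample instead of greedy decoding
        "return_full_text": False,  # return only the generated continuation
    },
}
```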
