When working with an LLM API, a few parameters are crucial for getting the best results for your use case.
Temperature
The temperature parameter controls the randomness of the generated text by scaling the probabilities of the candidate next tokens. The lower the value, the more deterministic the output, because the most probable next token is almost always picked; the higher the value, the more varied and creative the output, because less probable tokens get a real chance of being selected.
For QA tasks, a lower temperature is recommended, while for creative tasks a higher temperature is recommended.
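As a minimal sketch, here is how the two extremes might look with the OpenAI Python SDK; the model name and prompts are placeholders, and other providers expose an equivalent temperature parameter:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Low temperature: stick to the most probable tokens, good for factual QA.
factual = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What year was the transistor invented?"}],
    temperature=0.1,
)

# High temperature: allow less likely tokens, good for creative writing.
creative = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a two-line poem about transistors."}],
    temperature=1.2,
)

print(factual.choices[0].message.content)
print(creative.choices[0].message.content)
```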
Top P
Nucleus sampling, or top-p sampling, lets you control which tokens the model considers: only the tokens that together make up the top_p probability mass are eligible for the response.
For factual answers, a lower top_p is recommended, while for creative tasks a higher top_p is recommended.
It's recommended to alter temperature or top_p, but not both.
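A sketch of the same idea with top_p, again assuming the OpenAI Python SDK and a placeholder model name; leave temperature at its default when you tune top_p:

```python
from openai import OpenAI

client = OpenAI()

# Low top_p: only the highest-probability tokens are eligible, which favors exact, factual answers.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is the boiling point of water at sea level?"}],
    top_p=0.1,  # consider only the tokens in the top 10% of probability mass
)
print(response.choices[0].message.content)
```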
Max Length
Defines the maximum number of tokens the model generates. This can help you prevent long responses and control costs.
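With the OpenAI Python SDK this limit is exposed as max_tokens; a minimal sketch with placeholder values:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_tokens=100,  # hard cap on generated tokens: bounds response length and cost
)
print(response.choices[0].message.content)
```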
Stop Sequences
A stop sequence is a string that tells the model to stop generating further tokens, which gives you another way to control the length and structure of the response. For example, you can keep a generated numbered list to at most ten items by using "11." as a stop sequence.
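As a sketch with the OpenAI Python SDK (model name and prompt are placeholders), passing "11." as a stop sequence cuts the list off before an eleventh item can start:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "List popular programming languages as a numbered list."}],
    stop=["11."],  # generation stops as soon as the model would begin item 11
)
print(response.choices[0].message.content)
```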
Frequency Penalty
The frequency penalty raises or lowers the probability of generating a token based on how many times that token already appears in the response and prompt; with a positive penalty, the more often a token has appeared, the less likely it is to appear again.
Presence Penalty
Similar to the frequency penalty, but the penalty is the same for all repeated tokens. A token that appears twice and a token that appears 10 times are penalized the same.
It's recommended to alter frequency_penalty or presence_penalty, but not both.
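A sketch contrasting the two penalties with the OpenAI Python SDK (placeholder model and values); in practice you would set one or the other, not both:

```python
from openai import OpenAI

client = OpenAI()

# frequency_penalty: the penalty grows with each repetition of a token.
less_repetition = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Write a paragraph about the ocean."}],
    frequency_penalty=0.8,  # repeated words become progressively less likely
)

# presence_penalty: a flat penalty once a token has appeared at all.
more_new_topics = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a paragraph about the ocean."}],
    presence_penalty=0.8,  # any already-used token is penalized equally
)

print(less_repetition.choices[0].message.content)
print(more_new_topics.choices[0].message.content)
```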