Language model temperature: adjusting creativity and reliability
The principle of the temperature parameter
During text generation, language models produce a set of scores called logits for each token. These scores are converted into probabilities by a softmax function. The temperature parameter enters this transformation: the logits are divided by the temperature before the softmax is applied, which controls the shape of the resulting distribution. Mathematically, if z_i is the logit for token i, the probability of token i becomes proportional to exp(z_i / T), where T is the temperature. A low temperature (< 0.5) sharpens the distribution: the most probable tokens dominate, making the output more deterministic. A high temperature (> 1) flattens the distribution, allowing greater variety and creativity.
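To make the formula concrete, here is a minimal Python sketch of temperature-scaled softmax; the logit values are invented for illustration:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities, dividing by the temperature first."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # invented example logits for three tokens
low = softmax_with_temperature(logits, 0.2)   # sharp: the top token dominates
high = softmax_with_temperature(logits, 2.0)  # flat: probabilities converge
```

At T = 0.2 the first token captures over 99 % of the probability mass; at T = 2.0 its share drops to roughly half, with the rest spread over the other tokens.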
Impact of temperature on output
The choice of temperature strongly shapes the character of the model's output:
- Very low temperature (0 – 0.3): the model almost always selects the most likely token. This option is ideal for tasks requiring precision or factual answers (technical translations, accurate summaries, problem solving). It reduces the risk of hallucination, but can make the answer monotonous.
- Moderate temperature (0.4 – 0.7): a balance between consistency and diversity. Useful for generating informative texts while retaining some stylistic flexibility. Suitable for classic dialogues and educational explanations.
- High temperature (0.8 – 1.2): outputs become more creative, surprising and sometimes offbeat. This parameter is used for brainstorming, creative writing or generating original ideas. On the other hand, it increases the likelihood of errors and contradictions.
- Very high temperature (1.3 and above): the distribution is nearly uniform. Responses can become incoherent, even absurd. Use it only to experiment with unusual voices or artistic content.
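The contrast between these regimes can be seen by repeatedly sampling from a fixed, invented set of logits: a cold run almost always picks the top token, while a hot run visits every option.

```python
import math
import random

def sample(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # numerical stability
    weights = [math.exp(z - m) for z in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)
logits = [3.0, 2.0, 1.0, 0.5]  # invented logits for four tokens
cold = [sample(logits, 0.2, rng) for _ in range(1000)]  # near-deterministic
hot = [sample(logits, 1.5, rng) for _ in range(1000)]   # varied
```

In the cold run the top token is chosen about 99 % of the time; the hot run spreads its choices across all four tokens.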
The following table summarizes the recommendations according to task type:
| Type of task | Main objective | Recommended temperature |
| --- | --- | --- |
| Technical translation | Accuracy and precision | 0.1 – 0.3 |
| Article summary | Clear and concise | 0.2 – 0.5 |
| Answer to a factual question | Accuracy and consistency | 0 – 0.2 |
| Conversational dialogue | Natural and light | 0.4 – 0.7 |
| Creative brainstorming | Variety and originality | 0.7 – 1.1 |
| Literary or poetic writing | Inventiveness and style | 0.8 – 1.5 |
Temperature and other parameters
Temperature doesn’t work alone. It acts in combination with other decoding parameters:
- Top-k sampling: restricts sampling to the k most probable tokens at each step, reducing the risk of selecting a very unlikely token at high temperatures.
- Nucleus sampling (top-p): dynamically selects the smallest set of tokens whose cumulative probability exceeds a threshold p (often 0.8 or 0.9), trading off variety against quality.
- Frequency and presence penalties: penalize tokens that have already appeared, encouraging diversity and avoiding repetition.
- Maximum and minimum length: keep responses within a set length range.
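As an illustration of how nucleus sampling works, here is a small Python sketch that keeps the smallest set of tokens whose cumulative probability reaches the threshold p; the logits and threshold are invented:

```python
import math

def top_p_filter(logits, p=0.9):
    """Return the indices of the smallest set of tokens whose
    cumulative probability reaches the threshold p."""
    m = max(logits)  # numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    ranked = sorted(enumerate(e / total for e in exps),
                    key=lambda t: t[1], reverse=True)
    kept, cum = [], 0.0
    for i, pr in ranked:
        kept.append(i)
        cum += pr
        if cum >= p:
            break
    return kept

nucleus = top_p_filter([4.0, 3.0, 1.0, 0.0, -1.0], p=0.9)  # keeps [0, 1]
```

Here the two most probable tokens already carry about 95 % of the mass, so the three unlikely ones are pruned before sampling.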
Tips for setting the right temperature
- Adapt it to the context: an automated customer-service assistant should use a low temperature to give reliable answers; conversely, an idea-generation tool can run at higher temperatures.
- Combine it with top-p or top-k: to avoid unpleasant surprises, couple a moderate temperature with nucleus sampling. This prunes the low-probability tail while retaining a degree of creativity.
- Test and iterate: the perception of creativity varies by language and domain. Try several settings on the same prompt to compare the nuances and adjust to your needs.
- Watch out for hallucinations: too high a temperature increases the likelihood that the model produces incorrect or irrelevant information. In sensitive contexts (healthcare, finance), set strict limits and build in human validation or consistency checks.
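The second tip — a moderate temperature combined with nucleus sampling — can be sketched as a single decoding step; all values are invented for illustration:

```python
import math
import random

def decode_step(logits, temperature=0.7, top_p=0.9, rng=None):
    """One sampling step: temperature scaling, then nucleus (top-p) filtering."""
    rng = rng or random.Random()
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    ranked = sorted(enumerate(e / total for e in exps),
                    key=lambda t: t[1], reverse=True)
    kept, cum = [], 0.0
    for i, pr in ranked:  # keep tokens until the mass threshold is reached
        kept.append((i, pr))
        cum += pr
        if cum >= top_p:
            break
    ids, weights = zip(*kept)
    return rng.choices(ids, weights=weights, k=1)[0]

token = decode_step([5.0, 1.0, 0.0], temperature=0.7, top_p=0.9,
                    rng=random.Random(0))
```

With this invented, strongly peaked distribution, the top token alone exceeds the p threshold, so the filter leaves only one candidate and the step is effectively deterministic despite the moderate temperature.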
Conclusion
Temperature is a simple but powerful parameter for shaping LLM behavior. Moving this one dial takes you from perfectly predictable generation to exuberant creation. Used well, it balances creativity and reliability across a wide variety of needs, from legal assistance to poetry. The key is to adapt it to the context, combine it with other decoding settings, and stay attentive to its effect on the coherence of responses.