Version Notice
(dmontagu/response-format preview) This documentation is one commit ahead of the latest release (v0.0.49, 2025-04-01) and may describe features not yet supported in that release.
pydantic_ai.settings
ModelSettings
Bases: TypedDict
Settings to configure an LLM.
Here we include only settings which apply to multiple models / model providers, though not all of these settings are supported by all models.
Source code in pydantic_ai_slim/pydantic_ai/settings.py
max_tokens
instance-attribute
max_tokens: int
The maximum number of tokens to generate before stopping.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
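For example, a minimal sketch of capping generation length via model_settings (the model name is a placeholder):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Cap the response at 256 generated tokens.
result = agent.run_sync(
    'Summarize the plot of Hamlet.',
    model_settings={'max_tokens': 256},
)
print(result.data)
```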
temperature
instance-attribute
temperature: float
Amount of randomness injected into the response.
Use temperature closer to 0.0 for analytical / multiple choice, and closer to a model's maximum temperature for creative and generative tasks.
Note that even with temperature of 0.0, the results will not be fully deterministic.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
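As a sketch, an analytical task might pin temperature near 0.0; settings can also be attached to the agent itself so they apply to every run (model name assumed):

```python
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Low temperature suits analytical / multiple-choice tasks.
agent = Agent(
    'openai:gpt-4o',  # placeholder model name
    model_settings=ModelSettings(temperature=0.0),
)
result = agent.run_sync('Which is the largest planet: A) Mars, B) Jupiter, C) Venus?')
```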
top_p
instance-attribute
top_p: float
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
So 0.1 means only the tokens comprising the top 10% probability mass are considered.
You should either alter temperature or top_p, but not both.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
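A sketch using nucleus sampling instead of temperature; note that temperature is left unset (model name assumed):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Sample only from the top 10% of probability mass; temperature stays unset.
result = agent.run_sync(
    'Write a one-line tagline for a coffee shop.',
    model_settings={'top_p': 0.1},
)
```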
timeout
instance-attribute
timeout: float | Timeout
Override the client-level default timeout for a request, in seconds.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Mistral
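A sketch overriding the request timeout; a plain float is interpreted as seconds, and (assuming the Timeout type here is httpx.Timeout) a Timeout instance allows finer-grained control:

```python
import httpx

from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Simple: one overall deadline of 10 seconds for the request.
result = agent.run_sync('Hello!', model_settings={'timeout': 10.0})

# Finer-grained, assuming Timeout is httpx.Timeout: 10s overall, 5s to connect.
result = agent.run_sync(
    'Hello again!',
    model_settings={'timeout': httpx.Timeout(10.0, connect=5.0)},
)
```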
parallel_tool_calls
instance-attribute
parallel_tool_calls: bool
Whether to allow parallel tool calls.
Supported by:
- OpenAI (some models, not o1)
- Groq
- Anthropic
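A sketch disabling parallel tool calls so the model issues at most one tool call per step; the tool and model names are illustrative:

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

@agent.tool_plain
def get_price(item: str) -> float:
    """Illustrative tool: look up an item's price."""
    return 9.99

# The model must issue tool calls one at a time rather than in parallel.
result = agent.run_sync(
    'What do a pen and a notebook cost together?',
    model_settings={'parallel_tool_calls': False},
)
```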
seed
instance-attribute
seed: int
The random seed to use for the model, theoretically allowing for deterministic results.
Supported by:
- OpenAI
- Gemini
- Groq
- Cohere
- Mistral
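A sketch aiming for repeatable output by fixing the seed, often combined with a temperature of 0.0 (determinism is still not guaranteed; model name assumed):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Same seed and low temperature: runs should be as repeatable as the provider allows.
settings = {'seed': 42, 'temperature': 0.0}
first = agent.run_sync('Name three primary colors.', model_settings=settings)
second = agent.run_sync('Name three primary colors.', model_settings=settings)
```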
presence_penalty
instance-attribute
presence_penalty: float
Penalize new tokens based on whether they have appeared in the text so far.
Supported by:
- OpenAI
- Groq
- Cohere
- Gemini
- Mistral
frequency_penalty
instance-attribute
frequency_penalty: float
Penalize new tokens based on their existing frequency in the text so far.
Supported by:
- OpenAI
- Groq
- Cohere
- Gemini
- Mistral
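A sketch applying both penalties at once to discourage repetition; the values are illustrative and their valid ranges are provider-specific:

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

result = agent.run_sync(
    'Write a short poem about the sea.',
    model_settings={
        'presence_penalty': 0.5,   # penalize tokens that have appeared at all
        'frequency_penalty': 0.5,  # penalize tokens by how often they appear
    },
)
```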
logit_bias
instance-attribute
logit_bias: dict[str, int]
Modify the likelihood of specified tokens appearing in the completion.
Supported by:
- OpenAI
- Groq
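A sketch biasing specific tokens; keys are tokenizer-specific token IDs as strings, and the IDs below are made up for illustration:

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Values typically range from -100 (effectively ban a token) to 100 (strongly
# encourage it). The token IDs here are hypothetical placeholders.
result = agent.run_sync(
    'Pick a color.',
    model_settings={'logit_bias': {'1234': -100, '5678': 10}},
)
```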
force_response_format
instance-attribute
force_response_format: bool
Whether to force a specific response format from the model.
When enabled, the model is constrained to produce the structured output directly rather than via tool calling. Pros: this works better than tool calling with many "dumber" models. Cons: it forces the model to generate structured output, so the agent cannot make use of data-retrieval tool calls before generating a final response.
This can also be set on the model if you know you want to use the agent in a way that doesn't require tool calls.
Supported by:
- Cohere
- Gemini
- Groq
- OpenAI
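A sketch enabling the forced response format together with a structured result type; this is a preview feature and the result type here is illustrative:

```python
from pydantic import BaseModel

from pydantic_ai import Agent


class CityInfo(BaseModel):
    """Illustrative structured result type."""
    city: str
    country: str


agent = Agent(
    'openai:gpt-4o',  # placeholder model name
    result_type=CityInfo,
    # Generate the structured output directly instead of via a tool call.
    model_settings={'force_response_format': True},
)
result = agent.run_sync('The windiest big city in the UK.')
```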