Version Notice
(dmontagu/response-format preview) This documentation is one commit ahead of the latest release (v0.0.49, 2025-04-01) and may describe features not yet supported in that release.
pydantic_ai.settings
ModelSettings
Bases: TypedDict
Settings to configure an LLM.
Here we include only settings which apply to multiple models / model providers, though not all of these settings are supported by all models.
Source code in pydantic_ai_slim/pydantic_ai/settings.py
max_tokens
instance-attribute
max_tokens: int
The maximum number of tokens to generate before stopping.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
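For example, a minimal sketch of capping generation length via model_settings (the model name is a placeholder):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Cap the response at 256 generated tokens.
result = agent.run_sync(
    'Summarize the plot of Hamlet.',
    model_settings={'max_tokens': 256},
)
print(result.data)
```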
temperature
instance-attribute
temperature: float
Amount of randomness injected into the response.
Use temperature closer to 0.0 for analytical / multiple choice, and closer to a model's maximum temperature for creative and generative tasks.
Note that even with temperature of 0.0, the results will not be fully deterministic.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
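As a sketch, an analytical task might pin temperature near 0.0; settings can also be attached to the agent itself so they apply to every run (model name assumed):

```python
from pydantic_ai import Agent
from pydantic_ai.settings import ModelSettings

# Low temperature suits analytical / multiple-choice tasks.
agent = Agent(
    'openai:gpt-4o',  # placeholder model name
    model_settings=ModelSettings(temperature=0.0),
)
result = agent.run_sync('Which is the largest planet: A) Mars, B) Jupiter, C) Venus?')
```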
top_p
instance-attribute
top_p: float
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
So 0.1 means only the tokens comprising the top 10% probability mass are considered.
You should either alter temperature or top_p, but not both.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
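A sketch using nucleus sampling instead of temperature; note that temperature is left unset (model name assumed):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Sample only from the top 10% of probability mass; temperature stays unset.
result = agent.run_sync(
    'Write a one-line tagline for a coffee shop.',
    model_settings={'top_p': 0.1},
)
```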
timeout
instance-attribute
timeout: float | Timeout
Override the client-level default timeout for a request, in seconds.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Mistral
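A sketch overriding the request timeout; a plain float is interpreted as seconds, and (assuming the Timeout type here is httpx.Timeout) a Timeout instance allows finer-grained control:

```python
import httpx

from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Simple: one overall deadline of 10 seconds for the request.
result = agent.run_sync('Hello!', model_settings={'timeout': 10.0})

# Finer-grained, assuming Timeout is httpx.Timeout: 10s overall, 5s to connect.
result = agent.run_sync(
    'Hello again!',
    model_settings={'timeout': httpx.Timeout(10.0, connect=5.0)},
)
```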
parallel_tool_calls
instance-attribute
parallel_tool_calls: bool
Whether to allow parallel tool calls.
Supported by:
- OpenAI (some models, not o1)
- Groq
- Anthropic
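A sketch disabling parallel tool calls so the model issues at most one tool call per step; the tool and model names are illustrative:

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

@agent.tool_plain
def get_price(item: str) -> float:
    """Illustrative tool: look up an item's price."""
    return 9.99

# The model must issue tool calls one at a time rather than in parallel.
result = agent.run_sync(
    'What do a pen and a notebook cost together?',
    model_settings={'parallel_tool_calls': False},
)
```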
seed
instance-attribute
seed: int
The random seed to use for the model, theoretically allowing for deterministic results.
Supported by:
- OpenAI
- Gemini
- Groq
- Cohere
- Mistral
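A sketch aiming for repeatable output by fixing the seed, often combined with a temperature of 0.0 (determinism is still not guaranteed; model name assumed):

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Same seed and low temperature: runs should be as repeatable as the provider allows.
settings = {'seed': 42, 'temperature': 0.0}
first = agent.run_sync('Name three primary colors.', model_settings=settings)
second = agent.run_sync('Name three primary colors.', model_settings=settings)
```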
presence_penalty
instance-attribute
presence_penalty: float
Penalize new tokens based on whether they have appeared in the text so far.
Supported by:
- OpenAI
- Groq
- Cohere
- Gemini
- Mistral
frequency_penalty
instance-attribute
frequency_penalty: float
Penalize new tokens based on their existing frequency in the text so far.
Supported by:
- OpenAI
- Groq
- Cohere
- Gemini
- Mistral
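A sketch applying both penalties at once to discourage repetition; the values are illustrative and their valid ranges are provider-specific:

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

result = agent.run_sync(
    'Write a short poem about the sea.',
    model_settings={
        'presence_penalty': 0.5,   # penalize tokens that have appeared at all
        'frequency_penalty': 0.5,  # penalize tokens by how often they appear
    },
)
```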
logit_bias
instance-attribute
logit_bias: dict[str, int]
Modify the likelihood of specified tokens appearing in the completion.
Supported by:
- OpenAI
- Groq
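A sketch biasing specific tokens; keys are tokenizer-specific token IDs as strings, and the IDs below are made up for illustration:

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')  # placeholder model name

# Values typically range from -100 (effectively ban a token) to 100 (strongly
# encourage it). The token IDs here are hypothetical placeholders.
result = agent.run_sync(
    'Pick a color.',
    model_settings={'logit_bias': {'1234': -100, '5678': 10}},
)
```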
force_response_format
instance-attribute
force_response_format: bool
Whether to force a specific response format from the model.
When enabled, the model is constrained to produce the structured output directly rather than via tool calling. Pros: this works better than tool calling with many "dumber" models. Cons: it forces the model to generate structured output, so the agent cannot make use of data-retrieval tool calls before generating a final response.
This can also be set on the model if you know you want to use the agent in a way that doesn't require tool calls.
Supported by:
- Cohere
- Gemini
- Groq
- OpenAI
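A sketch enabling the forced response format together with a structured result type; this is a preview feature and the result type here is illustrative:

```python
from pydantic import BaseModel

from pydantic_ai import Agent


class CityInfo(BaseModel):
    """Illustrative structured result type."""
    city: str
    country: str


agent = Agent(
    'openai:gpt-4o',  # placeholder model name
    result_type=CityInfo,
    # Generate the structured output directly instead of via a tool call.
    model_settings={'force_response_format': True},
)
result = agent.run_sync('The windiest big city in the UK.')
```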