Creates a response.
The model to use for this request. We follow the HuggingFace name style, e.g. 'deepseek-ai/DeepSeek-V3.2'.
Context to provide to the model for the scope of this request. May either be a string or an array of input items. If a string is provided, it is interpreted as a user message. Maximum length: 10,485,760 characters.
Additional output data to include in the response. Supported values: reasoning.encrypted_content and message.output_text.logprobs.
A list of tools that the model may call while generating the response.
Controls which tool the model should use, if any.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
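A minimal sketch of creating a response over HTTP. The base URL, auth header, and tool schema below are illustrative assumptions; only the behaviors described above (string input read as a user message, metadata limits) are taken from this reference, and the JSON field names follow common Responses-style conventions rather than anything this page confirms.

import requests

BASE_URL = "https://api.example.com/v1"       # assumed base URL
HEADERS = {"Authorization": "Bearer sk-..."}  # assumed auth scheme

# Create a response with a plain-string input (read as a user message),
# one callable tool, and a small metadata bag.
payload = {
    "model": "deepseek-ai/DeepSeek-V3.2",
    "input": "What is the weather in Paris?",
    "tools": [{
        "type": "function",  # assumed tool schema
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    "tool_choice": "auto",  # let the model decide whether to call the tool
    "metadata": {"session": "demo-123"},  # up to 16 pairs; keys <= 64 chars, values <= 512
}

resp = requests.post(f"{BASE_URL}/responses", headers=HEADERS, json=payload)
resp.raise_for_status()
print(resp.json()["id"])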
Configuration options for text output.
Sampling temperature to use, between 0 and 2. Higher values make the output more random.
Nucleus sampling parameter, between 0 and 1. The model considers only the tokens with the top cumulative probability.
Penalizes new tokens based on whether they appear in the text so far.
Penalizes new tokens based on their frequency in the text so far.
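The four sampling knobs above combine in a single request body. A sketch with illustrative values, assuming the field names temperature, top_p, presence_penalty, and frequency_penalty:

payload = {
    "model": "deepseek-ai/DeepSeek-V3.2",
    "input": "Write a haiku about autumn.",
    "temperature": 0.8,        # 0 to 2; higher values sample more randomly
    "top_p": 0.95,             # nucleus sampling: keep only the top tokens by cumulative probability
    "presence_penalty": 0.5,   # penalize tokens that already appear in the text
    "frequency_penalty": 0.3,  # penalize tokens in proportion to how often they appear
}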
Whether the model may call multiple tools in parallel.
Whether to stream response events as server-sent events.
Options that control streamed response behavior.
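A sketch of consuming the stream as server-sent events, reusing the BASE_URL and HEADERS from the first sketch. The data: framing is standard SSE, but the [DONE] terminator and the event payload shape are assumptions.

import json

with requests.post(
    f"{BASE_URL}/responses",
    headers=HEADERS,
    json={"model": "deepseek-ai/DeepSeek-V3.2", "input": "Tell me a story.", "stream": True},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":  # assumed end-of-stream marker
            break
        event = json.loads(chunk)
        print(event.get("type"))  # e.g. a delta or completion event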
Whether to run the request in the background and return immediately.
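Background mode pairs naturally with polling. This sketch assumes the response can be fetched back with GET /responses/{id} and that queued and in_progress are the non-terminal statuses; both are assumptions, not confirmed by this reference.

import time

created = requests.post(
    f"{BASE_URL}/responses",
    headers=HEADERS,
    json={"model": "deepseek-ai/DeepSeek-V3.2", "input": "Summarize this corpus.", "background": True},
).json()

# Poll until the response leaves a non-terminal status.
while True:
    r = requests.get(f"{BASE_URL}/responses/{created['id']}", headers=HEADERS).json()
    if r["status"] not in ("queued", "in_progress"):  # assumed status values
        break
    time.sleep(2)
print(r["status"])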
The maximum number of tokens the model may generate for this response. Must be at least 16.
The maximum number of tool calls the model may make while generating the response. Must be at least 1.
Configuration options for reasoning behavior.
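Capping generation length and tool use in one body. The field names max_output_tokens and max_tool_calls, and the shape of the reasoning object, are assumptions:

payload = {
    "model": "deepseek-ai/DeepSeek-V3.2",
    "input": "Plan a three-day trip to Kyoto.",
    "max_output_tokens": 1024,          # must be at least 16
    "max_tool_calls": 3,                # must be at least 1
    "reasoning": {"effort": "medium"},  # assumed shape of the reasoning configuration
}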
A stable identifier used for safety monitoring and abuse detection. Maximum length: 64 characters.
A key to use when reading from or writing to the prompt cache. Maximum length: 64 characters.
Controls how the service truncates the input when it exceeds the model context window. One of auto or disabled.
Additional instructions to guide the model for this request.
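The request-scoping options above in one sketch; the field names follow common Responses-style conventions and are assumptions:

payload = {
    "model": "deepseek-ai/DeepSeek-V3.2",
    "input": "...a document that may exceed the context window...",
    "instructions": "Answer in formal English.",
    "truncation": "auto",                      # one of auto or disabled
    "prompt_cache_key": "corpus-v1-chunk-07",  # <= 64 chars; reuse the key to hit the cache
    "safety_identifier": "user-8f3a",          # <= 64 chars; keep stable per end user
}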
Whether to store the response so it can be retrieved later.
The service tier to use for this request. One of auto, default, flex, or priority.
The number of most likely tokens to return at each position, along with their log probabilities. Must be between 0 and 20.
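Storage, routing tier, and log probabilities combined, with illustrative values; the field names store, service_tier, and top_logprobs are assumptions:

payload = {
    "model": "deepseek-ai/DeepSeek-V3.2",
    "input": "Classify this ticket: 'refund not received'.",
    "store": True,              # keep the response retrievable later
    "service_tier": "default",  # one of auto, default, flex, priority
    "top_logprobs": 5,          # 0 to 20 most likely tokens per position
}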
Success
The complete response object that was returned by the Responses API.
The unique ID of the response that was created.
The object type, which is always response.
The Unix timestamp (in seconds) for when the response was created.
The Unix timestamp (in seconds) for when the response was completed, if applicable.
The status that was set for the response.
Details about why the response was incomplete, if applicable.
The model that generated this response.
The ID of the previous response in the chain that was referenced, if any.
Additional instructions that were used to guide the model for this response.
The output items that were generated by the model.
An item representing a message, tool call, tool output, reasoning, or other response element.
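A sketch of walking the output items of a completed response, reusing BASE_URL and HEADERS from the first sketch. The output field name and the item shapes (type, content, output_text) follow common conventions and are assumptions.

data = requests.post(
    f"{BASE_URL}/responses",
    headers=HEADERS,
    json={"model": "deepseek-ai/DeepSeek-V3.2", "input": "Hello!"},
).json()

for item in data["output"]:
    if item["type"] == "message":
        # A message item carries content parts; print any text parts.
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                print(part["text"])
    elif item["type"] == "function_call":  # a tool call the model requested
        print("tool call:", item.get("name"))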
The error that occurred, if the response failed.
The tools that were available to the model during response generation.
How the input was truncated by the service when it exceeded the model context window. One of auto or disabled.
Whether the model was allowed to call multiple tools in parallel.
Configuration options for text output that were used.
The nucleus sampling parameter that was used for this response.
The presence penalty that was used to penalize new tokens based on whether they appear in the text so far.
The frequency penalty that was used to penalize new tokens based on their frequency in the text so far.
The number of most likely tokens that were returned at each position, along with their log probabilities.
The sampling temperature that was used for this response.
Reasoning configuration and outputs that were produced for this response.
Token usage statistics that were recorded for the response, if available.
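Reading the usage block from the parsed response object in the sketch above, if the service recorded one; the token-count field names are assumptions.

usage = data.get("usage") or {}
print("input tokens: ", usage.get("input_tokens"))
print("output tokens:", usage.get("output_tokens"))
print("total tokens: ", usage.get("total_tokens"))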
The maximum number of tokens the model was allowed to generate for this response.
The maximum number of tool calls the model was allowed to make while generating the response.
Whether this response was stored so it can be retrieved later.
Whether this request was run in the background.
The service tier that was used for this response.
Developer-defined metadata that was associated with the response.
A stable identifier that was used for safety monitoring and abuse detection.
A key that was used to read from or write to the prompt cache.