
venice_ai.resources.responses

Venice AI Responses API resource (Alpha).

Wraps POST /responses — the OpenAI-compatible Responses API endpoint described at swagger.yaml.md:7244. The endpoint is currently tagged Alpha by Venice; request and response shapes may change without notice.

Unlike /chat/completions, this endpoint returns a typed output array containing reasoning, message, function_call, and web_search_call blocks. It is stateless — each request is independent and no conversation state is persisted between calls. E2EE-capable models are not supported; use /chat/completions with E2EE headers instead.
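Because the output is a typed array, callers usually branch on each block's type. The sketch below uses plain dicts as stand-ins for the SDK's typed output blocks. The field names (summary, content, name, arguments) are assumptions loosely modelled on the OpenAI Responses API, and the reasoning summary is simplified to a list of strings; the real Venice block attributes may differ.

```python
from typing import Any


def summarize_output(output: list[dict[str, Any]]) -> list[str]:
    """Return one human-readable line per output block.

    The dicts are hypothetical stand-ins for the SDK's typed blocks;
    all field names are assumptions.
    """
    lines: list[str] = []
    for block in output:
        kind = block.get("type")
        if kind == "reasoning":
            # Simplified: summary assumed to be a list of plain strings.
            lines.append("reasoning: " + " ".join(block.get("summary", [])))
        elif kind == "message":
            lines.append("message: " + str(block.get("content", "")))
        elif kind == "function_call":
            lines.append(f"call {block['name']}({block.get('arguments', '')})")
        elif kind == "web_search_call":
            lines.append("web search performed")
        else:
            # The endpoint is Alpha, so tolerate block types added later.
            lines.append(f"unhandled block type: {kind}")
    return lines
```

Keeping an explicit fallback branch is a deliberate choice here: since the endpoint is Alpha, new block types may appear without notice.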

Streaming is supported via Server-Sent Events when stream=True; the returned venice_ai.streaming.Stream yields venice_ai.types.api.responses.ResponsesStreamEvent chunks.
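A streamed response is consumed with async for. The sketch below accumulates text from a stream of events; a hypothetical fake_stream generator stands in for the real SSE-backed stream. The event type "response.output_text.delta" and its delta field are borrowed from OpenAI's Responses streaming events and are an assumption about Venice's ResponsesStreamEvent shape, which is represented here as plain dicts.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from typing import Any


async def collect_text(events: AsyncIterable[dict[str, Any]]) -> str:
    """Concatenate text deltas from a stream of Responses events.

    Assumes OpenAI-style events of the form
    {"type": "response.output_text.delta", "delta": "..."}.
    """
    parts: list[str] = []
    async for event in events:
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)


async def fake_stream() -> AsyncIterator[dict[str, Any]]:
    # Hypothetical stand-in for the stream returned when stream=True.
    for event in (
        {"type": "response.created"},
        {"type": "response.output_text.delta", "delta": "Hello, "},
        {"type": "response.output_text.delta", "delta": "world"},
        {"type": "response.completed"},
    ):
        yield event
```

With the real client, the same loop would run over the object returned by client.responses.create(..., stream=True), assuming its events expose equivalent fields.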

Responses Objects

class Responses(APIResource["VeniceClient"])

Access the Venice Responses API (Alpha).

Access via VeniceClient.responses.

Responses.create

async def create(
*,
model: str,
input: str | list[dict[str, Any]],
include: list[str] | None = None,
max_output_tokens: int | None = None,
temperature: float | None = None,
top_p: float | None = None,
fallbacks: list[dict[str, str]] | None = None,
reasoning: Any | None = None,
tools: list[Tool | dict[str, Any]] | None = None,
tool_choice: str | dict[str, Any] | None = None,
web_search: bool | None = None,
venice_parameters: Any | None = None,
stream: bool = False
) -> ResponsesResponse | AsyncIterable[ResponsesStreamEvent]

Create a response using the Responses API (Alpha).

Wraps POST /api/v1/responses. Each call is stateless: no conversation history is persisted between requests.

Arguments:

  • model - Model ID. E2EE-capable models are not supported; use /chat/completions with E2EE headers instead.
  • input - Prompt, either a plain string or a list of structured input items (messages, reasoning blocks, function calls, etc.) as documented in the OpenAI Responses API.
  • include - Additional response fields to include.
  • max_output_tokens - Maximum tokens to generate.
  • temperature - Sampling temperature (0-2).
  • top_p - Nucleus sampling (0-1).
  • fallbacks - Anthropic beta parameter for Claude Fable 5 server-side refusal fallback. Array of {"model": ...} objects (max 10). Forwarded only for direct Anthropic routes; ignored otherwise.
  • reasoning - Nested reasoning config ({"effort": "...", "summary": "..."} or ReasoningConfig).
  • tools - Tool definitions. Function tools plus the Alpha tool types (web_search, x_search, code_interpreter, file_search, computer_use_preview) are supported.
  • tool_choice - "auto" | "none" | "required", or a {"type": "function", "function": {"name": ...}} dict.
  • web_search - Enable web search for this request.
  • venice_parameters - Venice-specific request parameters.
  • stream - When True, returns an async iterator of ResponsesStreamEvent chunks parsed from Server-Sent Events. The default, False, returns a single ResponsesResponse.

Returns:

ResponsesResponse with typed output blocks (reasoning, message, function_call, web_search_call), or an AsyncIterable[ResponsesStreamEvent] when stream=True.

Raises:

  • InvalidRequestError - If parameters fail server-side validation (e.g. malformed input, unsupported tool type, or an E2EE-capable model is supplied).
  • AuthenticationError - If the API key is missing or invalid.
  • PermissionDeniedError - If the account lacks access to the Responses API alpha or the requested model.
  • NotFoundError - If the model id is unknown.
  • RateLimitError - If account-level rate limits are exceeded.
  • APIError - For other HTTP-level failures.
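Of the errors listed above, RateLimitError is the one that is typically transient, so it is a natural candidate for retries with exponential backoff. The sketch below keeps itself self-contained by defining a local RateLimitError stand-in, since the SDK's import path for its exception classes is not given here; in practice the SDK's own class would be caught instead. call is any zero-argument coroutine function, for example a lambda wrapping client.responses.create(...).

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for the SDK's RateLimitError."""


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await call(), retrying on RateLimitError with exponential backoff.

    Other errors (authentication, validation, not-found) are not retried,
    since repeating the same request would fail the same way.
    """
    for attempt in range(retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == retries:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```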

Example:

```python
import asyncio

from venice_ai import VeniceClient


async def main() -> None:
    async with VeniceClient() as client:
        # Pick a chat-capable model, then request a short summary.
        model = await client.models.resolve_chat()
        response = await client.responses.create(
            model=model,
            input="Summarize the Treaty of Versailles in two sentences.",
            max_output_tokens=200,
        )
        # Print only message blocks; skip reasoning and tool-call blocks.
        for block in response.output:
            if block.type == "message":
                print(block.content)


asyncio.run(main())
```
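When tools are supplied, the output may contain function_call blocks that the caller must execute and answer in a follow-up request. Since the endpoint is stateless, the sketch below echoes each call back together with its result. The block fields (call_id, name, JSON-encoded arguments) and the {"type": "function_call_output", ...} item shape are assumptions taken from the OpenAI Responses API; blocks are shown as plain dicts, and run_function_calls is a hypothetical helper.

```python
import json
from collections.abc import Callable
from typing import Any


def run_function_calls(
    output: list[dict[str, Any]],
    registry: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Execute function_call blocks and build follow-up input items.

    Assumes OpenAI-style shapes: a call block carries "call_id", "name"
    and JSON-encoded "arguments"; results are returned as
    {"type": "function_call_output", "call_id": ..., "output": ...}.
    The original call blocks are echoed back because no state is kept.
    """
    follow_up: list[dict[str, Any]] = []
    for block in output:
        if block.get("type") != "function_call":
            continue
        func = registry[block["name"]]
        args = json.loads(block.get("arguments") or "{}")
        result = func(**args)
        follow_up.append(block)
        follow_up.append(
            {
                "type": "function_call_output",
                "call_id": block["call_id"],
                "output": json.dumps(result),
            }
        )
    return follow_up
```

In a full loop, these items would be appended after the earlier user messages and sent as input= in the next create call with the same tools, repeating until the response contains no further function_call blocks.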