venice_ai.middleware.retry

Retry middleware for aiohttp with exponential backoff and jitter.

This module provides the retry mechanism for the Venice AI client's HTTP communication layer. It plugs into aiohttp's middleware system to handle transient failures, rate limiting, and network issues.

Key Features

Exponential Backoff: Implements exponential backoff with configurable base delay and multiplier
Jitter Support: Adds randomization to prevent thundering herd problems
Smart Retry Logic: Differentiates between idempotent and non-idempotent HTTP methods
Rate Limit Awareness: Respects Retry-After headers from API responses
Configurable Exception Handling: Customizable set of exceptions that trigger retries
Comprehensive Logging: Detailed logging for monitoring and debugging retry behavior

Integration with Venice AI Client

The retry middleware is automatically integrated into the Venice AI client's HTTP session during initialization. It operates transparently at the transport layer, intercepting failed requests and applying retry logic before propagating failures to higher-level code.

Retry Strategy

The module implements a sophisticated retry strategy that considers:

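HTTP Method Safety: Always treats idempotent methods (GET, PUT, DELETE, etc.) as eligible for retry; non-idempotent methods such as POST are retried only when retry_non_idempotent is enabled, which is the documented default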
Status Code Analysis: Retries on specific HTTP status codes (5xx errors by default)
Exception Type Filtering: Retries on network timeouts and connection errors
Exponential Backoff: Increases delay between attempts to reduce server load
Jitter Application: Adds randomness to prevent synchronized retry storms
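
The first three considerations above amount to a yes/no decision per failed request. A minimal sketch of that decision, assuming hypothetical default sets (the names should_retry, IDEMPOTENT_METHODS, RETRY_STATUS_CODES, and RETRY_EXCEPTIONS are illustrative and are not taken from the module):

```python
import asyncio
from typing import Optional

# Hypothetical defaults for illustration; the module's real defaults may differ.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})
RETRY_STATUS_CODES = frozenset({500, 502, 503, 504})
RETRY_EXCEPTIONS = (asyncio.TimeoutError, ConnectionError)


def should_retry(
    method: str,
    status: Optional[int] = None,
    exc: Optional[BaseException] = None,
    retry_non_idempotent: bool = True,
) -> bool:
    """Return True if a failed request looks safe and worthwhile to retry."""
    # Method safety: skip non-idempotent methods unless explicitly allowed.
    if method.upper() not in IDEMPOTENT_METHODS and not retry_non_idempotent:
        return False
    # Exception filtering: only transient network-style errors qualify.
    if exc is not None:
        return isinstance(exc, RETRY_EXCEPTIONS)
    # Status code analysis: only the configured codes qualify.
    return status in RETRY_STATUS_CODES
```

In this sketch a 404 on GET is not retried, while a 503 on POST is retried unless retry_non_idempotent is False.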

Performance Considerations

The retry mechanism is designed to be efficient and respectful of server resources:

Caps maximum delay to prevent excessive wait times
Uses jitter to distribute retry attempts across time
Respects server-provided Retry-After headers
Logs retry attempts for monitoring and debugging
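
How a capped Retry-After value and the computed backoff might be combined can be sketched as follows. The function name choose_delay and the 60-second cap are assumptions made for illustration; the module's actual default for max_retry_after is not stated in this reference.

```python
from typing import Optional


def choose_delay(
    backoff_delay: float,
    retry_after: Optional[float],
    respect_retry_after: bool = True,
    max_retry_after: float = 60.0,  # hypothetical cap, not the documented default
) -> float:
    """Pick the wait time before the next attempt."""
    if respect_retry_after and retry_after is not None:
        # Server guidance overrides the computed backoff, but is capped
        # so a misconfigured server cannot force an excessive wait.
        return max(0.0, min(retry_after, max_retry_after))
    return backoff_delay
```

With these assumed values, a server asking for 3600 seconds would be waited on for 60 seconds, and a missing header falls back to the computed backoff.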

RetryOptions Objects

@dataclass
class RetryOptions()

Comprehensive configuration for retry behavior and exponential backoff strategies.

This class provides fine-grained control over how the retry middleware handles failed requests, including timing strategies, condition filtering, and monitoring hooks.

Attributes:

max_attempts - Maximum number of retry attempts (excluding the initial request). For example, max_attempts=3 means up to 4 total attempts (1 initial + 3 retries). Higher values increase resilience but may delay error propagation.
retry_status_codes - Set of HTTP status codes that should trigger a retry attempt. Default includes server errors (5xx). Rate limiting (429) is intentionally excluded because SimpleRateLimiter handles 429 retries with per-model state tracking, exponential backoff, and Retry-After header support. Common additions might include 408 (Request Timeout) or 413 (Payload Too Large) depending on use case.
retry_exceptions - List of exception types that should trigger retry attempts. Focuses on transient network issues and timeouts that are likely to resolve on subsequent attempts. Does not include programming errors or authentication failures.
base_delay - Base delay in seconds for exponential backoff calculation. This is the delay for the first retry attempt. The delay for a given attempt is base_delay * exponential_base^attempt, where attempt is 0-based, before capping and jitter. Lower values provide faster retries but may overwhelm struggling servers.
max_delay - Maximum delay in seconds to cap exponential growth. Prevents exponential backoff from creating excessively long delays. Helps maintain reasonable response times even after multiple failures.
exponential_base - Base multiplier for exponential backoff calculation. Determines how quickly delays increase. 2.0 doubles delay each attempt, while 1.5 provides more gradual increases. Higher values back off more aggressively.
jitter_factor - Randomization factor (0.0 to 1.0) to prevent thundering herd problems. Adds random variation to delays to prevent multiple clients from retrying simultaneously. 0.1 means ±10% random variation. Higher values increase randomization but may make retry timing less predictable.
respect_retry_after - Whether to honor Retry-After headers from server responses. When True, server-provided retry delays override calculated exponential backoff. Recommended for APIs that provide intelligent rate limiting guidance.
max_retry_after - Maximum seconds to wait for server-provided Retry-After values. Prevents malicious or misconfigured servers from forcing excessive delays. Acts as a safety cap on server-directed retry timing.
idempotent_methods - Set of HTTP methods considered safe to retry automatically. These methods should not have additional side effects when repeated. POST is excluded from this set by default since it typically creates or modifies resources; whether methods outside the set are retried is controlled by retry_non_idempotent.
retry_non_idempotent - Whether to retry non-idempotent methods like POST. Default is True because Venice API endpoints (chat completions, embeddings, image generation) are effectively idempotent and safe to retry on transient errors. Set to False if your use case involves non-idempotent operations.
on_retry - Optional callback function for monitoring or logging retry attempts. Called with (attempt_number, delay_seconds, exception_or_none) for each retry. Useful for metrics collection, alerting, or debugging retry behavior.
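
To make the on_retry callback shape concrete, here is a small sketch. RetryOptionsSketch is a hypothetical stand-in that mirrors only a few of the attributes above, using the defaults this reference states (3 retries, 1 second base delay, 0.1 jitter, retry_non_idempotent=True). It is not the real class, and the attempt numbers passed to the callback are simulated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Callback shape described above: (attempt_number, delay_seconds, exception_or_none).
OnRetry = Callable[[int, float, Optional[BaseException]], None]


@dataclass
class RetryOptionsSketch:
    """Hypothetical stand-in exposing a subset of the documented fields."""
    max_attempts: int = 3
    base_delay: float = 1.0
    jitter_factor: float = 0.1
    retry_non_idempotent: bool = True
    on_retry: Optional[OnRetry] = None


retry_log: List[Tuple[int, float, Optional[str]]] = []


def record_retry(attempt: int, delay: float, exc: Optional[BaseException]) -> None:
    """Collect one row per retry, e.g. for metrics or debugging."""
    retry_log.append((attempt, round(delay, 3), type(exc).__name__ if exc else None))


options = RetryOptionsSketch(max_attempts=5, base_delay=0.5, on_retry=record_retry)

# Simulate what a middleware might do: a status-code retry carries no
# exception, a network failure carries the exception that triggered it.
if options.on_retry is not None:
    options.on_retry(1, 0.5, None)
    options.on_retry(2, 1.0, TimeoutError("read timed out"))
```

Keeping the callback free of I/O, as here, avoids adding latency to the retry path; heavier reporting can be done from the collected rows afterwards.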

RetryOptions.jitter_factor

jitter_factor = 0.1

Using 0.1 for backward compatibility with existing configs

calculate_backoff_delay

def calculate_backoff_delay(attempt: int, base_delay: float,
                            exponential_base: float, max_delay: float,
                            jitter_factor: float) -> float

Calculate the retry delay using exponential backoff with jitter.

The backoff strategy balances quick recovery from transient issues against load on struggling servers. Exponential backoff reduces load on failing services, and jitter prevents thundering herd problems when multiple clients retry simultaneously.

The calculation process:

Compute exponential delay: base_delay * (exponential_base ^ attempt)
Cap the result at max_delay to prevent excessive waits
Apply jitter as random variation: ±(jitter_factor * delay)
Ensure the final delay is never negative
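
The four steps above can be sketched as follows. This is an illustration written from the description, assuming a 0-based attempt number and uniformly distributed jitter. The name backoff_delay_sketch is hypothetical and this is not the module's source.

```python
import random


def backoff_delay_sketch(
    attempt: int,
    base_delay: float,
    exponential_base: float,
    max_delay: float,
    jitter_factor: float,
) -> float:
    """Exponential backoff with symmetric jitter, following the four steps."""
    # 1. Exponential growth (attempt is 0-based).
    delay = base_delay * (exponential_base ** attempt)
    # 2. Cap at max_delay.
    delay = min(delay, max_delay)
    # 3. Symmetric jitter in [-jitter_factor * delay, +jitter_factor * delay].
    jitter = random.uniform(-jitter_factor * delay, jitter_factor * delay)
    # 4. Never return a negative delay.
    return max(0.0, delay + jitter)
```

With jitter_factor=0.0, base_delay=1.0, exponential_base=2.0 and max_delay=60.0, attempts 0 through 7 give 1, 2, 4, 8, 16, 32, 60, 60 seconds. Because this sketch applies jitter after the cap, a jittered delay can exceed max_delay by up to jitter_factor * max_delay.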

Arguments:

attempt - The current attempt number (0-based indexing). attempt=0 for first retry, attempt=1 for second retry, etc.
base_delay - Base delay in seconds for the exponential calculation. This is the delay used for the first retry attempt before exponential growth.
exponential_base - Multiplicative base for exponential backoff. Common values are 2.0 (doubling) or 1.5 (50% increase per attempt).
max_delay - Maximum delay in seconds to cap exponential growth. Prevents extremely long delays that could impact user experience.
jitter_factor - Randomization factor between 0.0 and 1.0. 0.0 = no randomness, 1.0 = up to 100% variation in either direction.

Returns:

Calculated delay in seconds before the next retry attempt. Always returns a non-negative value, even with maximum jitter applied.

Example:

>>> calculate_backoff_delay(0, 1.0, 2.0, 60.0, 0.1)
# First retry: ~1.0 seconds ± 10% jitter
>>> calculate_backoff_delay(2, 1.0, 2.0, 60.0, 0.1)
# Third retry: ~4.0 seconds ± 10% jitter

parse_retry_after_header

def parse_retry_after_header(response: ClientResponse) -> float | None

Parse the Retry-After header from an HTTP response to determine server-suggested delay.

The Retry-After header is commonly used by APIs to indicate when a client should retry a request, particularly for rate limiting (429) and temporary service unavailability (503) responses. This function handles both formats specified in RFC 7231.

The header can contain either:

An integer number of seconds to wait (e.g., "Retry-After: 120")
An HTTP-date timestamp indicating when to retry (e.g., "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT")

Arguments:

response - The aiohttp ClientResponse object containing the HTTP headers. Must be a valid response object with accessible headers.

Returns:

Number of seconds to wait before retrying, or None if:

The Retry-After header is not present in the response
The header value cannot be parsed as a valid delay
The header contains an unrecognized date format

Notes:

HTTP-date parsing uses email.utils.parsedate_to_datetime() which follows RFC 2822 and RFC 5322 date formats commonly used in HTTP headers.
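
A minimal sketch of the two-format parsing described above. It operates on the raw header string rather than on a ClientResponse so that it stays self-contained; the name parse_retry_after_value and the now parameter are assumptions added for illustration and testing.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after_value(
    value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Convert a raw Retry-After header value into seconds, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    # Format 1: delay-seconds, a non-negative integer.
    if value.isascii() and value.isdigit():
        return float(value)
    # Format 2: HTTP-date.
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if target.tzinfo is None:
        # Treat a date without zone information as UTC.
        target = target.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", so clamp at zero.
    return max(0.0, (target - current).total_seconds())
```

Clamping past dates to zero is a design choice of this sketch; the documented function may handle that case differently.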

create_retry_middleware

def create_retry_middleware(options: RetryOptions | None = None) -> Middleware

Create an aiohttp middleware that implements the retry logic.

This function returns a middleware component that integrates into aiohttp's request pipeline to handle transient failures automatically. The middleware operates transparently, intercepting failed requests and applying the configured retry logic before either succeeding or propagating the final failure.

The middleware works in four layers:

Request Analysis: Determines whether a request is safe to retry based on:

HTTP method idempotency (GET, PUT, DELETE are safe to repeat; POST typically isn't)
Configuration settings for non-idempotent method handling

Failure Detection: Identifies retryable failures through:

HTTP status code analysis (server errors by default; 429 rate limiting is left to SimpleRateLimiter unless added to retry_status_codes)
Exception type filtering (timeouts, network errors)
Exclusion of permanent failures (authentication, client errors)

Retry Strategy: Applies intelligent backoff using:

Exponential backoff to reduce load on struggling servers
Jitter to prevent thundering herd problems
Server-provided Retry-After header respect
Configurable maximum delays and attempt limits

Monitoring Integration: Provides visibility through:

Detailed logging of retry attempts and decisions
Optional callback hooks for metrics collection
Exception preservation for proper error propagation

Arguments:

options - Optional RetryOptions instance for customizing retry behavior. If None, uses a default retry configuration suitable for most API interactions. The default retries up to 3 times with exponential backoff starting at 1 second on common transient failures. Idempotent methods are always eligible; non-idempotent methods such as POST are eligible unless retry_non_idempotent is set to False.

Returns:

An aiohttp middleware function that can be added to a ClientSession's middlewares list. The function signature matches aiohttp's middleware protocol: async def middleware(request, handler) -> response

Example:

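>>> from aiohttp import ClientSession
>>> from venice_ai.middleware.retry import RetryOptions, create_retry_middleware
>>> retry_middleware = create_retry_middleware(
...     RetryOptions(max_attempts=5, base_delay=0.5)
... )
>>> # In application code, create the session inside a coroutine.
>>> session = ClientSession(middlewares=[retry_middleware])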

Notes:

The middleware preserves the original request semantics - successful requests pass through unchanged, and final failures raise the original exception with full context about retry attempts logged.
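
The control flow described above (pass successes through, retry transient failures with backoff, re-raise the original exception once retries are exhausted) can be sketched without aiohttp. This sketch covers only the exception path. The name call_with_retries, the zero-argument handler, and the default values for exponential_base and max_delay are assumptions for illustration, and status-code and Retry-After handling are omitted.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, Tuple, Type, TypeVar

T = TypeVar("T")


async def call_with_retries(
    handler: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    exponential_base: float = 2.0,  # hypothetical default
    max_delay: float = 60.0,  # hypothetical default
    jitter_factor: float = 0.1,
    retry_exceptions: Tuple[Type[BaseException], ...] = (
        asyncio.TimeoutError,
        ConnectionError,
    ),
) -> T:
    """Call handler once, then retry up to max_attempts times on listed exceptions."""
    attempt = 0
    while True:
        try:
            return await handler()  # success passes through unchanged
        except retry_exceptions:
            if attempt >= max_attempts:
                raise  # retries exhausted: propagate the original exception
            delay = min(base_delay * exponential_base ** attempt, max_delay)
            delay += random.uniform(-jitter_factor, jitter_factor) * delay
            await asyncio.sleep(max(0.0, delay))
            attempt += 1


# Demo: a handler that fails twice with a transient error, then succeeds.
calls: Dict[str, int] = {"count": 0}


async def flaky_handler() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(call_with_retries(flaky_handler, base_delay=0.01))
```

In the demo the handler is called three times in total: the initial call plus two retries. Exceptions outside retry_exceptions are not caught and propagate immediately.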
