venice_ai.middleware.retry
Advanced retry middleware for aiohttp with intelligent exponential backoff and jitter.
This module provides the retry mechanism for the Venice AI client's HTTP communication layer. It plugs into aiohttp's middleware system and handles transient server errors, timeouts, and connection failures using exponential backoff, jitter, and server-provided Retry-After hints.
Key Features
- Exponential Backoff: Implements exponential backoff with configurable base delay and multiplier
- Jitter Support: Adds randomization to prevent thundering herd problems
- Smart Retry Logic: Differentiates between idempotent and non-idempotent HTTP methods
- Rate Limit Awareness: Respects Retry-After headers from API responses
- Configurable Exception Handling: Customizable set of exceptions that trigger retries
- Comprehensive Logging: Detailed logging for monitoring and debugging retry behavior
Integration with Venice AI Client
The retry middleware is automatically integrated into the Venice AI client's HTTP session during initialization. It operates transparently at the transport layer, intercepting failed requests and applying retry logic before propagating failures to higher-level code.
Retry Strategy
The module implements a sophisticated retry strategy that considers:
- HTTP Method Safety: Retries idempotent methods (GET, PUT, DELETE, etc.); non-idempotent methods such as POST are retried only when retry_non_idempotent is enabled (see RetryOptions)
- Status Code Analysis: Retries on specific HTTP status codes (5xx errors by default)
- Exception Type Filtering: Retries on network timeouts and connection errors
- Exponential Backoff: Increases delay between attempts to reduce server load
- Jitter Application: Adds randomness to prevent synchronized retry storms
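The retry decision described above can be sketched as a single predicate. This is an illustrative sketch, not the module's implementation: the function name should_retry and the three constant sets are hypothetical stand-ins for the corresponding RetryOptions fields, and their concrete values are assumptions.

```python
from __future__ import annotations

import asyncio

# Hypothetical stand-ins for RetryOptions fields; the module's real defaults may differ.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})
RETRY_STATUS_CODES = frozenset({500, 502, 503, 504})
RETRY_EXCEPTIONS = (asyncio.TimeoutError, ConnectionError)


def should_retry(
    method: str,
    status: int | None = None,
    exc: BaseException | None = None,
    retry_non_idempotent: bool = True,
) -> bool:
    """Return True if a failed request qualifies for another attempt."""
    # Method gate: non-idempotent methods need explicit opt-in.
    if method.upper() not in IDEMPOTENT_METHODS and not retry_non_idempotent:
        return False
    # Exception filter: only transient network-level errors qualify.
    if exc is not None:
        return isinstance(exc, RETRY_EXCEPTIONS)
    # Status filter: only the configured status codes qualify.
    return status in RETRY_STATUS_CODES
```

With these assumed sets, a 503 on GET qualifies, a 404 does not, and a 429 is left for the rate limiter to handle.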
Performance Considerations
The retry mechanism is designed to be efficient and respectful of server resources:
- Caps maximum delay to prevent excessive wait times
- Uses jitter to distribute retry attempts across time
- Respects server-provided Retry-After headers
- Logs retry attempts for monitoring and debugging
RetryOptions Objects
@dataclass
class RetryOptions()
Comprehensive configuration for retry behavior and exponential backoff strategies.
This class provides fine-grained control over how the retry middleware handles failed requests, including timing strategies, condition filtering, and monitoring hooks.
Attributes:
- max_attempts - Maximum number of retry attempts (excluding the initial request). For example, max_attempts=3 means up to 4 total attempts (1 initial + 3 retries). Higher values increase resilience but may delay error propagation.
- retry_status_codes - Set of HTTP status codes that should trigger a retry attempt. Default includes server errors (5xx). Rate limiting (429) is intentionally excluded because SimpleRateLimiter handles 429 retries with per-model state tracking, exponential backoff, and Retry-After header support. Common additions might include 408 (Request Timeout) or 413 (Payload Too Large) depending on use case.
- retry_exceptions - List of exception types that should trigger retry attempts. Focuses on transient network issues and timeouts that are likely to resolve on subsequent attempts. Does not include programming errors or authentication failures.
- base_delay - Base delay in seconds for exponential backoff calculation. This is the starting delay for the first retry attempt. Subsequent attempts use exponential_base^attempt * base_delay. Lower values provide faster retries but may overwhelm struggling servers.
- max_delay - Maximum delay in seconds to cap exponential growth. Prevents exponential backoff from creating excessively long delays. Helps maintain reasonable response times even after multiple failures.
- exponential_base - Base multiplier for exponential backoff calculation. Determines how quickly delays increase. 2.0 doubles delay each attempt, while 1.5 provides more gradual increases. Higher values back off more aggressively.
- jitter_factor - Randomization factor (0.0 to 1.0) to prevent thundering herd problems. Adds random variation to delays to prevent multiple clients from retrying simultaneously. 0.1 means ±10% random variation. Higher values increase randomization but may make retry timing less predictable.
- respect_retry_after - Whether to honor Retry-After headers from server responses. When True, server-provided retry delays override calculated exponential backoff. Recommended for APIs that provide intelligent rate limiting guidance.
- max_retry_after - Maximum seconds to wait for server-provided Retry-After values. Prevents malicious or misconfigured servers from forcing excessive delays. Acts as a safety cap on server-directed retry timing.
- idempotent_methods - Set of HTTP methods considered safe to retry automatically. These methods should not have side effects when repeated. POST is notably excluded by default since it typically creates or modifies resources.
- retry_non_idempotent - Whether to retry non-idempotent methods like POST. Default is True because Venice API endpoints (chat completions, embeddings, image generation) are effectively idempotent and safe to retry on transient errors. Set to False if your use case involves non-idempotent operations.
- on_retry - Optional callback function for monitoring or logging retry attempts. Called with (attempt_number, delay_seconds, exception_or_none) for each retry. Useful for metrics collection, alerting, or debugging retry behavior.
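The interaction between respect_retry_after and max_retry_after can be illustrated with a small helper. This is a sketch under stated assumptions: select_delay is a hypothetical name, the 60-second cap is an illustrative value, and clamping an oversized Retry-After value (rather than ignoring it) is one plausible reading of "safety cap", not confirmed behavior.

```python
from __future__ import annotations


def select_delay(
    backoff_delay: float,
    retry_after: float | None,
    respect_retry_after: bool = True,
    max_retry_after: float = 60.0,  # hypothetical cap, for illustration only
) -> float:
    """Pick the wait time before the next retry attempt."""
    if respect_retry_after and retry_after is not None:
        # Server guidance overrides the computed backoff, clamped to the cap.
        return min(max(retry_after, 0.0), max_retry_after)
    # No usable server guidance: fall back to the computed backoff delay.
    return backoff_delay
```

Under these assumptions, a server asking for a one-hour wait would be held to the 60-second cap, while a response without a Retry-After header uses the backoff delay unchanged.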
RetryOptions.jitter_factor
jitter_factor = 0.1
Using 0.1 for backward compatibility with existing configs
calculate_backoff_delay
def calculate_backoff_delay(attempt: int, base_delay: float,
exponential_base: float, max_delay: float,
jitter_factor: float) -> float
Calculate intelligent retry delay using exponential backoff with jitter.
This function implements a sophisticated backoff strategy that balances quick recovery from transient issues with respectful behavior toward struggling servers. The algorithm combines exponential backoff (to reduce load on failing services) with jitter (to prevent thundering herd problems when multiple clients retry simultaneously).
The calculation process:
- Compute exponential delay: base_delay * (exponential_base ^ attempt)
- Cap the result at max_delay to prevent excessive waits
- Apply jitter as random variation: ±(jitter_factor * delay)
- Ensure the final delay is never negative
Arguments:
- attempt - The current attempt number (0-based indexing). attempt=0 for first retry, attempt=1 for second retry, etc.
- base_delay - Base delay in seconds for the exponential calculation. This is the delay used for the first retry attempt before exponential growth.
- exponential_base - Multiplicative base for exponential backoff. Common values are 2.0 (doubling) or 1.5 (50% increase per attempt).
- max_delay - Maximum delay in seconds to cap exponential growth. Prevents extremely long delays that could impact user experience.
- jitter_factor - Randomization factor between 0.0 and 1.0. 0.0 = no randomness, 1.0 = up to 100% variation in either direction.
Returns:
Calculated delay in seconds before the next retry attempt. Always returns a non-negative value, even with maximum jitter applied.
Example:
>>> calculate_backoff_delay(0, 1.0, 2.0, 60.0, 0.1)
# First retry: ~1.0 seconds ± 10% jitter
>>> calculate_backoff_delay(2, 1.0, 2.0, 60.0, 0.1)
# Third retry: ~4.0 seconds ± 10% jitter
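The four calculation steps listed for calculate_backoff_delay can be written out directly. The following is a minimal sketch that mirrors the documented steps, not the module's source; the name sketch_backoff_delay and the injectable rng parameter are assumptions added for illustration and testing.

```python
import random


def sketch_backoff_delay(
    attempt: int,
    base_delay: float,
    exponential_base: float,
    max_delay: float,
    jitter_factor: float,
    rng=random,  # hypothetical hook so tests can supply a seeded generator
) -> float:
    """Exponential backoff with symmetric jitter, following the documented steps."""
    # Step 1: exponential growth from the base delay.
    delay = base_delay * (exponential_base ** attempt)
    # Step 2: cap at max_delay.
    delay = min(delay, max_delay)
    # Step 3: symmetric jitter of up to +/- jitter_factor * delay.
    jitter = delay * jitter_factor * rng.uniform(-1.0, 1.0)
    # Step 4: never return a negative delay.
    return max(0.0, delay + jitter)
```

Setting jitter_factor to 0.0 makes the result deterministic, which is convenient when checking the growth and the cap in isolation.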
parse_retry_after_header
def parse_retry_after_header(response: ClientResponse) -> float | None
Parse the Retry-After header from an HTTP response to determine server-suggested delay.
The Retry-After header is commonly used by APIs to indicate when a client should retry a request, particularly for rate limiting (429) and temporary service unavailability (503) responses. This function handles both formats specified in RFC 7231.
The header can contain either:
- An integer number of seconds to wait (e.g., "Retry-After: 120")
- An HTTP-date timestamp indicating when to retry (e.g., "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT")
Arguments:
- response - The aiohttp ClientResponse object containing the HTTP headers. Must be a valid response object with accessible headers.
Returns:
Number of seconds to wait before retrying, or None if:
- The Retry-After header is not present in the response
- The header value cannot be parsed as a valid delay
- The header contains an unrecognized date format
Notes:
HTTP-date parsing uses email.utils.parsedate_to_datetime() which follows RFC 2822 and RFC 5322 date formats commonly used in HTTP headers.
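Parsing both header forms can be sketched on the raw header string. This is an illustrative sketch rather than the module's code: parse_retry_after_value is a hypothetical helper that takes the header value instead of a ClientResponse, the now parameter exists only to make the example testable, and clamping dates in the past to 0.0 is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after_value(
    value: str | None, now: datetime | None = None
) -> float | None:
    """Convert a Retry-After header value into seconds, or None if unusable."""
    if value is None:
        return None
    text = value.strip()
    if not text:
        return None
    # Form 1: delay-seconds, a non-negative integer.
    if text.isascii() and text.isdigit():
        return float(text)
    # Form 2: HTTP-date. Older Python versions raise TypeError on bad input,
    # newer ones raise ValueError.
    try:
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    # Assumption: a date already in the past means "retry immediately".
    return max(0.0, (when - current).total_seconds())
```

The caller would typically read the value with response.headers.get("Retry-After") and pass the result through the max_retry_after cap before sleeping.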
create_retry_middleware
def create_retry_middleware(options: RetryOptions | None = None) -> Middleware
Create an intelligent aiohttp middleware that implements advanced retry logic.
This function returns a middleware component that integrates into aiohttp's request pipeline to automatically handle transient failures with sophisticated retry strategies. The middleware operates transparently, intercepting failed requests and applying configurable retry logic before either succeeding or propagating the final failure.
The middleware implements several layers of intelligence:
Request Analysis: Determines whether a request is safe to retry based on:
- HTTP method idempotency (GET, PUT, DELETE are idempotent; POST is not)
- Configuration settings for non-idempotent method handling
Failure Detection: Identifies retryable failures through:
- HTTP status code analysis (server errors by default; 429 rate limiting is left to SimpleRateLimiter unless added to retry_status_codes)
- Exception type filtering (timeouts, network errors)
- Exclusion of permanent failures (authentication, client errors)
Retry Strategy: Applies intelligent backoff using:
- Exponential backoff to reduce load on struggling servers
- Jitter to prevent thundering herd problems
- Server-provided Retry-After header respect
- Configurable maximum delays and attempt limits
Monitoring Integration: Provides visibility through:
- Detailed logging of retry attempts and decisions
- Optional callback hooks for metrics collection
- Exception preservation for proper error propagation
Arguments:
- options - Optional RetryOptions instance for customizing retry behavior. If None, uses the default retry configuration, which suits most API interactions: up to 3 retries with exponential backoff starting at 1 second, applied to common transient failures. Whether non-idempotent methods such as POST are retried is controlled by retry_non_idempotent (see RetryOptions).
Returns:
An aiohttp middleware function that can be added to ClientSession middleware list. The middleware function signature matches aiohttp's middleware protocol: async def middleware(request, handler) -> response
Example:
>>> retry_middleware = create_retry_middleware(
... RetryOptions(max_attempts=5, base_delay=0.5)
... )
>>> session = ClientSession(middlewares=[retry_middleware])
Notes:
The middleware preserves the original request semantics - successful requests pass through unchanged, and final failures raise the original exception with full context about retry attempts logged.
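The control flow such a middleware wraps around a request handler can be sketched without aiohttp. This is a simplified illustration, not the middleware itself: call_with_retries and its zero-argument handler are hypothetical, only ConnectionError is treated as retryable, jitter is omitted, and passing a 1-based attempt number to on_retry is an assumption.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional

# Callback shape from the on_retry description:
# (attempt_number, delay_seconds, exception_or_none).
OnRetry = Callable[[int, float, Optional[BaseException]], None]


async def call_with_retries(
    handler: Callable[[], Awaitable[str]],
    max_attempts: int = 3,
    base_delay: float = 0.01,
    on_retry: OnRetry | None = None,
) -> str:
    """Call handler, retrying up to max_attempts times after the initial try."""
    attempt = 0
    while True:
        try:
            # Successful calls pass through unchanged.
            return await handler()
        except ConnectionError as exc:
            if attempt >= max_attempts:
                # Retries exhausted: re-raise the original exception.
                raise
            delay = base_delay * (2 ** attempt)
            if on_retry is not None:
                on_retry(attempt + 1, delay, exc)
            await asyncio.sleep(delay)
            attempt += 1
```

With max_attempts=3 the handler runs at most four times, matching the "1 initial + 3 retries" accounting described for RetryOptions.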