Rate Limiting and Throttling#
The client batches requests and runs them concurrently. By default it uses the API’s recommended throttle and a concurrency limit (up to 16 in-flight requests). You can leave defaults, set a custom rate (e.g. requests per second or minute), or set a custom concurrency limit.
Examples:
from biolmai.core.http import BioLMApi, BioLMApiClient
# Use API's default throttle rate (recommended)
model = BioLMApi("esmfold")
# Custom rate limit: 1000 requests per second
model = BioLMApi("esmfold", rate_limit="1000/second")
# Custom rate limit: 60 requests per minute
model = BioLMApi("esmfold", rate_limit="60/minute")
# Custom concurrency limit: at most 5 requests at once
model = BioLMApiClient("esmfold", semaphore=5)
# Or use your own semaphore
import asyncio
sem = asyncio.Semaphore(5)
model = BioLMApiClient("esmfold", semaphore=sem)
# Disable default semaphore (no concurrency limit)
model = BioLMApiClient("esmfold", semaphore=None)
# Both: at most 5 concurrent, and at most 1000 per second
model = BioLMApiClient("esmfold", semaphore=sem, rate_limit="1000/second")
Implementation Details#
Semaphore: Limits the number of requests in flight at any moment.
Rate Limiter: Tracks timestamps of recent requests and enforces the N-per-window rule.
Sliding Window: The limiter removes timestamps older than the window (1s or 60s) and only allows a new request if the count is below N.
Acquisition Order: Semaphore is acquired first, then the rate limiter.
Best Practices#
For most users, the default API throttle is sufficient and safest.
Use a custom rate limit if you have a dedicated quota or want to avoid 429 errors.
Use a semaphore to avoid overwhelming your own network or compute resources.
For very high throughput, combine both.