When you call a model or your own deployed app on fal, you can pass platform-level HTTP headers that control how the request is handled. These headers are separate from the model’s input arguments (like prompt or image_size) and from SDK method parameters (like start_timeout or client_timeout). They apply at the infrastructure level, controlling retries, payload storage, media expiration, and routing.
Some of these headers have dedicated SDK parameters that set them automatically. For example, passing start_timeout=30 in the SDK sets X-Fal-Request-Timeout: 30 under the hood. Others, like X-Fal-Store-IO, can only be set via the headers dict. This page documents all platform headers in one place. For headers that have SDK parameters, the corresponding method pages are linked.
X-Fal-Request-Timeout (start_timeout)
Server-side time-to-start deadline in seconds. Despite the header name, this does not limit total request time. The clock starts when the request is submitted and covers queue wait, runner acquisition, and failed retry attempts. Once a runner successfully begins processing, the timeout stops and inference can run as long as it needs. If the deadline is reached before any runner starts processing, the server returns 504 Gateway Timeout with X-Fal-Request-Timeout-Type: user. To limit total client-side wait time (including processing), use client_timeout on subscribe() instead.
| Header | X-Fal-Request-Timeout |
| Default | No timeout |
| Minimum | > 0.1 seconds |
| SDK parameter | start_timeout on submit(), subscribe(), and run() |
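As a minimal sketch of the mapping described above, the helper below turns a time-to-start deadline into the header the platform expects. The function name `request_timeout_header` is hypothetical and not part of any fal SDK. The lower bound comes from the table; formatting whole numbers without a decimal point is an assumption made for readability.

```python
def request_timeout_header(seconds: float) -> dict[str, str]:
    """Build the X-Fal-Request-Timeout header for a time-to-start deadline.

    Hypothetical helper for illustration; not part of a fal SDK.
    """
    # The table above lists the minimum as "> 0.1 seconds".
    if seconds <= 0.1:
        raise ValueError("start timeout must be greater than 0.1 seconds")
    # Header values are strings. Whole numbers are written without ".0"
    # (an assumption about formatting, chosen to match "30" in the text).
    value = str(int(seconds)) if float(seconds).is_integer() else str(seconds)
    return {"X-Fal-Request-Timeout": value}
```

The resulting dict can be merged into whatever headers you already send. Remember that this deadline covers only the time until a runner starts processing, not total request time.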
X-Fal-Runner-Hint (hint)
Routing hint that tells fal to try to route the request to a specific runner. Useful for session affinity — for example, keeping requests pinned to a runner that already has a LoRA adapter or conversation state loaded in memory. If the hinted runner is unavailable, fal routes to any available runner.
| Header | X-Fal-Runner-Hint |
| Default | Automatic routing |
| SDK parameter | hint on submit(), subscribe(), and run() |
X-Fal-Queue-Priority (priority)
Queue priority for the request. Priority applies to the per-endpoint queue — every request to the same endpoint shares one queue, regardless of who sent it. A low-priority request sits behind all normal-priority requests. This means setting "low" on a shared model API deprioritizes your request relative to all other users of that model.
| Header | X-Fal-Queue-Priority |
| Default | "normal" |
| Values | "normal", "low" |
| SDK parameter | priority on submit() and subscribe() |
X-Fal-Object-Lifecycle-Preference
Control how long generated files (images, videos, audio) are stored on fal’s CDN.
| Header | X-Fal-Object-Lifecycle-Preference |
| Default | Your account setting (forever if not configured) |
| Format | JSON: {"expiration_duration_seconds": <seconds>} |
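Because the header value is a JSON document rather than a plain number, it is easy to get the quoting wrong by hand. The sketch below serializes it with the standard library. The helper name `object_lifecycle_header` is hypothetical; only the header name and the `expiration_duration_seconds` key come from the table above.

```python
import json


def object_lifecycle_header(expiration_seconds: int) -> dict[str, str]:
    """Build the X-Fal-Object-Lifecycle-Preference header.

    Hypothetical helper for illustration; not part of a fal SDK.
    """
    # The documented format is {"expiration_duration_seconds": <seconds>}.
    payload = {"expiration_duration_seconds": expiration_seconds}
    # json.dumps produces a valid JSON string to use as the header value.
    return {"X-Fal-Object-Lifecycle-Preference": json.dumps(payload)}


# Example: ask for generated media to expire after one hour.
one_hour = object_lifecycle_header(3600)
```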
Data Retention & Storage
Full guide to media expiration, payload retention, and the delete API
X-Fal-Store-IO
Prevent fal from storing request payloads (JSON inputs and outputs). Payloads are stored for 30 days by default and power the request history in your dashboard.
| Header | X-Fal-Store-IO |
| Default | "1" (stored for 30 days) |
| Values | "0" to disable storage |
This only prevents storage of the JSON payloads. CDN files generated during processing are still accessible (subject to media expiration settings).
X-Fal-No-Retry
Disable automatic retries for this request. By default, queue-based requests are retried for up to 10 total attempts on server errors (503, 504, connection errors).
| Header | X-Fal-No-Retry |
| Default | Retries enabled |
| Values | "1", "true", "yes" to disable |
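The two opt-out headers, X-Fal-Store-IO and X-Fal-No-Retry, are often set together for requests that should run once and leave no stored payload. The sketch below assembles them into a headers dict. The helper name `opt_out_headers` and its keyword arguments are hypothetical; the header names and values are the ones listed in the tables above.

```python
def opt_out_headers(store_io: bool = True, retry: bool = True) -> dict[str, str]:
    """Return only the headers needed to deviate from the platform defaults.

    Hypothetical helper for illustration; not part of a fal SDK.
    """
    headers: dict[str, str] = {}
    if not store_io:
        # "0" disables payload storage; the default "1" keeps it for 30 days.
        headers["X-Fal-Store-IO"] = "0"
    if not retry:
        # "1" is one of the accepted values for disabling automatic retries.
        headers["X-Fal-No-Retry"] = "1"
    return headers
```

Leaving both arguments at their defaults returns an empty dict, so the platform defaults stay in effect unless you explicitly opt out.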
Reliability & Retries
Learn more about automatic retries, fallbacks, and error handling
x-app-fal-disable-fallback
Disable automatic model fallbacks for this request. By default, fal may reroute requests to equivalent alternative endpoints if the primary is unavailable.
| Header | x-app-fal-disable-fallback |
| Default | Fallbacks enabled |
Reliability & Retries
Learn more about model fallbacks
fal_max_queue_length
Reject the request with 429 if the endpoint’s queue already has more than this many requests waiting (across all callers). Useful for latency-sensitive applications that prefer to fail fast rather than wait in a long queue.
| Query param | fal_max_queue_length |
| Default | No limit |
| Type | Integer |
This parameter is passed as a query parameter on the URL, not as a header. The SDKs do not currently expose it as a named parameter; use the raw URL approach or pass it via headers.
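For the raw URL approach, the sketch below appends fal_max_queue_length to an endpoint URL using only the standard library. The helper name `with_max_queue_length` is hypothetical, and the URL in the example is a placeholder rather than a real endpoint. Rejecting negative values is an assumption; the source only states that the type is an integer.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_max_queue_length(url: str, max_queue_length: int) -> str:
    """Return `url` with the fal_max_queue_length query parameter set.

    Hypothetical helper for illustration; not part of a fal SDK.
    """
    if max_queue_length < 0:
        # Assumption: a negative queue length is never meaningful.
        raise ValueError("max_queue_length must be non-negative")
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any previous value of ours.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "fal_max_queue_length"
    ]
    query.append(("fal_max_queue_length", str(max_queue_length)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL; substitute the endpoint you actually call.
example_url = with_max_queue_length("https://example.invalid/my-app", 10)
```

A caller using this should be prepared for a 429 response and decide whether to back off, retry later, or surface the failure.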
Response Headers
These headers are returned by fal in the response. They are informational; you don’t set them.
| Header | Description |
|---|---|
| x-fal-request-id | Unique identifier for the request. Use this when contacting support or correlating logs. |
| X-Fal-Billable-Units | Billing units charged for this request. See Pricing for how units map to cost. |
| X-Fal-Served-From | Internal identifier of the runner that served the request. |
| X-Fal-Request-Timeout-Type | Set to user when your start_timeout deadline triggered the 504. See Timeouts and Retries. |
| X-Fal-Error-Type | Error category on failure responses (e.g., request_timeout, startup_timeout, runner_disconnected). See Request Error Types. |
| x-fal-runner-hints | Routing hints returned by the runner for sticky session routing. See Optimize Routing Behavior. |
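One way to use these response headers is to turn a failed response into a log-friendly message. The sketch below is an assumption-laden illustration: the function name `describe_failure` and the message wording are hypothetical, and it takes a plain status code and headers dict so it works with any HTTP client. Header names are compared case-insensitively, since HTTP header names are not case-sensitive.

```python
def describe_failure(status: int, headers: dict[str, str]) -> str:
    """Summarize a failed fal response from its status code and headers.

    Hypothetical helper for illustration; not part of a fal SDK.
    """
    # Normalize names so "X-Fal-Error-Type" and "x-fal-error-type" both match.
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id = lowered.get("x-fal-request-id", "unknown")

    # A 504 with timeout type "user" means the start_timeout deadline fired.
    if status == 504 and lowered.get("x-fal-request-timeout-type") == "user":
        return f"request {request_id}: start_timeout reached before a runner started"

    error_type = lowered.get("x-fal-error-type")
    if error_type:
        return f"request {request_id}: failed with error type {error_type}"

    return f"request {request_id}: HTTP {status}"
```

Including the request ID in every message makes it easier to correlate logs or share the identifier with support.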
Common Model Arguments
Common input parameters like seed, image_size, and safety checker that appear across many models