Platform Headers

When you call a model or your own deployed app on fal, you can pass platform-level HTTP headers that control how the request is handled. These headers are separate from the model’s input arguments (like `prompt` or `image_size`) and from SDK method parameters (like `start_timeout` or `client_timeout`). They apply at the infrastructure level, controlling retries, payload storage, media expiration, and routing.

Some of these headers have dedicated SDK parameters that set them automatically. For example, passing `start_timeout=30` in the SDK sets `X-Fal-Request-Timeout: 30` under the hood. Others, like `X-Fal-Store-IO`, can only be set via the `headers` dict.

This page documents all platform headers in one place. For headers that have SDK parameters, the corresponding method pages are linked.

X-Fal-Request-Timeout (`start_timeout`)

Server-side time-to-start deadline in seconds. Despite the header name, this does not limit total request time. The clock starts when the request is submitted and covers queue wait, runner acquisition, and failed retry attempts. Once a runner successfully begins processing, the deadline no longer applies and inference can run as long as it needs. If the deadline is reached before any runner starts processing, the server returns `504 Gateway Timeout` with `X-Fal-Request-Timeout-Type: user`. To limit total client-side wait time (including processing), use `client_timeout` on `subscribe()` instead.


Header	`X-Fal-Request-Timeout`
Default	No timeout
Minimum	> 0.1 seconds
SDK parameter	`start_timeout` on `submit()`, `subscribe()`, and `run()`
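The 504 behavior described above can be told apart from other gateway timeouts by checking the response header. The following is a minimal sketch, not part of the fal SDK: the helper name `is_user_start_timeout` is hypothetical, and it assumes you already have the status code and response headers as a plain mapping.

```python
from typing import Mapping


def is_user_start_timeout(status_code: int, headers: Mapping[str, str]) -> bool:
    """Return True when a 504 was triggered by the caller's own start_timeout.

    Hypothetical helper: it only inspects the status code and the
    X-Fal-Request-Timeout-Type response header described on this page.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    timeout_type = normalized.get("x-fal-request-timeout-type", "")
    # Comparing the value case-insensitively is a defensive assumption.
    return status_code == 504 and timeout_type.strip().lower() == "user"
```

A caller might use this to decide whether to resubmit with a longer `start_timeout` or to treat the failure as a platform-side error.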

X-Fal-Runner-Hint (`hint`)

Routing hint that tells fal to try to route the request to a specific runner. Useful for session affinity — for example, keeping requests pinned to a runner that already has a LoRA adapter or conversation state loaded in memory. If the hinted runner is unavailable, fal routes to any available runner.


Header	`X-Fal-Runner-Hint`
Default	Automatic routing
SDK parameter	`hint` on `submit()`, `subscribe()`, and `run()`

X-Fal-Queue-Priority (`priority`)

Queue priority for the request. Priority applies to the per-endpoint queue — every request to the same endpoint shares one queue, regardless of who sent it. A low-priority request sits behind all normal-priority requests. This means setting "low" on a shared model API deprioritizes your request relative to all other users of that model.


Header	`X-Fal-Queue-Priority`
Default	`"normal"`
Values	`"normal"`, `"low"`
SDK parameter	`priority` on `submit()` and `subscribe()`
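To make the shared-queue effect concrete, here is a toy model of the ordering described above. It is an illustration only, not fal's scheduler: the `QueuedRequest` type, the `processing_order` function, and the numeric ranks are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import List

# Assumed ranking for illustration: lower rank is served first.
PRIORITY_RANK = {"normal": 0, "low": 1}


@dataclass(frozen=True)
class QueuedRequest:
    caller: str
    arrival: int  # position in arrival order
    priority: str = "normal"


def processing_order(requests: List[QueuedRequest]) -> List[QueuedRequest]:
    """Order requests so every normal-priority request precedes every low one.

    Within the same priority, earlier arrivals go first.
    """
    return sorted(requests, key=lambda r: (PRIORITY_RANK[r.priority], r.arrival))


# Your low-priority request arrives first, but other callers' normal-priority
# requests to the same endpoint are still served ahead of it.
queue = [
    QueuedRequest("you", arrival=1, priority="low"),
    QueuedRequest("other-user-a", arrival=2),
    QueuedRequest("other-user-b", arrival=3),
]
order = [r.caller for r in processing_order(queue)]
# order == ["other-user-a", "other-user-b", "you"]
```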

X-Fal-Object-Lifecycle-Preference

Control how long generated files (images, videos, audio) are stored on fal’s CDN.


Header	`X-Fal-Object-Lifecycle-Preference`
Default	Your account setting (forever if not configured)
Format	JSON: `{"expiration_duration_seconds": <seconds>}`

import json

import fal_client

result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={
        "X-Fal-Object-Lifecycle-Preference": json.dumps({
            "expiration_duration_seconds": 3600
        })
    }
)
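If you compute expirations from durations rather than raw seconds, a small helper can build the header value. This is a sketch under assumptions: `lifecycle_header` is a hypothetical name, and rejecting non-positive durations is a choice made here, not a documented platform rule.

```python
import json
from datetime import timedelta
from typing import Dict


def lifecycle_header(ttl: timedelta) -> Dict[str, str]:
    """Build the X-Fal-Object-Lifecycle-Preference header for a given lifetime."""
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        # Assumption for this sketch: only positive lifetimes make sense.
        raise ValueError("expiration must be a positive duration")
    # The header value is a JSON string in the format shown in the table above.
    value = json.dumps({"expiration_duration_seconds": seconds})
    return {"X-Fal-Object-Lifecycle-Preference": value}
```

The returned dict can be passed as the `headers` argument in the same way as the example above, for instance `headers=lifecycle_header(timedelta(hours=1))`.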

Data Retention & Storage

Full guide to media expiration, payload retention, and the delete API

X-Fal-Store-IO

Prevent fal from storing request payloads (JSON inputs and outputs). Payloads are stored for 30 days by default and power the request history in your dashboard.


Header	`X-Fal-Store-IO`
Default	`"1"` (stored for 30 days)
Values	`"0"` to disable storage

import fal_client

result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={"X-Fal-Store-IO": "0"}
)

This only prevents storage of the JSON payloads. CDN files generated during processing are still accessible (subject to media expiration settings).

X-Fal-No-Retry

Disable automatic retries for this request. By default, queue-based requests are retried for up to 10 total attempts on server errors (503, 504, connection errors).


Header	`X-Fal-No-Retry`
Default	Retries enabled
Values	`"1"`, `"true"`, `"yes"` to disable

import fal_client

result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={"X-Fal-No-Retry": "1"}
)

Reliability & Retries

Learn more about automatic retries, fallbacks, and error handling
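The retry rule described in this section can be summarized as a small decision function. This is a simplified model written for illustration, not fal's actual retry logic: the function name `should_retry` is hypothetical, a connection error is represented as `status_code=None`, and any special cases the platform may apply are ignored.

```python
from typing import Mapping, Optional

MAX_TOTAL_ATTEMPTS = 10             # "up to 10 total attempts"
RETRYABLE_STATUSES = {503, 504}     # server errors named on this page
NO_RETRY_VALUES = {"1", "true", "yes"}


def should_retry(
    status_code: Optional[int],
    attempts_made: int,
    request_headers: Mapping[str, str],
) -> bool:
    """Model whether another attempt would be made after a failed one."""
    normalized = {k.lower(): v for k, v in request_headers.items()}
    # X-Fal-No-Retry with any of the documented values disables retries.
    if normalized.get("x-fal-no-retry", "").strip().lower() in NO_RETRY_VALUES:
        return False
    # The attempt budget counts the first attempt as well.
    if attempts_made >= MAX_TOTAL_ATTEMPTS:
        return False
    # None stands in for a connection error in this sketch.
    return status_code is None or status_code in RETRYABLE_STATUSES
```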

x-app-fal-disable-fallback

Disable automatic model fallbacks for this request. By default, fal may reroute requests to equivalent alternative endpoints if the primary is unavailable.


Header	`x-app-fal-disable-fallback`
Default	Fallbacks enabled

import fal_client

result = fal_client.subscribe(
    "fal-ai/nano-banana-2",
    arguments={"prompt": "a sunset"},
    headers={"x-app-fal-disable-fallback": "true"}
)

Reliability & Retries

Learn more about model fallbacks
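The three opt-out headers above (`X-Fal-Store-IO`, `X-Fal-No-Retry`, `x-app-fal-disable-fallback`) are often set together. The helper below is a hypothetical convenience, not an SDK function; it only assembles the header names and values documented on this page.

```python
from typing import Dict


def opt_out_headers(
    store_io: bool = True,
    retries: bool = True,
    fallbacks: bool = True,
) -> Dict[str, str]:
    """Return only the headers needed to turn off the selected defaults."""
    headers: Dict[str, str] = {}
    if not store_io:
        headers["X-Fal-Store-IO"] = "0"  # do not store JSON payloads
    if not retries:
        headers["X-Fal-No-Retry"] = "1"  # disable automatic retries
    if not fallbacks:
        headers["x-app-fal-disable-fallback"] = "true"  # disable model fallbacks
    return headers
```

Leaving every argument at its default returns an empty dict, so platform defaults stay in effect; the result is meant to be passed as the `headers` argument shown in the examples above.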

fal_max_queue_length

Reject the request with 429 if the endpoint’s queue already has more than this many requests waiting (across all callers). Useful for latency-sensitive applications that prefer to fail fast rather than wait in a long queue.


Query param	`fal_max_queue_length`
Default	No limit
Type	Integer

This parameter is passed as a query parameter on the URL, not as a header. The SDKs do not currently expose it as a named parameter, so send the request to the raw URL with the parameter appended.

cURL

curl -X POST "https://queue.fal.run/fal-ai/nano-banana-2?fal_max_queue_length=10" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a sunset"}'
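The same URL can be assembled in Python before handing it to whichever HTTP client you use. This is a minimal sketch: `queue_url` is a hypothetical helper, the base URL is taken from the cURL example above, and rejecting negative values is an assumption made here.

```python
from urllib.parse import urlencode

QUEUE_BASE_URL = "https://queue.fal.run"  # taken from the cURL example


def queue_url(endpoint: str, max_queue_length: int) -> str:
    """Build a queue URL carrying the fal_max_queue_length query parameter."""
    if max_queue_length < 0:
        # Assumption for this sketch: a negative limit is treated as a mistake.
        raise ValueError("max_queue_length must be non-negative")
    query = urlencode({"fal_max_queue_length": max_queue_length})
    return f"{QUEUE_BASE_URL}/{endpoint.strip('/')}?{query}"


url = queue_url("fal-ai/nano-banana-2", 10)
# url == "https://queue.fal.run/fal-ai/nano-banana-2?fal_max_queue_length=10"
```

A 429 response to a request sent to this URL means the queue was already longer than the limit, so a latency-sensitive caller can fail fast or try again later.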

Response Headers

These headers are returned by fal in the response. They are informational; you don’t set them.

The total size of all response headers is limited to 16 KB. This includes both platform headers (listed below) and any custom headers set by the app. If the combined headers exceed 16 KB, the response will fail. This is most relevant when apps set large custom headers — for example, verbose routing hints via provide_hints().
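App authors who set custom response headers may want a rough pre-flight check against this limit. The sketch below is an estimate only: how fal counts header bytes is not specified here, so the accounting (name, `": "` separator, value, and CRLF per header) and the reading of 16 KB as 16 × 1024 bytes are both assumptions, and the function names are hypothetical.

```python
from typing import Mapping

MAX_HEADER_BYTES = 16 * 1024  # assumes 16 KB means 16384 bytes


def estimated_header_bytes(headers: Mapping[str, str]) -> int:
    """Approximate the on-the-wire size of a set of headers."""
    # Each header is counted as: name + ": " + value + CRLF (4 extra bytes).
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8")) + 4
        for name, value in headers.items()
    )


def fits_header_limit(headers: Mapping[str, str]) -> bool:
    """Return True when the estimate stays within the assumed limit."""
    return estimated_header_bytes(headers) <= MAX_HEADER_BYTES
```

Because platform headers also count toward the limit, it is safer to leave headroom rather than aim for the exact threshold.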

Header	Description
`x-fal-request-id`	Unique identifier for the request. Use this when contacting support or correlating logs.
`X-Fal-Billable-Units`	Billing units charged for this request. See Pricing for how units map to cost.
`X-Fal-Served-From`	Internal identifier of the runner that served the request.
`X-Fal-Request-Timeout-Type`	Set to `user` when your `start_timeout` deadline triggered the 504. See Timeouts and Retries.
`X-Fal-Error-Type`	Error category on failure responses (e.g., `request_timeout`, `startup_timeout`, `runner_disconnected`). See Request Error Types.
`x-fal-runner-hints`	Routing hints returned by the runner for sticky session routing. See Optimize Routing Behavior.
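When logging failures or filing a support request, it can help to pull these headers into one record. The following is a minimal sketch with a hypothetical helper name, `fal_diagnostics`; it assumes the response headers are available as a plain mapping and makes no claim about which headers are present on any given response.

```python
from typing import Dict, Mapping, Optional

# Field names on the left are invented for this sketch; header names on the
# right come from the table above.
DIAGNOSTIC_HEADERS = {
    "request_id": "x-fal-request-id",
    "billable_units": "x-fal-billable-units",
    "served_from": "x-fal-served-from",
    "timeout_type": "x-fal-request-timeout-type",
    "error_type": "x-fal-error-type",
}


def fal_diagnostics(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect platform response headers, using None for any that are absent."""
    # Header names are case-insensitive, so compare in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    return {
        field: normalized.get(header)
        for field, header in DIAGNOSTIC_HEADERS.items()
    }
```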

Common Model Arguments

Common input parameters like seed, image_size, and safety checker that appear across many models
