The most effective way to reduce cold starts is maintaining warm runners using scaling parameters. These control how many runners stay alive and how quickly new ones spin up.
keep_alive
Default: 60 seconds
Keep runners alive after their last request completes.
For apps without min_concurrency or concurrency_buffer, a newly started runner that never picks up a request is shut down after 10 seconds instead of waiting for the full keep_alive period.
min_concurrency
Default: 0
Keep at least this many runners alive at all times, regardless of traffic.
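The idle-shutdown rule described under keep_alive depends on whether min_concurrency or concurrency_buffer is set, so it may help to see it as a small function. This is only a sketch of the rule as worded above, not platform code: `ScalingConfig` and `idle_timeout_seconds` are hypothetical names, and treating "without" as "both values are 0" is an assumption.

```python
from dataclasses import dataclass

# From the text: a runner that never picked up a request is shut down
# after 10 seconds when the app has no min_concurrency or concurrency_buffer.
UNUSED_RUNNER_TIMEOUT_S = 10


@dataclass
class ScalingConfig:
    """Hypothetical container for illustration; not part of the fal SDK."""
    keep_alive: int = 60          # documented default, in seconds
    min_concurrency: int = 0      # documented default
    concurrency_buffer: int = 0   # documented default


def idle_timeout_seconds(cfg: ScalingConfig, has_served_request: bool) -> int:
    """Seconds an idle runner is kept before shutdown under the stated rule."""
    # Assumption: "without min_concurrency or concurrency_buffer" means both are 0.
    warm_pool_configured = cfg.min_concurrency > 0 or cfg.concurrency_buffer > 0
    if not has_served_request and not warm_pool_configured:
        return UNUSED_RUNNER_TIMEOUT_S
    return cfg.keep_alive
```

With keep_alive set to 300 and no warm pool configured, a runner that never served a request gets the short 10-second window, while one that has served a request stays for the full 300 seconds.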
concurrency_buffer
Default: 0
Maintain extra runners beyond current demand.
Takes precedence over min_concurrency when higher.
concurrency_buffer_perc
Default: 0
Set buffer as a percentage of current request volume.
Actual buffer is the maximum of concurrency_buffer and concurrency_buffer_perc / 100 * request volume.
max_multiplexing
Default: 1
Number of concurrent requests each runner handles simultaneously.
scaling_delay
Default: 0 seconds
Wait time before scaling up when a request is queued.
startup_timeout
Default: Varies
Maximum time allowed for setup() to complete.
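To make the buffer rule from the parameter list concrete, the following sketch computes the actual buffer as the maximum of concurrency_buffer and the percentage-based value. The function name `effective_buffer` is hypothetical, and rounding the percentage result up to a whole runner is an assumption; the text above does not say how fractional values are handled.

```python
import math


def effective_buffer(concurrency_buffer: int,
                     concurrency_buffer_perc: int,
                     request_volume: int) -> int:
    """Actual buffer = max(concurrency_buffer, perc / 100 * request volume)."""
    # Assumption: a fractional percentage result is rounded up to a whole runner.
    perc_buffer = math.ceil(concurrency_buffer_perc * request_volume / 100)
    return max(concurrency_buffer, perc_buffer)


# Low traffic: the fixed buffer dominates (20% of 5 requests is 1 runner).
low_traffic = effective_buffer(concurrency_buffer=2, concurrency_buffer_perc=20, request_volume=5)

# High traffic: the percentage dominates (20% of 30 requests is 6 runners).
high_traffic = effective_buffer(concurrency_buffer=2, concurrency_buffer_perc=20, request_volume=30)
```

Setting both values gives a fixed floor at low traffic and a buffer that grows with request volume at high traffic.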
Persistence Across Deploys
Scaling parameters set via CLI or dashboard (keep_alive, min_concurrency, concurrency_buffer, etc.) persist across deployments by default. You don’t lose your tuning when you deploy a code change.
To reset all parameters to the values defined in code, deploy with the --reset-scale flag.
Deploy Behavior & Priority
Full explanation of how code, CLI, and dashboard settings interact
Cost Considerations
More warm runners = lower latency but higher cost. Balance based on your needs:
- Latency-critical apps: Accept higher cost for warm runners (min_concurrency, keep_alive)
- Cost-sensitive apps: Optimize cold start duration instead (container images, caching)
- Variable traffic: Use buffers and scaling delays
Full Scaling Reference
Complete guide to scaling configuration including CLI and dashboard methods