When you deploy your own applications on fal Serverless, you are billed per second for the time your runners are alive, at the rate for their machine type. Time spent waiting for hardware or pulling your container image is not billed.
Billing by runner state
Every runner transitions through the states below during its lifecycle. You are billed at the per-second rate for your machine type for every state marked Yes.

| State | Billed | Description |
|---|---|---|
| PENDING | No | Waiting to be scheduled on available hardware |
| DOCKER_PULL | No | Pulling your container image from the registry |
| SETUP | Yes | Running your `setup()` method (loading models, initializing resources) |
| IDLE | Yes | Runner is ready but waiting for requests (includes `keep_alive` time) |
| RUNNING | Yes | Actively processing one or more requests |
| DRAINING | Yes | Finishing in-flight requests before shutdown |
| TERMINATING | Yes | Running your `teardown()` method |
| TERMINATED | No | Runner has stopped and resources are released |
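To make the table concrete, here is a minimal Python sketch that totals the billable time for one runner. The state names and their billed flags come from the table above; the `StateSpan` timeline, the durations, and the `rate_per_second` argument are hypothetical illustrations, not part of the fal SDK or its pricing. It assumes durations are whole seconds and ignores any rounding fal may apply.

```python
from dataclasses import dataclass

# Billed flag per runner state, copied from the table above.
BILLED = {
    "PENDING": False,
    "DOCKER_PULL": False,
    "SETUP": True,
    "IDLE": True,
    "RUNNING": True,
    "DRAINING": True,
    "TERMINATING": True,
    "TERMINATED": False,
}


@dataclass
class StateSpan:
    state: str      # one of the keys in BILLED
    seconds: float  # time the runner spent in that state


def billable_seconds(timeline: list[StateSpan]) -> float:
    """Sum only the time spent in billed states."""
    return sum(span.seconds for span in timeline if BILLED[span.state])


def estimate_cost(timeline: list[StateSpan], rate_per_second: float) -> float:
    """Billable seconds times a per-second rate (a placeholder input, not real pricing)."""
    return billable_seconds(timeline) * rate_per_second


# Hypothetical lifecycle of one runner that served a single burst of traffic.
timeline = [
    StateSpan("PENDING", 20),
    StateSpan("DOCKER_PULL", 45),
    StateSpan("SETUP", 30),
    StateSpan("RUNNING", 120),
    StateSpan("IDLE", 60),  # e.g. keep_alive time after the last request
    StateSpan("TERMINATING", 5),
]
print(billable_seconds(timeline))  # 215
```

Of the 280 seconds this runner existed, the 65 seconds spent in PENDING and DOCKER_PULL are free, so only 215 seconds are billed. Passing your machine type's actual per-second rate to `estimate_cost` turns that into a charge.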
GPU count multiplier
Multi-GPU instances are billed as `gpu_count × duration`. For example, a runner using 2x A100 GPUs for 60 seconds is billed as 120 GPU-seconds.
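The multiplier is plain arithmetic. A minimal Python sketch, using a hypothetical helper named `gpu_seconds` and assuming `duration_seconds` is already the runner's billable time (billed states only):

```python
def gpu_seconds(gpu_count: int, duration_seconds: float) -> float:
    """Billed GPU-seconds for a runner: gpu_count x billable duration."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return gpu_count * duration_seconds


# The example from the text: 2x A100 GPUs for 60 seconds.
print(gpu_seconds(2, 60))  # 120
```

With `gpu_count = 1` the result is just the billable duration, so the same formula covers single-GPU runners.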
Monitoring your usage
- Dashboard Billing: view your overall spend, invoices, and payment methods.
- App Analytics: see per-app cost breakdown, request counts, and runner utilization.