When you deploy your own applications on fal Serverless, you are billed per second for the time your runners are alive, at a rate that depends on the machine type.

Billing by runner state

Every runner transitions through the states below during its lifecycle. You are billed for the states marked Yes at the per-second rate for your machine type.
| State | Billed | Description |
| --- | --- | --- |
| PENDING | No | Waiting to be scheduled on available hardware |
| DOCKER_PULL | No | Pulling your container image from the registry |
| SETUP | Yes | Running your setup() method (loading models, initializing resources) |
| IDLE | Yes | Runner is ready but waiting for requests (includes keep_alive time) |
| RUNNING | Yes | Actively processing one or more requests |
| DRAINING | Yes | Finishing in-flight requests before shutdown |
| TERMINATING | Yes | Running your teardown() method |
| TERMINATED | No | Runner has stopped and resources are released |

5xx errors (HTTP 500+) are also not charged. See Runners for full details on each state and transitions.
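
As a rough illustration of how the state table translates into billed time, the sketch below sums the seconds a runner spends in billed states. The `timeline` structure, the `billed_seconds` helper, and all durations are hypothetical and only serve the example; they are not part of any fal API.

```python
# States marked "Yes" in the table above; every other state is not billed.
BILLED_STATES = {"SETUP", "IDLE", "RUNNING", "DRAINING", "TERMINATING"}


def billed_seconds(timeline):
    """Sum the seconds spent in billed states.

    `timeline` is a hypothetical list of (state, seconds) pairs describing
    one runner's lifecycle; it is not a structure returned by fal.
    """
    return sum(seconds for state, seconds in timeline if state in BILLED_STATES)


# Illustrative lifecycle with made-up durations.
timeline = [
    ("PENDING", 5),       # not billed
    ("DOCKER_PULL", 20),  # not billed
    ("SETUP", 30),
    ("IDLE", 10),
    ("RUNNING", 45),
    ("IDLE", 60),         # e.g. keep_alive time after the last request
    ("DRAINING", 2),
    ("TERMINATING", 3),
    ("TERMINATED", 0),    # not billed
]

print(billed_seconds(timeline))  # 150
```

In this made-up lifecycle the runner exists for 175 seconds, but only 150 of them fall in billed states, because scheduling and image pull time are excluded.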

GPU count multiplier

Multi-GPU instances are billed as gpu_count x duration. For example, a runner using 2x A100 GPUs for 60 seconds is billed as 120 GPU-seconds.
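
The multiplier can be sketched as a short calculation. The function names and the rate below are placeholders chosen for illustration, and the example assumes the rate is quoted per GPU-second; check the pricing for your machine type for real values.

```python
def gpu_seconds(gpu_count, duration_seconds):
    """Billed GPU-seconds: gpu_count x duration, as described above."""
    return gpu_count * duration_seconds


def estimated_cost(gpu_count, duration_seconds, rate_per_gpu_second):
    """Estimated cost, assuming a per-GPU-second rate (an assumption here)."""
    return gpu_seconds(gpu_count, duration_seconds) * rate_per_gpu_second


# The example from the text: 2x A100 GPUs for 60 seconds.
print(gpu_seconds(2, 60))  # 120

# Hypothetical rate of 0.001 per GPU-second, not an actual fal price.
cost = estimated_cost(2, 60, rate_per_gpu_second=0.001)
print(f"{cost:.2f}")
```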

Monitoring your usage

Dashboard Billing

View your overall spend, invoices, and payment methods.

App Analytics

See per-app cost breakdown, request counts, and runner utilization.