Before deploying to production, you want to verify that your endpoints produce correct outputs, handle edge cases gracefully, and perform within acceptable latency. The AppClient in the fal SDK gives you a way to do this programmatically: it deploys your app to fal's serverless infrastructure in ephemeral mode, runs your tests against the live endpoints (including GPU execution, setup(), and the full request pipeline), and cleans up the deployment when testing is complete.
This means your tests run against the real environment your app will use in production, not a mocked local version. If your model loads correctly in setup(), processes inputs through your endpoint, and returns valid outputs, you can be confident the deployed version will behave the same way. You can integrate these tests into your CI pipeline to catch regressions before they reach production.
Testing with AppClient
AppClient connects to your app class and exposes its endpoints as callable methods. It handles deployment, connection, and teardown automatically via a context manager.
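A minimal sketch of what this can look like as a pytest test. The app class, endpoint name, and input/output fields below are placeholders for your own project, and the exact import path and client constructor may differ between fal SDK versions, so check the SDK reference for your release:

```python
# Hypothetical test sketch; MyApp, the generate endpoint, and the
# "prompt"/"image_url" fields are placeholders, not fal API guarantees.
from fal.app import AppClient  # import path may vary by SDK version

from my_project.app import MyApp  # your fal app class (placeholder)


def test_generate_endpoint():
    # Entering the context manager deploys MyApp ephemerally and
    # connects; exiting tears the deployment down, pass or fail.
    with AppClient(MyApp) as client:
        # Endpoints on the app class are exposed as callable methods.
        result = client.generate(prompt="a red bicycle")
        assert "image_url" in result
```

Because the test exercises the real deployed endpoint, a passing run covers model loading in setup(), input validation, and output shape in one go.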
Running locally with fal run --local
When you want a tighter feedback loop than an ephemeral deployment can give you, fal run --local executes your function or app on your own machine instead of provisioning a remote runner. For apps, this starts the FastAPI server on localhost so you can hit your endpoints directly.
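For example (the file path, app name, port, and route here are placeholders; use the address the CLI prints when the server starts):

```shell
# Run the app locally; setup() executes on this machine and the
# FastAPI server listens on localhost.
fal run --local src/my_app.py::MyApp

# In another terminal, exercise an endpoint directly. The port and
# route depend on your app and on what the CLI reports at startup.
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a red bicycle"}'
```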
setup() runs on your machine, and there is no GPU unless your machine has one. When you need to verify behavior on the real serverless environment (containers, GPUs, scaling, cold starts), use AppClient as shown above instead.