fal.App proxy for full control over the API surface. Both approaches give you autoscaling, analytics, and the same infrastructure that powers every model in the marketplace.
This is the fastest path for teams migrating from self-hosted infrastructure, Kubernetes, or other serverless platforms. Your existing server code stays unchanged. You just need to define a Dockerfile (or reference an existing image from a private registry) and tell fal how to start your server. If you are starting from scratch rather than migrating, the Quick Start is a better starting point.
Dockerfile vs fal.App
Most of the Serverless documentation focuses on fal.App, the class-based approach where you define setup(), endpoints, and teardown() as methods on a class. For server migration, this guide starts with a Dockerfile instead. Your Dockerfile starts the server process, and pyproject.toml provides the deployment configuration such as machine type, scaling parameters, container image, and exposed port.
Direct Server Mode is the natural fit for existing servers because you typically just need to start a process and expose a port. You do not need lifecycle hooks or multiple endpoints since your server already handles those. Both Direct Server Mode and fal.App support the same scaling parameters (keep_alive, min_concurrency, max_concurrency, and more). See the pyproject.toml reference for the full configuration schema.
Option 1: Direct Server Mode
Use exposed_port to route requests directly to your container’s port. fal forwards all incoming traffic to that port without any intermediate processing. The port can be any valid port number, as long as it matches the port your server listens on.
Create a Dockerfile that installs and starts your server. The server must bind to 0.0.0.0 on the same port you expose in pyproject.toml.
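As an illustration of the binding requirement, here is a minimal sketch of a server that listens on 0.0.0.0 rather than localhost. It uses only the Python standard library and stands in for whatever server you are migrating. The PORT environment variable and the default of 8000 are assumptions made for this example, not fal conventions; the value you choose is the one to set as exposed_port.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Bind to all interfaces so traffic forwarded to the container can reach the server.
HOST = "0.0.0.0"
# Assumption for this sketch: the port comes from a PORT environment variable.
# It must equal the exposed_port value in pyproject.toml.
PORT = int(os.environ.get("PORT", "8000"))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply to every GET with a small JSON document echoing the path.
        body = json.dumps({"status": "ok", "path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(host=HOST, port=PORT):
    """Create (but do not start) the HTTP server bound to host:port."""
    return ThreadingHTTPServer((host, port), Handler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A server bound to 127.0.0.1 would only accept connections originating inside the container, which is why the bind address matters as much as the port number.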
pyproject.toml:
To use an image from a private registry, provide registry credentials with your image configuration. See Private Docker Registries for Docker Hub, Google Artifact Registry, and Amazon ECR examples.
Deploy by app name:
[tool.fal.apps.my-server.image]:
For the full pyproject.toml schema, see the pyproject.toml reference.
Option 2: Proxy App Mode
Use fal.App to wrap your server with custom endpoints. This gives you control over the API surface: you can validate inputs with Pydantic, transform outputs, upload files to the fal CDN, and define a typed schema that powers the Playground UI.
fal.App controls the API. The internal server runs on localhost inside the same container, and your proxy endpoints handle input validation, output processing, and CDN uploads. This approach is ideal when you want a clean typed API over an existing server that has its own internal protocol.
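The proxy pattern described above comes down to two steps: wait until the internal server accepts connections, then forward each request to it over localhost. The sketch below shows those two steps with the standard library only. The helper names wait_for_port and forward_json and the port 8188 are hypothetical, and the fal-specific wiring is deliberately left out; in a fal.App you would presumably call the first helper from setup() after launching the server process, and the second from an endpoint method.

```python
import json
import socket
import time
import urllib.request

INTERNAL_HOST = "127.0.0.1"  # the wrapped server is only reachable inside the container
INTERNAL_PORT = 8188         # hypothetical port; use whatever your server listens on


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Block until a TCP connection to host:port succeeds, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout}s")
            time.sleep(interval)


def forward_json(path, payload, host=INTERNAL_HOST, port=INTERNAL_PORT, timeout=30.0):
    """POST payload as JSON to the internal server and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"http://{host}:{port}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Keeping the forwarding logic in plain functions like these makes it possible to test the proxy against a local copy of the server before adding input validation or CDN uploads on top.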
Using an External Registry
If your Dockerfile pulls from an external registry (Docker Hub, Google Artifact Registry, Amazon ECR), or your app references an existing private image, provide registry credentials with your image configuration. This works for both Direct Server Mode in pyproject.toml and fal.App custom containers. See Private Docker Registries for setup instructions including authentication for each registry type.
Best Practices
Download model weights to persistent storage (/data) during runner startup rather than baking them into the Docker image. For fal.App, this usually means setup(). For Direct Server Mode migrations, use your server’s own startup path. This keeps your image small, speeds up container pulls, and allows weights to be cached across runner restarts. The /data directory is shared across all runners in your account and persists between deploys.
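A download-if-missing helper is one way to implement this caching. The sketch below is an assumption-laden example rather than a fal API: the function name ensure_weights and the /data/models subdirectory are hypothetical. Because /data is shared across runners, it writes to a temporary file first and renames it into place, so another runner should never see a half-written weights file at the final path.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path

WEIGHTS_DIR = Path("/data/models")  # hypothetical subdirectory under /data


def ensure_weights(url, filename, weights_dir=WEIGHTS_DIR):
    """Return the local path to the weights file, downloading it only if missing."""
    weights_dir = Path(weights_dir)
    target = weights_dir / filename
    if target.exists():
        return target  # cached by an earlier startup; skip the download

    weights_dir.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file in the same directory, then rename it into
    # place, so an interrupted download never leaves a partial file at target.
    fd, tmp_name = tempfile.mkstemp(dir=weights_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temporary file on any failure, then re-raise.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

Called at startup, the first runner pays the download cost and later runners find the file already present.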
When your container runs fal.App code, install fal-specific packages (boto3, protobuf, pydantic) at the end to avoid version conflicts with your existing dependencies. Containers that do not import fal do not need these packages.
Tune keep_alive based on your app’s cold start time and traffic pattern. If your model takes minutes to load, a longer keep_alive avoids paying that cost repeatedly. If your app starts quickly, a shorter value reduces idle billing. See Optimizing Costs for guidance.
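To make the trade-off concrete, the following toy model counts cold starts and idle seconds for a single runner given a list of request arrival times. It is a simplification built on stated assumptions, not a description of fal's scheduler or billing: requests are treated as instantaneous, there is exactly one runner, and the runner shuts down once keep_alive seconds pass with no request.

```python
def simulate_keep_alive(arrival_times, keep_alive):
    """Return (cold_starts, idle_seconds) for one runner under a simplified model.

    arrival_times: request timestamps in seconds.
    keep_alive: seconds the runner stays up after its most recent request.
    """
    cold_starts = 0
    idle_seconds = 0.0
    last = None
    for t in sorted(arrival_times):
        if last is None or t - last > keep_alive:
            # No warm runner is available, so this request triggers a cold start.
            cold_starts += 1
            if last is not None:
                # The previous runner idled for the full window before stopping.
                idle_seconds += keep_alive
        else:
            # The runner was still warm; it idled for the gap between requests.
            idle_seconds += t - last
        last = t
    if last is not None:
        # After the final request the runner idles for one more full window.
        idle_seconds += keep_alive
    return cold_starts, idle_seconds


arrivals = [0, 30, 400, 420]
print(simulate_keep_alive(arrivals, keep_alive=60))   # (2, 170.0)
print(simulate_keep_alive(arrivals, keep_alive=600))  # (1, 1020.0)
```

In this example, raising keep_alive from 60 to 600 seconds removes one cold start at the price of 850 extra idle seconds. Whether that is worthwhile depends on how long your model takes to load, which is the comparison the paragraph above asks you to make.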
Next Steps
For a complete tutorial that applies this pattern to a real server, see the ComfyUI deployment example. For detailed Dockerfile configuration including build args, multi-stage builds, and private registries, see Custom Container Images. To understand how the /data persistent storage works and what gets cached, see Use Persistent Storage.