Uvicorn is a fast ASGI server. With the standard extras installed it runs on
uvloop (a C-accelerated asyncio event loop) and httptools (a fast HTTP parser).
FastAPI requires an ASGI server because it's built on Starlette, which speaks
the ASGI protocol.
pip install "uvicorn[standard]" # includes uvloop and httptools; quotes stop shells like zsh from globbing the brackets
uvicorn app.main:app --host 0.0.0.0 --port 8000
Key flags:
- --reload: hot reload for development
- --workers N: multiple processes (a single worker means one event loop)
- --host 0.0.0.0: listen on all interfaces
- --port 8000: listen on port 8000
Rule of thumb: use uvicorn[standard] in production for the uvloop speedup;
use plain uvicorn in Docker only when a minimal image matters more than
throughput (uvloop and httptools are compiled extensions).
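To make "the ASGI protocol" concrete, here is a minimal sketch of a bare ASGI application, driven by hand instead of by a real server. The names app and call_once are illustrative, not part of Uvicorn or FastAPI; an ASGI server such as Uvicorn plays the role call_once plays here, once per request.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: one coroutine taking scope, receive, send."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket scopes
    body = f"hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_once(asgi_app, path):
    """Drive the app by hand, playing the role an ASGI server normally plays."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await asgi_app(scope, receive, send)
    return sent


messages = asyncio.run(call_once(app, "/ping"))
print(messages[0]["status"], messages[1]["body"])
```

FastAPI and Starlette expose this same three-argument callable, which is why any ASGI server can run them.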
Uvicorn alone handles one event loop per process. Gunicorn is a battle-tested process manager that handles worker lifecycle (respawning crashed workers, graceful restarts, signal handling). Together they give you:
- Gunicorn's robust process management.
- Uvicorn's async event loop per worker.
pip install gunicorn
gunicorn app.main:app \
-k uvicorn.workers.UvicornWorker \
--workers 4 \
--bind 0.0.0.0:8000 \
--timeout 120
UvicornWorker replaces Gunicorn's default sync worker with an async one.
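The same flags can live in a Gunicorn config file, which is plain Python. A sketch mirroring the command above, assuming the file is saved as gunicorn.conf.py next to the project:

```python
# gunicorn.conf.py
# Mirrors the command-line flags shown above; values are the same examples.

bind = "0.0.0.0:8000"                           # --bind
workers = 4                                     # --workers
worker_class = "uvicorn.workers.UvicornWorker"  # -k
timeout = 120                                   # --timeout, in seconds
```

Start it with gunicorn -c gunicorn.conf.py app.main:app. Keeping settings in a file makes them easier to review and version than a long command line.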
Rule of thumb: use Gunicorn + UvicornWorker for traditional deployments on
VMs/bare metal; use Uvicorn directly in Kubernetes where the orchestrator handles
pod restarts.
The classic formula: workers = 2 × CPU_cores + 1.
# 4-core machine → 9 workers
gunicorn app.main:app -k uvicorn.workers.UvicornWorker --workers 9
However, FastAPI is async — a single worker handles many concurrent requests through the event loop. For I/O-bound apps (most web APIs), 2-4 workers per machine is often sufficient:
| App type | Worker count |
|---|---|
| Pure async I/O | 2–4 per machine |
| Mixed sync/async | 2 × cores |
| CPU-bound | 1 per core (use multiprocessing separately) |
Rule of thumb: start with 2 × cores + 1; profile under load and reduce if
workers share limited resources (DB connections, RAM).
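The formula and the table can be written down as a small helper. This is only a sketch of a starting point: the app-type labels ("async_io", "mixed", "cpu_bound") and the function names are made up for illustration, and the result should still be tuned under load.

```python
import os


def classic_workers(cores: int) -> int:
    """The classic formula: 2 x cores + 1."""
    return 2 * cores + 1


def suggested_workers(app_type: str, cores: int) -> int:
    """Starting point taken from the table above; tune after load testing."""
    if app_type == "async_io":
        return min(4, max(2, cores))  # table: 2-4 per machine
    if app_type == "mixed":
        return 2 * cores
    if app_type == "cpu_bound":
        return cores
    raise ValueError(f"unknown app type: {app_type!r}")


cores = os.cpu_count() or 1
print(f"{cores} cores: classic={classic_workers(cores)}, "
      f"async_io={suggested_workers('async_io', cores)}")
```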
- Concurrency: multiple tasks make progress by interleaving on a single CPU (one event loop thread handles thousands of waiting I/O operations).
- Parallelism: multiple tasks run simultaneously on multiple CPUs (multiple Uvicorn worker processes, each with their own event loop).
FastAPI gives you concurrency within a single worker via async def handlers.
You get parallelism by running multiple workers.
Worker 1 (CPU core 1): event loop handles 1000 concurrent requests
Worker 2 (CPU core 2): event loop handles 1000 concurrent requests
CPU-bound code (heavy computation) blocks a core, so neither concurrency nor more
async def helps. Offload it to a process pool or a task queue; a thread pool only
helps when the work releases the GIL.
Rule of thumb: async def handlers add concurrency (better I/O throughput per
worker); more workers add parallelism (better CPU utilisation).
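A rough way to see the difference inside one worker is to time awaited I/O against blocking work on the same event loop. In this sketch asyncio.sleep stands in for an awaited database or HTTP call, and time.sleep stands in for anything that holds the loop (blocking I/O or heavy computation); the handler names are illustrative.

```python
import asyncio
import time


async def fake_io_request(delay: float) -> None:
    """Stand-in for an awaited DB or HTTP call; yields to the event loop."""
    await asyncio.sleep(delay)


async def blocking_request(delay: float) -> None:
    """Stand-in for blocking or CPU-bound work; holds the event loop."""
    time.sleep(delay)


async def timed(handler, count: int, delay: float) -> float:
    """Run `count` handlers on one event loop and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(count)))
    return time.perf_counter() - start


async_elapsed = asyncio.run(timed(fake_io_request, 20, 0.05))
blocking_elapsed = asyncio.run(timed(blocking_request, 20, 0.05))
print(f"awaited I/O: {async_elapsed:.2f}s, blocking: {blocking_elapsed:.2f}s")
```

The awaited version finishes in roughly one delay because the waits overlap; the blocking version takes roughly twenty delays because each call stalls the loop. Only additional workers (parallelism) help in the second case.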
Pass the --reload flag:
uvicorn app.main:app --reload --reload-dir app
Or use the Python API (better for IDEs):
# run.py
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
--reload-dir app restricts watching to the app/ directory, avoiding
false reloads when .pyc files or test outputs change.
Rule of thumb: never use --reload in production. It adds file-watching overhead
and restarts the process whenever a watched file changes.
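One way to keep reload out of production is to derive it from the environment rather than hard-coding it. A minimal sketch, assuming a hypothetical APP_ENV variable (not something Uvicorn reads itself) that is set to "development" on developer machines only:

```python
import os


def uvicorn_options(environ=os.environ) -> dict:
    """Build keyword arguments for uvicorn.run(); reload only in development."""
    # APP_ENV is a hypothetical variable name chosen for this sketch.
    dev = environ.get("APP_ENV", "production") == "development"
    options = {"host": "0.0.0.0", "port": 8000, "reload": dev}
    if dev:
        options["reload_dirs"] = ["app"]  # same effect as --reload-dir app
    return options


def main() -> None:
    import uvicorn  # imported lazily so the helper above works without it

    uvicorn.run("app.main:app", **uvicorn_options())
```

Because the default is "production", forgetting to set the variable fails safe: no reloader is started.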
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first (layer cached until requirements change)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY ./app ./app
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
Multi-stage build for smaller images:
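FROM python:3.12-slim AS builder
WORKDIR /build
# requirements.txt must exist in the builder stage before pip can read it
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt --target /install
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local/lib/python3.12/site-packages
COPY ./app ./app
# --target does not put console scripts on PATH, so run Uvicorn as a module
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]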
Rule of thumb: use --no-cache-dir in pip installs to keep image size down;
copy requirements.txt before source code for layer caching.
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# db: the application's async database client, assumed to be created elsewhere

@app.get("/health", include_in_schema=False)
async def health():
    return {"status": "ok"}

# Liveness (is the process alive?)
@app.get("/health/live", include_in_schema=False)
async def liveness():
    return {"status": "alive"}

# Readiness (is the app ready to serve traffic?)
@app.get("/health/ready", include_in_schema=False)
async def readiness():
    try:
        await db.execute("SELECT 1")
    except Exception:
        return JSONResponse({"status": "not ready"}, status_code=503)
    return {"status": "ready"}
Kubernetes livenessProbe uses /health/live; readinessProbe uses /health/ready.
Rule of thumb: readiness should check actual dependencies (DB, cache); liveness should only check the process is alive — failing liveness kills the pod.
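When readiness depends on more than one service, the checks can be aggregated. The following is a sketch only: run_checks and the check names are hypothetical, and each check is assumed to be an async callable that raises on failure.

```python
import asyncio


async def run_checks(checks: dict, timeout: float = 2.0):
    """Run named async dependency checks; return (status_code, body)."""
    results = {}
    for name, check in checks.items():
        try:
            # A slow dependency counts as a failed one.
            await asyncio.wait_for(check(), timeout=timeout)
            results[name] = "ok"
        except Exception as exc:  # timeout or any other failure
            results[name] = f"failed: {type(exc).__name__}"
    ready = all(value == "ok" for value in results.values())
    body = {"status": "ready" if ready else "not ready", "checks": results}
    return (200 if ready else 503), body
```

Inside the readiness handler this would be used as status, body = await run_checks({...}) followed by return JSONResponse(body, status_code=status). The per-check timeout keeps a hung dependency from hanging the probe itself.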
When Uvicorn receives SIGTERM (sent by Kubernetes, Docker, or Gunicorn):
- It stops accepting new connections.
- Waits for in-flight requests to complete (up to --timeout-graceful-shutdown seconds).
- Calls the lifespan shutdown code (after yield).
- Closes the event loop and exits.
uvicorn app.main:app --timeout-graceful-shutdown 30
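The "lifespan shutdown code (after yield)" step refers to an async context manager like the one sketched below. FastAPI accepts it as FastAPI(lifespan=lifespan); here it is exercised by a stand-in simulate_server function (an illustrative name) so the ordering is visible without a running server.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records the order of steps, for illustration only


@asynccontextmanager
async def lifespan(app):
    events.append("startup: open connections")  # runs before serving starts
    try:
        yield
    finally:
        # Runs once the server is shutting down, after requests have drained.
        events.append("shutdown: close connections")


async def simulate_server():
    """Mimic the server: enter lifespan, serve, then leave it on shutdown."""
    async with lifespan(app=None):
        events.append("serving requests")


asyncio.run(simulate_server())
print(events)
```

Putting cleanup in the finally branch means connections are closed even if the server exits because of an error.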
In Kubernetes, set a preStop hook to delay pod termination so the load
balancer routes traffic away before the pod stops accepting connections:
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
Rule of thumb: always set --timeout-graceful-shutdown to slightly less than
Kubernetes' terminationGracePeriodSeconds to give in-flight requests time to finish.
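The arithmetic behind that rule is worth spelling out, because Kubernetes starts the grace-period countdown before the preStop hook runs, so the hook's delay comes out of the same budget. A sketch, where the function name and the 5-second safety margin are assumptions rather than Kubernetes or Uvicorn defaults:

```python
def graceful_shutdown_timeout(grace_period: int, pre_stop: int = 0, margin: int = 5) -> int:
    """Seconds to pass to --timeout-graceful-shutdown.

    grace_period: the pod's terminationGracePeriodSeconds
    pre_stop:     seconds spent in the preStop hook (e.g. the sleep above)
    margin:       assumed safety buffer for lifespan shutdown and process exit
    """
    budget = grace_period - pre_stop - margin
    if budget <= 0:
        raise ValueError("terminationGracePeriodSeconds is too small for this preStop delay")
    return budget


print(graceful_shutdown_timeout(30, pre_stop=5))
```

If the budget comes out non-positive, raise terminationGracePeriodSeconds rather than shrinking the margin to zero.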
Pass environment variables at container run time (don't bake secrets into the image):
docker run -e DATABASE_URL=postgresql://... -e SECRET_KEY=... myapp
Or use a .env file:
docker run --env-file .env.production myapp
In Docker Compose:
services:
  api:
    image: myapp
    environment:
      - DATABASE_URL=postgresql://db/mydb
      - SECRET_KEY=${SECRET_KEY}
In Kubernetes, use Secrets for sensitive values (their data is base64-encoded, which is an encoding, not encryption):
envFrom:
  - secretRef:
      name: myapp-secrets
Rule of thumb: never embed production secrets in the Docker image or
docker-compose.yml — always inject at runtime from secrets management.
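On the application side, the injected values can be read once at startup and validated, so a missing secret fails fast instead of surfacing on the first request. A minimal stdlib sketch; the Settings class is illustrative, and the variable names follow the examples above:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str

    @classmethod
    def from_env(cls, environ=os.environ) -> "Settings":
        """Read required variables; raise if any is missing or empty."""
        required = ("DATABASE_URL", "SECRET_KEY")
        missing = [key for key in required if not environ.get(key)]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            database_url=environ["DATABASE_URL"],
            secret_key=environ["SECRET_KEY"],
        )
```

Calling Settings.from_env() during application startup turns a misconfigured container into an immediate crash, which the orchestrator reports clearly.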
Mount sub-applications using Starlette's Mount:
from starlette.applications import Starlette
from starlette.routing import Mount
from fastapi import FastAPI
v1_app = FastAPI(title="API v1")
v2_app = FastAPI(title="API v2")
@v1_app.get("/items")
async def v1_items(): return [{"version": "v1"}]
@v2_app.get("/items")
async def v2_items(): return [{"version": "v2", "new_field": True}]
# Root app that routes between them
root_app = Starlette(routes=[
    Mount("/v1", app=v1_app),
    Mount("/v2", app=v2_app),
])
uvicorn main:root_app --port 8000
# /v1/items → v1_app; /v2/items → v2_app
# /v1/docs and /v2/docs each work independently
Rule of thumb: mount versioned sub-apps when versions differ so significantly that sharing middleware or the OpenAPI schema would be confusing.
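Conceptually, mounting is prefix-based dispatch between independent ASGI apps. The sketch below illustrates that idea with the standard library only; it is a simplification, not Starlette's actual Mount implementation, and mount, echo_app, and get are hypothetical helpers.

```python
import asyncio


def mount(prefixes: dict):
    """Return an ASGI app that forwards to the sub-app whose prefix matches."""
    async def dispatcher(scope, receive, send):
        path = scope.get("path", "")
        for prefix, sub_app in prefixes.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Hand the sub-app a path relative to its mount point.
                child = dict(
                    scope,
                    path=path[len(prefix):] or "/",
                    root_path=scope.get("root_path", "") + prefix,
                )
                await sub_app(child, receive, send)
                return
        await send({"type": "http.response.start", "status": 404, "headers": []})
        await send({"type": "http.response.body", "body": b"Not Found"})
    return dispatcher


def echo_app(label: str):
    """Hypothetical stand-in for a versioned FastAPI sub-app."""
    async def app(scope, receive, send):
        body = f"{label} saw {scope['path']}".encode()
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": body})
    return app


async def get(asgi_app, path: str):
    """Issue a fake GET and return (status, body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await asgi_app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent[0]["status"], sent[1]["body"]


root = mount({"/v1": echo_app("v1"), "/v2": echo_app("v2")})
print(asyncio.run(get(root, "/v2/items")))
```

Each sub-app only ever sees its own paths, which is why /v1/docs and /v2/docs can be generated independently.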
Pass --ssl-keyfile and --ssl-certfile:
# Generate self-signed cert for dev
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
uvicorn app.main:app --ssl-keyfile key.pem --ssl-certfile cert.pem --port 8443
In production, don't terminate TLS in Uvicorn — use Nginx/Caddy/ALB in front.
TLS termination in the reverse proxy lets you use certbot for Let's Encrypt,
handle certificate rotation without restarting Uvicorn, and offload TLS overhead.
Rule of thumb: Uvicorn TLS is fine for internal service-to-service encryption or local dev HTTPS; for public-facing production use a reverse proxy for TLS.