How We Deploy SvelteKit with Zero Downtime on a Single Linode
You don't need Kubernetes. I know that's a spicy opener for a deployment post, but hear me out.
kief.dev runs SvelteKit 2 with Svelte 5, Tailwind 4, and adapter-node. It serves 12 free developer tools, the Vekt supply chain scanner, and the blog you're reading right now. The whole thing runs on a single Linode VPS. Deploys take about 8 seconds. Users never see a blip.
Every few months someone asks why we don't use K8s, or at least ECS, or at minimum some managed container service. The answer is math. A Kubernetes control plane eats roughly 40% of compute resources just keeping itself alive. Average CPU utilization across K8s clusters sits around 10-13%. You're paying for an orchestrator that's mostly orchestrating itself.
For a SvelteKit app on a single server, that's absurd.
The Stack
Here's what actually runs in production:
- SvelteKit with adapter-node (produces a standard Node.js server)
- PM2 in cluster mode (process management + zero-downtime reloads)
- Caddy as reverse proxy (automatic HTTPS, live config API)
- systemd managing PM2 (auto-restart on crash or reboot)
- A deploy script that's 40 lines of bash
That's it. No Docker. No containers. No YAML files longer than your actual application code.
PM2 Cluster Mode
PM2's cluster mode forks your Node process across available CPU cores. When you run pm2 reload, it performs a rolling restart. New workers spin up, start accepting connections, then old workers drain and exit. No dropped requests.
Here's the ecosystem config:
// ecosystem.config.cjs
module.exports = {
  apps: [{
    name: 'kief-dev',
    script: 'build/index.js',
    instances: 'max',
    exec_mode: 'cluster',
    env: {
      NODE_ENV: 'production',
      PORT: 3000,
      ORIGIN: 'https://kief.dev'
    },
    max_memory_restart: '512M',
    kill_timeout: 5000,
    wait_ready: true,
    listen_timeout: 10000
  }]
};
Two flags matter here. wait_ready tells PM2 not to route traffic to a new worker until it sends process.send('ready'). listen_timeout is the safety net -- if the process doesn't signal ready within 10 seconds, PM2 considers it up anyway. This prevents a bad deploy from hanging the reload forever.
SvelteKit with adapter-node doesn't send a ready signal by default. You can either add a tiny wrapper or rely on listen_timeout. We use the timeout. It's been fine for two years.
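If you'd rather not lean on the timeout, adapter-node also exports its request handler from build/handler.js, so a tiny custom server can send the signal itself. A minimal sketch, assuming the wrapper lives at server.js and the ecosystem config's script points at it instead of build/index.js:

// server.js -- hypothetical wrapper around adapter-node's exported handler
import http from 'node:http';
import { handler } from './build/handler.js';

const server = http.createServer(handler);

server.listen(process.env.PORT || 3000, () => {
  // tell PM2 this worker is ready for traffic (pairs with wait_ready: true)
  if (process.send) process.send('ready');
});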
# Zero-downtime reload
pm2 reload kief-dev
# Check status
pm2 status
┌─────────┬────┬─────────┬──────┬───────┬────────┬─────────┐
│ name │ id │ mode │ ↺ │ status│ cpu │ memory │
├─────────┼────┼─────────┼──────┼───────┼────────┼─────────┤
│ kief-dev│ 0 │ cluster │ 12 │ online│ 0.1% │ 87.3mb │
│ kief-dev│ 1 │ cluster │ 12 │ online│ 0.1% │ 84.1mb │
│ kief-dev│ 2 │ cluster │ 12 │ online│ 0.1% │ 86.7mb │
│ kief-dev│ 3 │ cluster │ 12 │ online│ 0.1% │ 85.2mb │
└─────────┴────┴─────────┴──────┴───────┴────────┴─────────┘
Four workers, each under 90MB. The whole app uses about 350MB of RAM. A Kubernetes control plane would eat more than that before serving a single request.
Caddy as Reverse Proxy
Caddy handles TLS termination and proxies to PM2. The entire config:
kief.dev {
    reverse_proxy localhost:3000 {
        health_uri /api/health
        health_interval 10s
        health_timeout 3s
    }
    encode gzip zstd
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
        Referrer-Policy "strict-origin-when-cross-origin"
    }
}
That's it. Caddy handles Let's Encrypt cert provisioning and renewal automatically. No certbot cron jobs. No nginx SSL blocks. No forgetting to renew and waking up to a broken site on a Saturday.
Caddy also has a live admin API on localhost:2019. You can atomically swap upstreams, modify routes, or replace the entire config with zero dropped connections. If a config change fails validation, it rolls back automatically. We don't use this for normal deploys (PM2 reload handles it), but it's there if you need blue-green switching at the proxy level.
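The admin API is plain JSON over HTTP, so it's scriptable from anything. A minimal sketch in Node, assuming Caddy's default admin address (POST /load is the endpoint that swaps in an entire new config atomically; this just reads the current one):

// dump the live Caddy config via the admin API (default address localhost:2019)
// run as an .mjs file so top-level await works
const res = await fetch('http://localhost:2019/config/');
console.log(JSON.stringify(await res.json(), null, 2));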
The Deploy Script
Here's the actual deploy script, trimmed to the essentials:
#!/usr/bin/env bash
set -euo pipefail
APP_DIR="/opt/kief-dev"
REPO="git@git.kief.dev:KiefStudio/kief-dev.git"
BRANCH="main"
echo ":: pulling latest"
cd "$APP_DIR"
git fetch origin "$BRANCH"
git reset --hard "origin/$BRANCH"
echo ":: installing dependencies"
npm ci --production=false
echo ":: building"
npm run build
echo ":: reloading"
pm2 reload kief-dev --update-env
echo ":: verifying"
sleep 2
HEALTH=$(curl -sf http://localhost:3000/api/health || echo "FAIL")
if [ "$HEALTH" = "FAIL" ]; then
echo "!! health check failed, check logs"
pm2 logs kief-dev --lines 20
exit 1
fi
echo ":: done"
pm2 save
npm ci does a clean install from the lockfile. npm run build runs the SvelteKit build with adapter-node. pm2 reload does the rolling restart. The health check afterward confirms the new code is actually serving.
pm2 save persists the process list so systemd can restore it after a reboot.
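The /api/health endpoint that Caddy and the deploy script both hit is a one-file SvelteKit route. A minimal sketch, assuming it lives at src/routes/api/health/+server.js:

// src/routes/api/health/+server.js
import { json } from '@sveltejs/kit';

export function GET() {
  // Caddy's active health check expects a 200; curl -sf fails on any 4xx/5xx
  return json({ status: 'ok', uptime: process.uptime() });
}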
systemd Keeps It Running
PM2 generates its own systemd unit file:
pm2 startup systemd
# Run the command it prints
pm2 save
Now PM2 starts on boot and restores your app. If the PM2 daemon itself crashes (hasn't happened in two years, but hey), systemd restarts it.
You can also use systemd's socket activation to keep the port open during process restarts. systemd binds the port and queues incoming connections while the app comes back up. Visitors get a brief delay instead of a connection refused. For our setup, PM2's rolling reload makes this unnecessary -- but it's worth knowing about if you're running a single-instance process without a cluster manager.
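For reference, the app-side half of socket activation in Node is just inheriting the file descriptor systemd already bound instead of opening the port yourself. A hedged sketch (we don't run this; it assumes a matching .socket unit passing exactly one socket, which systemd hands over as fd 3):

// hypothetical socket-activation-aware startup, again via adapter-node's handler.js
import http from 'node:http';
import { handler } from './build/handler.js';

const server = http.createServer(handler);

if (process.env.LISTEN_FDS) {
  // systemd bound the port and queued connections; inherit its socket (fd 3)
  server.listen({ fd: 3 });
} else {
  server.listen(process.env.PORT || 3000);
}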
What About Rollbacks?
The deploy script uses git reset --hard to the latest commit on main. Rolling back is:
git log --oneline -5
# find the good commit
git reset --hard abc1234
npm ci --production=false
npm run build
pm2 reload kief-dev --update-env
That's a manual process. If you want automated rollback on health check failure, add it to the deploy script:
PREV_SHA=$(git rev-parse HEAD)
# ... deploy steps ...
HEALTH=$(curl -sf http://localhost:3000/api/health || echo "FAIL")
if [ "$HEALTH" = "FAIL" ]; then
echo "!! rolling back to $PREV_SHA"
git reset --hard "$PREV_SHA"
npm ci --production=false
npm run build
pm2 reload kief-dev --update-env
exit 1
fi
Twelve lines. That's your rollback strategy.
The Real Risk Isn't Deploy Gaps
Here's the thing nobody talks about in zero-downtime deployment posts. A 500ms gap during a PM2 reload is invisible to users. You know what isn't invisible? A dead disk. A corrupted filesystem. A Linode maintenance window that takes longer than expected.
The real risk on a single server is recovery time, not deploy time.
Our recovery plan:
- Nightly backups -- full server snapshot via Linode API, retained 7 days
- Infrastructure as code -- the server config is a bash script that installs everything from scratch. Caddy, PM2, Node, the app, cron jobs, firewall rules. New Linode to production in under 15 minutes.
- DNS on Cloudflare -- TTL at 5 minutes. If the server dies, point DNS to a static "back soon" page on Cloudflare Pages while we rebuild.
- Git is the source of truth -- the app, the deploy script, the PM2 config, and the Caddyfile all live in the repo. Nothing exists only on the server.
If our Linode vanishes tomorrow, we spin up a new one, run the setup script, push the deploy, and update DNS. Fifteen minutes, maybe twenty if we're making coffee.
When This Stops Working
This setup stops being sufficient when you need:
- Multi-region redundancy -- a single server is a single point of failure in a single datacenter. If you need 99.99% uptime with geographic failover, you need more servers.
- Horizontal scaling under sustained load -- PM2 cluster mode maxes out at your CPU count. If you're consistently saturating all cores, you need another box.
- Multiple services with complex networking -- if your "app" is actually 6 microservices that need service discovery and mutual TLS, congratulations, you might actually need Kubernetes.
For a SvelteKit app serving developer tools and a product landing page? A single server with PM2 and Caddy is going to be boring and reliable for a long time. Boring is good. Boring means you're shipping features instead of debugging your infrastructure.
The Numbers
kief.dev runs SvelteKit 2 with adapter-node. Four PM2 workers use about 350MB total. Caddy adds maybe 30MB. The server has headroom for days.
The entire deployment infrastructure is: one Caddyfile (12 lines), one PM2 ecosystem config (18 lines), one deploy script (40 lines), and one systemd unit that PM2 generated automatically. Seventy lines of config, total.
Compare that to a Kubernetes deployment: the Dockerfile, the deployment manifest, the service manifest, the ingress, the HPA, the ConfigMap, the Secret, the namespace, the Helm chart or Kustomize overlay. You're at 200+ lines of YAML before you've served a single request.
Ship the simple thing. You can always complicate it later.
If you want to check what's in your project's dependency tree before you deploy it, Vekt scans 22 lockfile formats across 12 ecosystems for known vulnerabilities and malicious packages. Free tier, no signup, runs in your terminal. vekt scan . and you're done.