Feature: Infrastructure
Version: 1.0.0 | Last Reviewed: 2026-02-10 | Status: Approved
User Story
As a developer, I want rock-solid dev and prod environments with automated deployments, backups, and rollback capability, so that I can ship features confidently and recover from failures quickly.
Overview
Two completely isolated environments (dev + prod) running on Proxmox LXC containers. Each runs the Baby Basics API and PostgreSQL via Docker Compose. Deployments are automated via GitHub Actions. Data is protected by daily backups and a one-command rollback script.
Architecture
GitHub Actions (CI/CD)
│
├── push to develop ──→ SSH ──→ dev LXC container
│ ├── Docker: baby-basics-api
│ └── Docker: postgres
│
└── push to main ────→ SSH ──→ prod LXC container
├── Docker: baby-basics-api
├── Docker: postgres
└── Uptime Kuma (monitoring)
Nginx Proxy Manager (existing, separate host)
├── dev.baby.bretzfam.com ──→ dev LXC:3000
└── baby.bretzfam.com ──────→ prod LXC:3000
Component 1: GitHub Repository + Git Flow
Scope
- Create private GitHub repo `baby-basics`
- Push existing scaffolding as initial commit on `main`
- Create `develop` branch from `main`
- Configure branch protection rules
- Set up repository secrets for CI/CD
Configuration
- Repo: `github.com/<owner>/baby-basics` (private)
- Default branch: `main`
- Branch protection on `main`: require PR (no direct push), require CI to pass
- Branch protection on `develop`: require CI to pass
- Feature branches: `bb-<forge-id>-<short-desc>` from `develop`
GitHub Actions Secrets (to be added later when LXC containers are ready)
- `DEV_SSH_KEY` - Private SSH key for dev LXC
- `DEV_HOST` - Dev LXC IP/hostname
- `PROD_SSH_KEY` - Private SSH key for prod LXC
- `PROD_HOST` - Prod LXC IP/hostname
- `DEPLOY_USER` - SSH user on both LXC containers
Acceptance Criteria
- Private repo exists on GitHub
- `main` branch has initial scaffolding commit
- `develop` branch exists and tracks `main`
- Branch protection rules are configured
- `.github/workflows/` directory exists with placeholder workflow
Test Cases
- Verify branches: `git branch -r` shows `origin/main` and `origin/develop`
- Verify protection: direct push to `main` is rejected (after protection is enabled)
- Verify scaffold: cloning the repo produces the full monorepo directory structure
Component 2: LXC Container Provisioning
Scope
- Create two LXC containers on Proxmox: `baby-basics-dev` and `baby-basics-prod`
- Install Docker and Docker Compose on each
- Configure networking (static IPs or DHCP reservations)
- Create deployment user with SSH key access
- Create application directory structure
Per-Container Setup
- OS: Debian 12 (bookworm) or Ubuntu 24.04 LTS
- Resources: 2 CPU cores, 2GB RAM, 20GB disk (adjustable)
- Packages: docker.io, docker-compose-plugin, git, curl, postgresql-client (for pg_dump)
- User: `deploy` with sudo access, SSH key auth (no password)
- App directory: `/opt/baby-basics/`
- Backup directory: `/opt/backups/`
- Environment file: `/opt/baby-basics/.env` (gitignored, manually created)
.env File Template (per environment)
# /opt/baby-basics/.env
DATABASE_URL="postgresql://baby:STRONG_PASSWORD_HERE@db:5432/baby_basics?schema=public"
JWT_SECRET="RANDOM_64_CHAR_STRING_HERE"
NODE_ENV="production" # or "development" for dev
PORT=3000
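The `RANDOM_64_CHAR_STRING_HERE` placeholder needs a real value when the file is created by hand. One way to generate it (a sketch, not mandated by this spec) is to pull 32 random bytes and hex-encode them:

```shell
# Generate a 64-character hex string suitable for JWT_SECRET.
# Uses only POSIX tools; `openssl rand -hex 32` is equivalent where
# OpenSSL is installed.
head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
```

Run this once per environment so dev and prod end up with different secrets.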
Acceptance Criteria
- Dev LXC container is running on Proxmox
- Prod LXC container is running on Proxmox
- Docker is installed and running on both containers
- `deploy` user can SSH to both containers with key auth
- `/opt/baby-basics/` and `/opt/backups/` directories exist on both
- `.env` file exists on both with correct values
- Containers can reach the internet (for Docker image pulls)
- Dev and prod are on separate IPs (completely isolated)
Test Cases
- SSH access: `ssh deploy@<dev-ip>` and `ssh deploy@<prod-ip>` succeed
- Docker: `docker run hello-world` succeeds on both
- Isolation: dev and prod have different IPs and different DB passwords
Component 3: Docker Compose Configs
Scope
- Shared base `docker-compose.yml` with common configuration
- `docker-compose.dev.yml` override for dev-specific settings
- `docker-compose.prod.yml` override for prod-specific settings
Services
API service (`baby-basics-api`):
- Build from `apps/api/Dockerfile`
- Port: 3000 (mapped to host)
- Depends on: `db`
- Restart: `unless-stopped`
- Environment from `.env` file
- Health check: `curl -f http://localhost:3000/api/v1/health`
Database service (`db`):
- Image: `postgres:16-alpine`
- Volume: `pgdata` (named volume for persistence)
- Port: 5432 (NOT exposed to host in prod, exposed in dev for debugging)
- Environment: `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB` from `.env`
- Health check: `pg_isready`
- Restart: `unless-stopped`
Dev Overrides
- Postgres port exposed to host (5432:5432) for direct DB access
- API logs at debug level
- No resource limits
Prod Overrides
- Postgres port NOT exposed to host
- API logs at info level
- Container restart policy: `unless-stopped`
- Resource limits: API 1GB RAM, Postgres 1GB RAM
File Locations
infra/
├── docker-compose.yml # Shared base
├── docker-compose.dev.yml # Dev overrides
└── docker-compose.prod.yml # Prod overrides
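A minimal sketch of the shared base file, assuming the service names, image, port, named volume, and health checks listed above; the build context path, `env_file` location, and check intervals are illustrative assumptions, not final values:

```yaml
# infra/docker-compose.yml (sketch)
services:
  api:
    build:
      context: ../apps/api        # assumption: built from the monorepo checkout
    ports:
      - "3000:3000"
    env_file: /opt/baby-basics/.env
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/v1/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    env_file: /opt/baby-basics/.env
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U baby"]   # user "baby" per the .env template
      interval: 10s
      retries: 5

volumes:
  pgdata:
```

The base deliberately maps no host port for `db`; the dev override adds `5432:5432`, so prod stays unexposed by default.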
Acceptance Criteria
- `docker compose -f docker-compose.yml -f docker-compose.dev.yml up` starts API + Postgres on dev
- `docker compose -f docker-compose.yml -f docker-compose.prod.yml up` starts API + Postgres on prod
- Postgres data persists across container restarts (named volume)
- API connects to Postgres and responds on port 3000
- Dev exposes Postgres port 5432, prod does not
Test Cases
- Start stack: `docker compose up -d` succeeds, both containers healthy
- Persistence: write data, restart containers, data survives
- Connectivity: API can query Postgres (`/api/v1/health` returns `db: "connected"`)
- Isolation: prod Postgres not reachable from host on 5432
Component 4: Health Check Endpoint
Scope
Minimal Fastify application that serves a single health check endpoint. This is the first actual code written for the API - just enough to verify the infrastructure works end-to-end.
API Contract
GET /api/v1/health
Response 200:
{
"status": "ok",
"timestamp": "2026-02-10T12:00:00.000Z",
"version": "0.1.0",
"db": "connected"
}
Response 503 (DB unreachable):
{
"status": "degraded",
"timestamp": "2026-02-10T12:00:00.000Z",
"version": "0.1.0",
"db": "error"
}
Implementation Notes
- Minimal Fastify app (just health route, no auth, no other routes)
- Tests DB connectivity via `prisma.$queryRaw` (`SELECT 1`)
- Reads version from `package.json`
package.json - This endpoint is public (no auth required)
- Used by Docker health checks, Uptime Kuma, and deploy script verification
Acceptance Criteria
- `GET /api/v1/health` returns 200 when DB is connected
- Response includes `status`, `timestamp`, `version`, and `db` fields
- Returns 503 with `db: "error"` when the database is unreachable
- Response time <50ms
- Endpoint requires no authentication
Test Cases
- Healthy: start with DB running, GET /health returns 200 with `db: "connected"`
- DB down: start without DB, GET /health returns 503 with `db: "error"`
- Schema: response matches documented JSON shape exactly
Component 5: Deploy Script
Scope
Single deployment script that handles the full deploy lifecycle with automatic rollback on failure.
Flow
deploy.sh [environment]
1. Validate environment argument (dev|prod)
2. cd /opt/baby-basics
3. Create pre-deploy backup: pg_dump → /opt/backups/pre-deploy-TIMESTAMP.sql
4. git pull origin <branch>
5. Rebuild and restart: docker compose build && docker compose up -d
6. Run database migrations: docker compose exec api npx prisma migrate deploy (after the rebuild, so the running container contains the newly pulled migration files)
7. Wait for health check (up to 30 seconds, polling every 2s)
8. If healthy: log success, clean up old pre-deploy backups (keep last 5)
9. If unhealthy: ROLLBACK
a. docker compose down
b. git checkout <previous-commit>
c. Restore pre-deploy backup: psql < /opt/backups/pre-deploy-TIMESTAMP.sql
d. docker compose build && docker compose up -d
e. Verify health
f. Log failure and rollback
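The health-check wait in step 7 can be sketched as a small shell function; the URL and the 2s/30s polling parameters follow the flow above, while the function name is illustrative:

```shell
# Poll the health endpoint every 2s, up to 15 attempts (~30s total).
# Prints "healthy"/"unhealthy" and returns 0/1 for the deploy script.
wait_for_health() {
  url="${1:-http://localhost:3000/api/v1/health}"
  attempts=0
  while [ "$attempts" -lt 15 ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 2
  done
  echo "unhealthy"
  return 1
}
```

On success the deploy continues to cleanup (step 8); on failure the non-zero return triggers the rollback branch (step 9).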
File Location
infra/deploy.sh
Environment Detection
- Dev: pulls from `develop` branch
- Prod: pulls from `main` branch
- Determined by argument or by reading `NODE_ENV` from `.env`
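The argument-to-branch mapping can be sketched as a small function (the function name is illustrative):

```shell
# Map the deploy.sh environment argument to the git branch to pull.
branch_for_env() {
  case "$1" in
    dev)  echo "develop" ;;
    prod) echo "main" ;;
    *)    echo "usage: deploy.sh dev|prod" >&2; return 1 ;;
  esac
}
```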
Acceptance Criteria
- `./deploy.sh dev` deploys the develop branch to the dev environment
- `./deploy.sh prod` deploys the main branch to the prod environment
- Pre-deploy pg_dump is created before any changes
- Database migrations run successfully
- Health check verifies the new deployment is working
- Failed health check triggers automatic rollback
- After rollback, the previous version is running and healthy
- Script exits with 0 on success, 1 on failure (after rollback)
- Script logs all steps to stdout for CI/CD visibility
Test Cases
- Happy path: Deploy with passing health check, verify new version running
- Rollback: Deploy with broken code, verify rollback restores previous version
- Migration: Deploy with schema changes, verify migrations applied
- Idempotent: Run deploy twice with no changes, no errors
Component 6: CI/CD Pipeline (GitHub Actions)
Scope
GitHub Actions workflow triggered by pushes to main and develop. Runs lint, test, then deploys to the appropriate environment via SSH.
Workflow: .github/workflows/api-ci.yml
Trigger
on:
push:
branches: [main, develop]
paths:
- 'apps/api/**'
- 'infra/**'
- '.github/workflows/api-ci.yml'
Jobs
- `lint-and-test` (runs on `ubuntu-latest`):
  - Checkout code
  - Setup Node.js 22
  - Install dependencies (`npm ci` in `apps/api/`)
  - Start Postgres service container
  - Run Prisma migrations against test DB
  - Run linter (`npm run lint`)
  - Run tests (`npm test`)
- `deploy` (needs: `lint-and-test`):
  - Determine target: `main` → prod, `develop` → dev
  - SSH into target server
  - Run `deploy.sh` on server
  - Verify health check from CI runner
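A sketch of the deploy job under these assumptions: raw `ssh` from the runner and a branch-based ternary in GitHub expression syntax (a marketplace SSH action would work equally well; step names are illustrative):

```yaml
deploy:
  needs: lint-and-test
  runs-on: ubuntu-latest
  steps:
    - name: Deploy over SSH
      env:
        HOST: ${{ github.ref_name == 'main' && secrets.PROD_HOST || secrets.DEV_HOST }}
        KEY: ${{ github.ref_name == 'main' && secrets.PROD_SSH_KEY || secrets.DEV_SSH_KEY }}
        DEPLOY_USER: ${{ secrets.DEPLOY_USER }}
      run: |
        TARGET=dev; [ "$GITHUB_REF_NAME" = "main" ] && TARGET=prod
        install -m 600 /dev/null ssh_key
        printf '%s\n' "$KEY" > ssh_key
        ssh -i ssh_key -o StrictHostKeyChecking=accept-new \
          "$DEPLOY_USER@$HOST" "/opt/baby-basics/infra/deploy.sh $TARGET"
```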
Secrets Required
- `DEV_SSH_KEY`, `DEV_HOST`
- `PROD_SSH_KEY`, `PROD_HOST`
- `DEPLOY_USER`
Acceptance Criteria
- Push to `develop` triggers: lint → test → deploy to dev
- Push to `main` triggers: lint → test → deploy to prod
- Lint or test failure prevents deployment
- CI uses Postgres service container for tests
- SSH deploy uses key from GitHub secrets
- Workflow only triggers on `apps/api/**` or `infra/**` changes
- Non-API changes (docs, iOS) do NOT trigger the workflow
Test Cases
- Develop push: Push to develop, verify dev server updated
- Main push: Push to main, verify prod server updated
- Test failure: Push with failing test, verify deploy is skipped
- Path filter: Push docs-only change, verify workflow doesn't run
Component 7: Nginx Proxy Hosts
Scope
Add two proxy hosts in the existing Nginx Proxy Manager instance.
Configuration
| Proxy Host | Upstream | SSL |
|---|---|---|
| dev.baby.bretzfam.com | `<dev-lxc-ip>:3000` | Let's Encrypt auto |
| baby.bretzfam.com | `<prod-lxc-ip>:3000` | Let's Encrypt auto |
Notes
- This is a GUI operation in Nginx Proxy Manager (no config files to write)
- DNS A records must point both subdomains to the NPM host
- SSL certificates auto-provisioned via Let's Encrypt
- Websocket support: OFF (not needed for MVP REST API)
- Custom locations: none
Acceptance Criteria
- `https://dev.baby.bretzfam.com/api/v1/health` returns 200
- `https://baby.bretzfam.com/api/v1/health` returns 200
- SSL certificates are valid (no browser warnings)
- HTTP redirects to HTTPS
Test Cases
- Dev routing: `curl https://dev.baby.bretzfam.com/api/v1/health` returns health response
- Prod routing: `curl https://baby.bretzfam.com/api/v1/health` returns health response
- SSL: `curl -I https://baby.bretzfam.com` shows a valid certificate
- HTTP redirect: `curl -I http://baby.bretzfam.com` returns 301 to HTTPS
Component 8: Backup + Rollback Scripts
Backup Script (infra/backup.sh)
- Runs via cron daily at 3:00 AM
- `pg_dump` from the Docker Postgres container
- Output: `/opt/backups/daily-YYYY-MM-DD.sql`
- Retention: 30 days (delete older backups)
- Logs to `/opt/backups/backup.log`
- Exit code 0 on success, 1 on failure
Cron Entry
0 3 * * * /opt/baby-basics/infra/backup.sh >> /opt/backups/backup.log 2>&1
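The two core steps of `backup.sh` can be sketched as shell functions. The compose service name (`db`) and the `baby`/`baby_basics` credentials follow earlier sections but are assumptions here, not final values:

```shell
# Dump the database from the running Postgres container to a dated file.
dump_db() {
  docker compose exec -T db pg_dump -U baby baby_basics \
    > "/opt/backups/daily-$(date +%F).sql"
}

# Delete daily dumps older than 30 days in the given backup directory.
prune_backups() {
  find "$1" -name 'daily-*.sql' -mtime +30 -delete
}
```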
Rollback Script (infra/rollback.sh)
- One-command rollback to previous deployment
- Reads the most recent pre-deploy backup from `/opt/backups/`
- Stops containers, restores DB, checks out previous git commit, rebuilds, restarts
- Verifies health after rollback
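Locating the newest pre-deploy dump can be sketched as follows; the function name is illustrative and relies on the `pre-deploy-TIMESTAMP.sql` naming used by `deploy.sh`:

```shell
# Print the path of the most recent pre-deploy dump in the given directory.
latest_predeploy() {
  ls -1t "$1"/pre-deploy-*.sql 2>/dev/null | head -n 1
}
```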
Acceptance Criteria
- `backup.sh` creates a valid SQL dump file
- Dump file can be restored successfully: `psql < dump.sql`
- Backups older than 30 days are automatically deleted
- Cron job runs daily at 3 AM
- `rollback.sh` reverts to the previous deployment
- After rollback, health check passes
- Both scripts log their actions
Test Cases
- Backup creates valid dump: Run backup, then restore into a fresh DB and verify data
- Retention: Create fake old backups, run backup, verify old ones deleted
- Rollback: Deploy new version, run rollback, verify previous version restored
- Backup with data: Insert test data, backup, verify dump contains it
Component 9: Uptime Kuma Monitoring
Scope
Set up Uptime Kuma to monitor the production health check endpoint.
Configuration
- Monitor type: HTTP(s)
- URL: `https://baby.bretzfam.com/api/v1/health`
- Check interval: 30 seconds
- Retry: 3 attempts before marking down
- Expected status: 200
- Alert: Notification on downtime (method TBD - could be email, Telegram, Pushover, etc.)
Notes
- Uptime Kuma runs as part of the prod Docker Compose stack OR as a separate service (user preference)
- Only monitors production (dev is not critical)
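If the in-stack option is chosen, the extra Compose service can be sketched as below; the `louislam/uptime-kuma:1` image and port 3001 are Uptime Kuma's published defaults, while the volume name is an assumption:

```yaml
uptime-kuma:
  image: louislam/uptime-kuma:1
  ports:
    - "3001:3001"          # web UI
  volumes:
    - kuma-data:/app/data  # persists monitors and history
  restart: unless-stopped
```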
Acceptance Criteria
- Uptime Kuma is running and accessible
- Production health endpoint is monitored every 30 seconds
- Downtime triggers an alert notification
- Dashboard shows uptime history
Test Cases
- Normal operation: Health check passing, Uptime Kuma shows "Up"
- Downtime detection: Stop API container, verify Uptime Kuma detects downtime within 90 seconds
- Recovery detection: Restart API container, verify Uptime Kuma shows recovery
Component 10: End-to-End Infrastructure Verification
Scope
Verify the complete infrastructure works together. This is a manual checklist, not automated.
Verification Checklist
- Push a commit to `develop` → CI runs → dev server deploys → `dev.baby.bretzfam.com/api/v1/health` returns 200
- Push a commit to `main` → CI runs → prod server deploys → `baby.bretzfam.com/api/v1/health` returns 200
- Run `backup.sh` on prod → valid SQL dump created
- Run `rollback.sh` on dev → previous version restored and healthy
- Uptime Kuma shows prod as "Up"
- Stop prod API → Uptime Kuma alerts → restart API → Uptime Kuma shows recovery
- Dev and prod databases contain different data (isolation verified)
Boundaries
- This spec does NOT cover application features (auth, feedings, etc.)
- This spec does NOT cover iOS build/deploy (manual TestFlight for MVP)
- Proxmox host management (updates, firewalls) is out of scope
- DNS configuration is assumed to be handled separately
- The health check endpoint is the ONLY application code in this phase
Dependencies
- Proxmox host must be accessible
- DNS records for `baby.bretzfam.com` and `dev.baby.bretzfam.com` must be configured
- Nginx Proxy Manager must be running and accessible
- GitHub account with repo creation permissions