Feature: Infrastructure

Version: 1.0.0 | Last Reviewed: 2026-02-10 | Status: Approved

User Story

As a developer, I want rock-solid dev and prod environments with automated deployments, backups, and rollback capability, so that I can ship features confidently and recover from failures quickly.

Overview

Two completely isolated environments (dev + prod) running on Proxmox LXC containers. Each runs the Baby Basics API and PostgreSQL via Docker Compose. Deployments are automated via GitHub Actions. Data is protected by daily backups and a one-command rollback script.

Architecture

GitHub Actions (CI/CD)
├── push to develop ──→ SSH ──→ dev LXC container
│                                ├── Docker: baby-basics-api
│                                └── Docker: postgres
└── push to main ────→ SSH ──→ prod LXC container
                                 ├── Docker: baby-basics-api
                                 ├── Docker: postgres
                                 └── Uptime Kuma (monitoring)

Nginx Proxy Manager (existing, separate host)
├── dev.baby.bretzfam.com ──→ dev LXC:3000
└── baby.bretzfam.com ──────→ prod LXC:3000

Component 1: GitHub Repository + Git Flow

Scope

  • Create private GitHub repo baby-basics
  • Push existing scaffolding as initial commit on main
  • Create develop branch from main
  • Configure branch protection rules
  • Set up repository secrets for CI/CD

Configuration

  • Repo: github.com/<owner>/baby-basics (private)
  • Default branch: main
  • Branch protection on main: require PR (no direct push), require CI to pass
  • Branch protection on develop: require CI to pass
  • Feature branches: bb-<forge-id>-<short-desc> from develop
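The repo and branch setup above can be sketched with the GitHub CLI. This is a transcript-style sketch, not the mandated procedure: it assumes `gh` is installed and authenticated and that the scaffolding already exists locally; branch protection rules are easiest to finish in the web UI (Settings → Branches) or via `gh api`.

```shell
# From the local scaffolding directory (assumes `gh auth login` has been run)
gh repo create baby-basics --private --source=. --push   # creates the private repo, pushes main
git switch -c develop
git push -u origin develop
```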

GitHub Actions Secrets (to be added later when LXC containers are ready)

  • DEV_SSH_KEY - Private SSH key for dev LXC
  • DEV_HOST - Dev LXC IP/hostname
  • PROD_SSH_KEY - Private SSH key for prod LXC
  • PROD_HOST - Prod LXC IP/hostname
  • DEPLOY_USER - SSH user on both LXC containers

Acceptance Criteria

  • Private repo exists on GitHub
  • main branch has initial scaffolding commit
  • develop branch exists and was created from main
  • Branch protection rules are configured
  • .github/workflows/ directory exists with placeholder workflow

Test Cases

  1. Verify branches: git branch -r shows origin/main and origin/develop
  2. Verify protection: Direct push to main is rejected (after protection is enabled)
  3. Verify scaffold: Cloning the repo produces the full monorepo directory structure

Component 2: LXC Container Provisioning

Scope

  • Create two LXC containers on Proxmox: baby-basics-dev and baby-basics-prod
  • Install Docker and Docker Compose on each
  • Configure networking (static IPs or DHCP reservations)
  • Create deployment user with SSH key access
  • Create application directory structure

Per-Container Setup

  • OS: Debian 12 (bookworm) or Ubuntu 24.04 LTS
  • Resources: 2 CPU cores, 2GB RAM, 20GB disk (adjustable)
  • Packages: docker.io, docker-compose-plugin, git, curl, postgresql-client (for pg_dump)
  • User: deploy with sudo access, SSH key auth (no password)
  • App directory: /opt/baby-basics/
  • Backup directory: /opt/backups/
  • Environment file: /opt/baby-basics/.env (gitignored, manually created)
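Inside each container, the setup above roughly corresponds to the following commands. This is a sketch assuming Debian 12 run as root; exact package names can differ slightly on Ubuntu, and the CI public key must be supplied separately.

```shell
# Packages (docker-compose-plugin comes from Docker's apt repository on Debian)
apt-get update && apt-get install -y docker.io git curl postgresql-client

# Deployment user with key-only SSH access
adduser --disabled-password --gecos "" deploy
usermod -aG sudo,docker deploy
install -d -m 700 -o deploy -g deploy /home/deploy/.ssh
# append the CI public key to /home/deploy/.ssh/authorized_keys

# Application and backup directories
install -d -o deploy -g deploy /opt/baby-basics /opt/backups
```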

.env File Template (per environment)

# /opt/baby-basics/.env
DATABASE_URL="postgresql://baby:STRONG_PASSWORD_HERE@db:5432/baby_basics?schema=public"
JWT_SECRET="RANDOM_64_CHAR_STRING_HERE"
NODE_ENV="production" # or "development" for dev
PORT=3000
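The RANDOM_64_CHAR_STRING_HERE placeholder for JWT_SECRET can be generated with OpenSSL; `openssl rand -hex 32` emits 32 random bytes as exactly 64 hex characters:

```shell
# Generate a 64-character hex secret suitable for JWT_SECRET
JWT_SECRET=$(openssl rand -hex 32)
echo "${#JWT_SECRET}"   # prints 64
```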

Acceptance Criteria

  • Dev LXC container is running on Proxmox
  • Prod LXC container is running on Proxmox
  • Docker is installed and running on both containers
  • deploy user can SSH to both containers with key auth
  • /opt/baby-basics/ and /opt/backups/ directories exist on both
  • .env file exists on both with correct values
  • Containers can reach the internet (for Docker image pulls)
  • Dev and prod are on separate IPs (completely isolated)

Test Cases

  1. SSH access: ssh deploy@<dev-ip> and ssh deploy@<prod-ip> succeed
  2. Docker: docker run hello-world succeeds on both
  3. Isolation: Dev and prod have different IPs, different DB passwords

Component 3: Docker Compose Configs

Scope

  • Shared base docker-compose.yml with common configuration
  • docker-compose.dev.yml override for dev-specific settings
  • docker-compose.prod.yml override for prod-specific settings

Services

API service (baby-basics-api):

  • Build from apps/api/Dockerfile
  • Port: 3000 (mapped to host)
  • Depends on: db
  • Restart: unless-stopped
  • Environment from .env file
  • Health check: curl -f http://localhost:3000/api/v1/health

Database service (db):

  • Image: postgres:16-alpine
  • Volume: pgdata (named volume for persistence)
  • Port: 5432 (NOT exposed to host in prod, exposed in dev for debugging)
  • Environment: POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB from .env
  • Health check: pg_isready
  • Restart: unless-stopped

Dev Overrides

  • Postgres port exposed to host (5432:5432) for direct DB access
  • API logs at debug level
  • No resource limits

Prod Overrides

  • Postgres port NOT exposed to host
  • API logs at info level
  • Container restart policy: unless-stopped
  • Resource limits: API 1GB RAM, Postgres 1GB RAM

File Locations

infra/
├── docker-compose.yml # Shared base
├── docker-compose.dev.yml # Dev overrides
└── docker-compose.prod.yml # Prod overrides
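A minimal sketch of the shared base file under the assumptions above. Service names, the env var names, and the health check path come from this spec; the build context and interval values are illustrative, not prescribed.

```yaml
# infra/docker-compose.yml — shared base (sketch)
services:
  api:
    build:
      context: ../apps/api
    ports:
      - "3000:3000"
    env_file: /opt/baby-basics/.env
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/v1/health"]
      interval: 30s
  db:
    image: postgres:16-alpine
    env_file: /opt/baby-basics/.env
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER}"]
      interval: 10s

volumes:
  pgdata:
```

docker-compose.dev.yml would then add `ports: ["5432:5432"]` under `db`, while docker-compose.prod.yml leaves the Postgres port unmapped and adds the resource limits.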

Acceptance Criteria

  • docker compose -f docker-compose.yml -f docker-compose.dev.yml up starts API + Postgres on dev
  • docker compose -f docker-compose.yml -f docker-compose.prod.yml up starts API + Postgres on prod
  • Postgres data persists across container restarts (named volume)
  • API connects to Postgres and responds on port 3000
  • Dev exposes Postgres port 5432, prod does not

Test Cases

  1. Start stack: docker compose up -d succeeds, both containers healthy
  2. Persistence: Write data, restart containers, data survives
  3. Connectivity: API can query Postgres (/api/v1/health returns db: "connected")
  4. Isolation: Prod Postgres not reachable from host on 5432

Component 4: Health Check Endpoint

Scope

Minimal Fastify application that serves a single health check endpoint. This is the first actual code written for the API - just enough to verify the infrastructure works end-to-end.

API Contract

GET /api/v1/health

Response 200:

{
  "status": "ok",
  "timestamp": "2026-02-10T12:00:00.000Z",
  "version": "0.1.0",
  "db": "connected"
}

Response 503 (DB unreachable):

{
  "status": "degraded",
  "timestamp": "2026-02-10T12:00:00.000Z",
  "version": "0.1.0",
  "db": "error"
}
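The two response shapes can be captured in a small pure function. This is a hypothetical sketch (`buildHealthResponse` and the type names are illustrative, not from the spec); a Fastify route handler would send the result with status 200 or 503.

```typescript
type DbState = "connected" | "error";

interface HealthResponse {
  status: "ok" | "degraded";
  timestamp: string; // ISO-8601
  version: string;   // read from package.json in the real app
  db: DbState;
}

// Builds the documented payload from the result of a DB connectivity probe.
function buildHealthResponse(dbOk: boolean, version: string): HealthResponse {
  return {
    status: dbOk ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    version,
    db: dbOk ? "connected" : "error",
  };
}
```

In the handler, something like `reply.code(dbOk ? 200 : 503).send(buildHealthResponse(dbOk, version))` would then produce the contract above.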

Implementation Notes

  • Minimal Fastify app (just health route, no auth, no other routes)
  • Tests DB connectivity via prisma.$queryRaw`SELECT 1` (Prisma's tagged-template form, not a function call)
  • Reads version from package.json
  • This endpoint is public (no auth required)
  • Used by Docker health checks, Uptime Kuma, and deploy script verification

Acceptance Criteria

  • GET /api/v1/health returns 200 when DB is connected
  • Response includes status, timestamp, version, and db fields
  • Returns 503 with db: "error" when database is unreachable
  • Response time <50ms
  • Endpoint requires no authentication

Test Cases

  1. Healthy: Start with DB running, GET /health returns 200 with db: "connected"
  2. DB down: Start without DB, GET /health returns 503 with db: "error"
  3. Schema: Response matches documented JSON shape exactly

Component 5: Deploy Script

Scope

Single deployment script that handles the full deploy lifecycle with automatic rollback on failure.

Flow

deploy.sh [environment]

  1. Validate environment argument (dev|prod)
  2. cd /opt/baby-basics
  3. Create pre-deploy backup: pg_dump → /opt/backups/pre-deploy-TIMESTAMP.sql
  4. git pull origin <branch>
  5. Rebuild images: docker compose build
  6. Run database migrations from the new image: docker compose run --rm api npx prisma migrate deploy
     (run, not exec — exec would target the still-running container built from the previous commit, which lacks the new migration files)
  7. Restart: docker compose up -d
  8. Wait for health check (up to 30 seconds, polling every 2s)
  9. If healthy: log success, clean up old pre-deploy backups (keep last 5)
  10. If unhealthy: ROLLBACK
      a. docker compose down
      b. git checkout <previous-commit>
      c. Restore pre-deploy backup: psql < /opt/backups/pre-deploy-TIMESTAMP.sql
      d. docker compose build && docker compose up -d
      e. Verify health
      f. Log the failure and the rollback
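The health-check polling step (up to 30 seconds, every 2s) can be sketched as a small bash helper; the function name and parameters are illustrative:

```shell
# Polls a health-check command every $3 seconds (default 2) for up to
# $2 seconds (default 30). Returns 0 on first success, 1 on timeout.
wait_for_health() {
  local check_cmd="$1" timeout="${2:-30}" interval="${3:-2}"
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    if eval "$check_cmd" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

# Example: wait_for_health "curl -fsS http://localhost:3000/api/v1/health"
```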

File Location

infra/deploy.sh

Environment Detection

  • Dev: pulls from develop branch
  • Prod: pulls from main branch
  • Determined by argument or by reading NODE_ENV from .env
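The argument-to-branch mapping above could look like the following sketch (`resolve_branch` is a hypothetical helper name):

```shell
# Maps the deploy.sh environment argument to the git branch it deploys.
resolve_branch() {
  case "$1" in
    dev)  echo develop ;;
    prod) echo main ;;
    *)    echo "usage: deploy.sh dev|prod" >&2; return 1 ;;
  esac
}

# Example: BRANCH=$(resolve_branch "$1") || exit 1
```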

Acceptance Criteria

  • ./deploy.sh dev deploys the develop branch to dev environment
  • ./deploy.sh prod deploys the main branch to prod environment
  • Pre-deploy pg_dump is created before any changes
  • Database migrations run successfully
  • Health check verifies the new deployment is working
  • Failed health check triggers automatic rollback
  • After rollback, the previous version is running and healthy
  • Script exits with 0 on success, 1 on failure (after rollback)
  • Script logs all steps to stdout for CI/CD visibility

Test Cases

  1. Happy path: Deploy with passing health check, verify new version running
  2. Rollback: Deploy with broken code, verify rollback restores previous version
  3. Migration: Deploy with schema changes, verify migrations applied
  4. Idempotent: Run deploy twice with no changes, no errors

Component 6: CI/CD Pipeline (GitHub Actions)

Scope

GitHub Actions workflow triggered by pushes to main and develop. Runs lint, test, then deploys to the appropriate environment via SSH.

Workflow: .github/workflows/api-ci.yml

Trigger

on:
  push:
    branches: [main, develop]
    paths:
      - 'apps/api/**'
      - 'infra/**'
      - '.github/workflows/api-ci.yml'

Jobs

  1. lint-and-test (runs on ubuntu-latest):

    • Checkout code
    • Setup Node.js 22
    • Install dependencies (npm ci in apps/api/)
    • Start Postgres service container
    • Run Prisma migrations against test DB
    • Run linter (npm run lint)
    • Run tests (npm test)
  2. deploy (needs: lint-and-test):

    • Determine target: main → prod, develop → dev
    • SSH into target server
    • Run deploy.sh on server
    • Verify health check from CI runner
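A sketch of the two jobs under the assumptions above; the action versions, test-DB credentials, and the deploy step's wiring are illustrative, not prescribed:

```yaml
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: baby_basics_test
        ports:
          - 5432:5432
    defaults:
      run:
        working-directory: apps/api
    env:
      DATABASE_URL: postgresql://test:test@localhost:5432/baby_basics_test
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npx prisma migrate deploy
      - run: npm run lint
      - run: npm test

  deploy:
    needs: lint-and-test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy via SSH
        # main → prod, develop → dev; key, host, and user come from the secrets listed below
        run: |
          TARGET=$([ "$GITHUB_REF_NAME" = "main" ] && echo prod || echo dev)
          echo "would ssh to the $TARGET host and run infra/deploy.sh $TARGET"
```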

Secrets Required

  • DEV_SSH_KEY, DEV_HOST
  • PROD_SSH_KEY, PROD_HOST
  • DEPLOY_USER

Acceptance Criteria

  • Push to develop triggers: lint → test → deploy to dev
  • Push to main triggers: lint → test → deploy to prod
  • Lint or test failure prevents deployment
  • CI uses Postgres service container for tests
  • SSH deploy uses key from GitHub secrets
  • Workflow only triggers on apps/api/** or infra/** changes
  • Non-API changes (docs, iOS) do NOT trigger the workflow

Test Cases

  1. Develop push: Push to develop, verify dev server updated
  2. Main push: Push to main, verify prod server updated
  3. Test failure: Push with failing test, verify deploy is skipped
  4. Path filter: Push docs-only change, verify workflow doesn't run

Component 7: Nginx Proxy Hosts

Scope

Add two proxy hosts in the existing Nginx Proxy Manager instance.

Configuration

  Proxy Host              Upstream              SSL
  dev.baby.bretzfam.com   <dev-lxc-ip>:3000     Let's Encrypt auto
  baby.bretzfam.com       <prod-lxc-ip>:3000    Let's Encrypt auto

Notes

  • This is a GUI operation in Nginx Proxy Manager (no config files to write)
  • DNS A records must point both subdomains to the NPM host
  • SSL certificates auto-provisioned via Let's Encrypt
  • WebSocket support: OFF (not needed for the MVP REST API)
  • Custom locations: none

Acceptance Criteria

  • https://dev.baby.bretzfam.com/api/v1/health returns 200
  • https://baby.bretzfam.com/api/v1/health returns 200
  • SSL certificates are valid (no browser warnings)
  • HTTP redirects to HTTPS

Test Cases

  1. Dev routing: curl https://dev.baby.bretzfam.com/api/v1/health returns health response
  2. Prod routing: curl https://baby.bretzfam.com/api/v1/health returns health response
  3. SSL: curl -I https://baby.bretzfam.com shows valid certificate
  4. HTTP redirect: curl -I http://baby.bretzfam.com returns 301 to HTTPS

Component 8: Backup + Rollback Scripts

Backup Script (infra/backup.sh)

  • Runs via cron daily at 3:00 AM
  • pg_dump from the Docker Postgres container
  • Output: /opt/backups/daily-YYYY-MM-DD.sql
  • Retention: 30 days (delete older backups)
  • Logs to /opt/backups/backup.log
  • Exit code 0 on success, 1 on failure

Cron Entry

0 3 * * * /opt/baby-basics/infra/backup.sh >> /opt/backups/backup.log 2>&1
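The dump and retention steps can be sketched as two bash functions. Function names are illustrative, and the database user/name are assumed to match the .env template; `run_backup` is shown but not invoked here.

```shell
BACKUP_DIR=/opt/backups

# Dumps the Dockerized Postgres to a dated file under $BACKUP_DIR.
run_backup() {
  local stamp; stamp=$(date +%F)
  docker compose -f /opt/baby-basics/infra/docker-compose.yml exec -T db \
    pg_dump -U baby baby_basics > "$BACKUP_DIR/daily-$stamp.sql"
}

# Deletes daily dumps in directory $1 older than $2 days (default 30).
prune_old_backups() {
  find "$1" -name 'daily-*.sql' -mtime "+${2:-30}" -delete
}
```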

Rollback Script (infra/rollback.sh)

  • One-command rollback to previous deployment
  • Reads the most recent pre-deploy backup from /opt/backups/
  • Stops containers, restores DB, checks out previous git commit, rebuilds, restarts
  • Verifies health after rollback

Acceptance Criteria

  • backup.sh creates a valid SQL dump file
  • Dump file can be restored successfully: psql < dump.sql
  • Backups older than 30 days are automatically deleted
  • Cron job runs daily at 3 AM
  • rollback.sh reverts to the previous deployment
  • After rollback, health check passes
  • Both scripts log their actions

Test Cases

  1. Backup creates valid dump: Run backup, then restore into a fresh DB and verify data
  2. Retention: Create fake old backups, run backup, verify old ones deleted
  3. Rollback: Deploy new version, run rollback, verify previous version restored
  4. Backup with data: Insert test data, backup, verify dump contains it

Component 9: Uptime Kuma Monitoring

Scope

Set up Uptime Kuma to monitor the production health check endpoint.

Configuration

  • Monitor type: HTTP(s)
  • URL: https://baby.bretzfam.com/api/v1/health
  • Check interval: 30 seconds
  • Retry: 3 attempts before marking down
  • Expected status: 200
  • Alert: Notification on downtime (method TBD - could be email, Telegram, Pushover, etc.)

Notes

  • Uptime Kuma runs as part of the prod Docker Compose stack OR as a separate service (user preference)
  • Only monitors production (dev is not critical)
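If it joins the prod compose stack, the service could look like the following sketch; the image tag is Uptime Kuma's published `1` release line and `/app/data` is its documented data directory, while the host port and volume name are assumptions:

```yaml
# Hypothetical addition to docker-compose.prod.yml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    volumes:
      - kuma-data:/app/data
    ports:
      - "3001:3001"
    restart: unless-stopped

volumes:
  kuma-data:
```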

Acceptance Criteria

  • Uptime Kuma is running and accessible
  • Production health endpoint is monitored every 30 seconds
  • Downtime triggers an alert notification
  • Dashboard shows uptime history

Test Cases

  1. Normal operation: Health check passing, Uptime Kuma shows "Up"
  2. Downtime detection: Stop API container, verify Uptime Kuma detects downtime within 90 seconds
  3. Recovery detection: Restart API container, verify Uptime Kuma shows recovery

Component 10: End-to-End Infrastructure Verification

Scope

Verify the complete infrastructure works together. This is a manual checklist, not automated.

Verification Checklist

  • Push a commit to develop → CI runs → dev server deploys → dev.baby.bretzfam.com/api/v1/health returns 200
  • Push a commit to main → CI runs → prod server deploys → baby.bretzfam.com/api/v1/health returns 200
  • Run backup.sh on prod → valid SQL dump created
  • Run rollback.sh on dev → previous version restored and healthy
  • Uptime Kuma shows prod as "Up"
  • Stop prod API → Uptime Kuma alerts → restart API → Uptime Kuma shows recovery
  • Dev and prod databases contain different data (isolation verified)

Boundaries

  • This spec does NOT cover application features (auth, feedings, etc.)
  • This spec does NOT cover iOS build/deploy (manual TestFlight for MVP)
  • Proxmox host management (updates, firewalls) is out of scope
  • DNS configuration is assumed to be handled separately
  • The health check endpoint is the ONLY application code in this phase

Dependencies

  • Proxmox host must be accessible
  • DNS records for baby.bretzfam.com and dev.baby.bretzfam.com must be configured
  • Nginx Proxy Manager must be running and accessible
  • GitHub account with repo creation permissions