Feature: Infrastructure

Version: 1.0.0 | Last Reviewed: 2026-02-10 | Status: Approved

User Story

As a developer, I want rock-solid dev and prod environments with automated deployments, backups, and rollback capability, so that I can ship features confidently and recover from failures quickly.

Overview

Two completely isolated environments (dev + prod) running on Proxmox LXC containers. Each runs the Baby Basics API and PostgreSQL via Docker Compose. Deployments are automated via GitHub Actions. Data is protected by daily backups and a one-command rollback script.

Architecture

GitHub Actions (CI/CD)
├── push to develop ──→ SSH ──→ dev LXC container
│                                ├── Docker: baby-basics-api
│                                └── Docker: postgres
└── push to main ────→ SSH ──→ prod LXC container
                                 ├── Docker: baby-basics-api
                                 ├── Docker: postgres
                                 └── Uptime Kuma (monitoring)

Nginx Proxy Manager (existing, separate host)
├── dev.baby.bretzfam.com ──→ dev LXC:3000
└── baby.bretzfam.com ──────→ prod LXC:3000

Component 1: GitHub Repository + Git Flow

Scope

  • Create private GitHub repo baby-basics
  • Push existing scaffolding as initial commit on main
  • Create develop branch from main
  • Configure branch protection rules
  • Set up repository secrets for CI/CD

Configuration

  • Repo: github.com/<owner>/baby-basics (private)
  • Default branch: main
  • Branch protection on main: require PR (no direct push), require CI to pass
  • Branch protection on develop: require CI to pass
  • Feature branches: bb-<forge-id>-<short-desc> from develop
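The repo and branch setup above can be sketched with the GitHub CLI. This is a transcript-style sketch, not the mandated procedure: it assumes `gh` is installed and authenticated and that the scaffolding already exists locally; branch protection rules are easiest to finish in the web UI (Settings → Branches) or via `gh api`.

```shell
# From the local scaffolding directory (assumes `gh auth login` has been run)
gh repo create baby-basics --private --source=. --push   # creates the private repo, pushes main
git switch -c develop
git push -u origin develop
```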

GitHub Actions Secrets (to be added later when LXC containers are ready)

  • DEV_SSH_KEY - Private SSH key for dev LXC
  • DEV_HOST - Dev LXC IP/hostname
  • PROD_SSH_KEY - Private SSH key for prod LXC
  • PROD_HOST - Prod LXC IP/hostname
  • DEPLOY_USER - SSH user on both LXC containers

Acceptance Criteria

  • Private repo exists on GitHub
  • main branch has initial scaffolding commit
  • develop branch exists and was created from main
  • Branch protection rules are configured
  • .github/workflows/ directory exists with placeholder workflow

Test Cases

  1. Verify branches: git branch -r shows origin/main and origin/develop
  2. Verify protection: Direct push to main is rejected (after protection is enabled)
  3. Verify scaffold: Cloning the repo produces the full monorepo directory structure

Component 2: LXC Container Provisioning

Scope

  • Create two LXC containers on Proxmox: baby-basics-dev and baby-basics-prod
  • Install Docker and Docker Compose on each
  • Configure networking (static IPs or DHCP reservations)
  • Create deployment user with SSH key access
  • Create application directory structure

Per-Container Setup

  • OS: Debian 12 (bookworm) or Ubuntu 24.04 LTS
  • Resources: 2 CPU cores, 2GB RAM, 20GB disk (adjustable)
  • Packages: docker.io, docker-compose-plugin, git, curl, postgresql-client (for pg_dump)
  • User: deploy with sudo access, SSH key auth (no password)
  • App directory: /opt/baby-basics/
  • Backup directory: /opt/backups/
  • Environment file: /opt/baby-basics/.env (gitignored, manually created)
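Inside each container, the setup above roughly corresponds to the following commands. This is a sketch assuming Debian 12 run as root; exact package names can differ slightly on Ubuntu, and the CI public key must be supplied separately.

```shell
# Packages (docker-compose-plugin comes from Docker's apt repository on Debian)
apt-get update && apt-get install -y docker.io git curl postgresql-client

# Deployment user with key-only SSH access
adduser --disabled-password --gecos "" deploy
usermod -aG sudo,docker deploy
install -d -m 700 -o deploy -g deploy /home/deploy/.ssh
# append the CI public key to /home/deploy/.ssh/authorized_keys

# Application and backup directories
install -d -o deploy -g deploy /opt/baby-basics /opt/backups
```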

.env File Template (per environment)

# /opt/baby-basics/.env
DATABASE_URL="postgresql://baby:STRONG_PASSWORD_HERE@db:5432/baby_basics?schema=public"
JWT_SECRET="RANDOM_64_CHAR_STRING_HERE"
NODE_ENV="production" # or "development" for dev
PORT=3000
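The RANDOM_64_CHAR_STRING_HERE placeholder for JWT_SECRET can be generated with OpenSSL; `openssl rand -hex 32` emits 32 random bytes as exactly 64 hex characters:

```shell
# Generate a 64-character hex secret suitable for JWT_SECRET
JWT_SECRET=$(openssl rand -hex 32)
echo "${#JWT_SECRET}"   # prints 64
```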

Acceptance Criteria

  • Dev LXC container is running on Proxmox
  • Prod LXC container is running on Proxmox
  • Docker is installed and running on both containers
  • deploy user can SSH to both containers with key auth
  • /opt/baby-basics/ and /opt/backups/ directories exist on both
  • .env file exists on both with correct values
  • Containers can reach the internet (for Docker image pulls)
  • Dev and prod are on separate IPs (completely isolated)

Test Cases

  1. SSH access: ssh deploy@<dev-ip> and ssh deploy@<prod-ip> succeed
  2. Docker: docker run hello-world succeeds on both
  3. Isolation: Dev and prod have different IPs, different DB passwords

Component 3: Docker Compose Configs

Scope

  • Shared base docker-compose.yml with common configuration
  • docker-compose.dev.yml override for dev-specific settings
  • docker-compose.prod.yml override for prod-specific settings

Services

API service (baby-basics-api):

  • Build from apps/api/Dockerfile
  • Port: 3000 (mapped to host)
  • Depends on: db
  • Restart: unless-stopped
  • Environment from .env file
  • Health check: curl -f http://localhost:3000/api/v1/health

Database service (db):

  • Image: postgres:16-alpine
  • Volume: pgdata (named volume for persistence)
  • Port: 5432 (NOT exposed to host in prod, exposed in dev for debugging)
  • Environment: POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB from .env
  • Health check: pg_isready
  • Restart: unless-stopped

Dev Overrides

  • Postgres port exposed to host (5432:5432) for direct DB access
  • API logs at debug level
  • No resource limits

Prod Overrides

  • Postgres port NOT exposed to host
  • API logs at info level
  • Container restart policy: unless-stopped
  • Resource limits: API 1GB RAM, Postgres 1GB RAM

File Locations

infra/
├── docker-compose.yml # Shared base
├── docker-compose.dev.yml # Dev overrides
└── docker-compose.prod.yml # Prod overrides
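A minimal sketch of the shared base file under the assumptions above. Service names, the env var names, and the health check path come from this spec; the build context and interval values are illustrative, not prescribed.

```yaml
# infra/docker-compose.yml — shared base (sketch)
services:
  api:
    build:
      context: ../apps/api
    ports:
      - "3000:3000"
    env_file: /opt/baby-basics/.env
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/v1/health"]
      interval: 30s
  db:
    image: postgres:16-alpine
    env_file: /opt/baby-basics/.env
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER}"]
      interval: 10s

volumes:
  pgdata:
```

docker-compose.dev.yml would then add `ports: ["5432:5432"]` under `db`, while docker-compose.prod.yml leaves the Postgres port unmapped and adds the resource limits.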

Acceptance Criteria

  • docker compose -f docker-compose.yml -f docker-compose.dev.yml up starts API + Postgres on dev
  • docker compose -f docker-compose.yml -f docker-compose.prod.yml up starts API + Postgres on prod
  • Postgres data persists across container restarts (named volume)
  • API connects to Postgres and responds on port 3000
  • Dev exposes Postgres port 5432, prod does not

Test Cases

  1. Start stack: docker compose up -d succeeds, both containers healthy
  2. Persistence: Write data, restart containers, data survives
  3. Connectivity: API can query Postgres (/api/v1/health returns db: "connected")
  4. Isolation: Prod Postgres not reachable from host on 5432

Component 4: Health Check Endpoint

Scope

Minimal Fastify application that serves a single health check endpoint. This is the first actual code written for the API - just enough to verify the infrastructure works end-to-end.

API Contract

GET /api/v1/health

Response 200:

{
  "status": "ok",
  "timestamp": "2026-02-10T12:00:00.000Z",
  "version": "0.1.0",
  "db": "connected"
}

Response 503 (DB unreachable):

{
  "status": "degraded",
  "timestamp": "2026-02-10T12:00:00.000Z",
  "version": "0.1.0",
  "db": "error"
}
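The two response shapes can be captured in a small pure function. This is a hypothetical sketch (`buildHealthResponse` and the type names are illustrative, not from the spec); a Fastify route handler would send the result with status 200 or 503.

```typescript
type DbState = "connected" | "error";

interface HealthResponse {
  status: "ok" | "degraded";
  timestamp: string; // ISO-8601
  version: string;   // read from package.json in the real app
  db: DbState;
}

// Builds the documented payload from the result of a DB connectivity probe.
function buildHealthResponse(dbOk: boolean, version: string): HealthResponse {
  return {
    status: dbOk ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    version,
    db: dbOk ? "connected" : "error",
  };
}
```

In the handler, something like `reply.code(dbOk ? 200 : 503).send(buildHealthResponse(dbOk, version))` would then produce the contract above.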

Implementation Notes

  • Minimal Fastify app (just health route, no auth, no other routes)
  • Tests DB connectivity via prisma.$queryRaw`SELECT 1` (Prisma's tagged-template form, not a function call)
  • Reads version from package.json
  • This endpoint is public (no auth required)
  • Used by Docker health checks, Uptime Kuma, and deploy script verification

Acceptance Criteria

  • GET /api/v1/health returns 200 when DB is connected
  • Response includes status, timestamp, version, and db fields
  • Returns 503 with db: "error" when database is unreachable
  • Response time <50ms
  • Endpoint requires no authentication

Test Cases

  1. Healthy: Start with DB running, GET /health returns 200 with db: "connected"
  2. DB down: Start without DB, GET /health returns 503 with db: "error"
  3. Schema: Response matches documented JSON shape exactly

Component 5: Deploy Script

Scope

Single deployment script that handles the full deploy lifecycle with automatic rollback on failure.

Flow

deploy.sh [environment]

  1. Validate environment argument (dev|prod)
  2. cd /opt/baby-basics
  3. Create pre-deploy backup: pg_dump → /opt/backups/pre-deploy-TIMESTAMP.sql
  4. git pull origin <branch>
  5. Rebuild images: docker compose build
  6. Run database migrations from the new image: docker compose run --rm api npx prisma migrate deploy
     (run, not exec — exec would target the still-running container built from the previous commit, which lacks the new migration files)
  7. Restart: docker compose up -d
  8. Wait for health check (up to 30 seconds, polling every 2s)
  9. If healthy: log success, clean up old pre-deploy backups (keep last 5)
  10. If unhealthy: ROLLBACK
      a. docker compose down
      b. git checkout <previous-commit>
      c. Restore pre-deploy backup: psql < /opt/backups/pre-deploy-TIMESTAMP.sql
      d. docker compose build && docker compose up -d
      e. Verify health
      f. Log the failure and the rollback
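The health-check polling step (up to 30 seconds, every 2s) can be sketched as a small bash helper; the function name and parameters are illustrative:

```shell
# Polls a health-check command every $3 seconds (default 2) for up to
# $2 seconds (default 30). Returns 0 on first success, 1 on timeout.
wait_for_health() {
  local check_cmd="$1" timeout="${2:-30}" interval="${3:-2}"
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    if eval "$check_cmd" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

# Example: wait_for_health "curl -fsS http://localhost:3000/api/v1/health"
```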

File Location

infra/deploy.sh

Environment Detection

  • Dev: pulls from develop branch
  • Prod: pulls from main branch
  • Determined by argument or by reading NODE_ENV from .env
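The argument-to-branch mapping above could look like the following sketch (`resolve_branch` is a hypothetical helper name):

```shell
# Maps the deploy.sh environment argument to the git branch it deploys.
resolve_branch() {
  case "$1" in
    dev)  echo develop ;;
    prod) echo main ;;
    *)    echo "usage: deploy.sh dev|prod" >&2; return 1 ;;
  esac
}

# Example: BRANCH=$(resolve_branch "$1") || exit 1
```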

Acceptance Criteria

  • ./deploy.sh dev deploys the develop branch to dev environment
  • ./deploy.sh prod deploys the main branch to prod environment
  • Pre-deploy pg_dump is created before any changes
  • Database migrations run successfully
  • Health check verifies the new deployment is working
  • Failed health check triggers automatic rollback
  • After rollback, the previous version is running and healthy
  • Script exits with 0 on success, 1 on failure (after rollback)
  • Script logs all steps to stdout for CI/CD visibility

Test Cases

  1. Happy path: Deploy with passing health check, verify new version running
  2. Rollback: Deploy with broken code, verify rollback restores previous version
  3. Migration: Deploy with schema changes, verify migrations applied
  4. Idempotent: Run deploy twice with no changes, no errors

Component 6: CI/CD Pipeline (GitHub Actions)

Scope

GitHub Actions workflow triggered by pushes to main and develop. Runs lint, test, then deploys to the appropriate environment via SSH.

Workflow: .github/workflows/api-ci.yml

Trigger

on:
  push:
    branches: [main, develop]
    paths:
      - 'apps/api/**'
      - 'infra/**'
      - '.github/workflows/api-ci.yml'

Jobs

  1. lint-and-test (runs on ubuntu-latest):

    • Checkout code
    • Setup Node.js 22
    • Install dependencies (npm ci in apps/api/)
    • Start Postgres service container
    • Run Prisma migrations against test DB
    • Run linter (npm run lint)
    • Run tests (npm test)
  2. deploy (needs: lint-and-test):

    • Determine target: main → prod, develop → dev
    • SSH into target server
    • Run deploy.sh on server
    • Verify health check from CI runner
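A sketch of the two jobs under the assumptions above; the action versions, test-DB credentials, and the deploy step's wiring are illustrative, not prescribed:

```yaml
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: baby_basics_test
        ports:
          - 5432:5432
    defaults:
      run:
        working-directory: apps/api
    env:
      DATABASE_URL: postgresql://test:test@localhost:5432/baby_basics_test
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npx prisma migrate deploy
      - run: npm run lint
      - run: npm test

  deploy:
    needs: lint-and-test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy via SSH
        # main → prod, develop → dev; key, host, and user come from the secrets listed below
        run: |
          TARGET=$([ "$GITHUB_REF_NAME" = "main" ] && echo prod || echo dev)
          echo "would ssh to the $TARGET host and run infra/deploy.sh $TARGET"
```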

Secrets Required

  • DEV_SSH_KEY, DEV_HOST
  • PROD_SSH_KEY, PROD_HOST
  • DEPLOY_USER

Acceptance Criteria

  • Push to develop triggers: lint → test → deploy to dev
  • Push to main triggers: lint → test → deploy to prod
  • Lint or test failure prevents deployment
  • CI uses Postgres service container for tests
  • SSH deploy uses key from GitHub secrets
  • Workflow only triggers on apps/api/** or infra/** changes
  • Non-API changes (docs, iOS) do NOT trigger the workflow

Test Cases

  1. Develop push: Push to develop, verify dev server updated
  2. Main push: Push to main, verify prod server updated
  3. Test failure: Push with failing test, verify deploy is skipped
  4. Path filter: Push docs-only change, verify workflow doesn't run

Component 7: Nginx Proxy Hosts

Scope

Add two proxy hosts in the existing Nginx Proxy Manager instance.

Configuration

  Proxy Host              Upstream              SSL
  dev.baby.bretzfam.com   <dev-lxc-ip>:3000     Let's Encrypt auto
  baby.bretzfam.com       <prod-lxc-ip>:3000    Let's Encrypt auto

Notes

  • This is a GUI operation in Nginx Proxy Manager (no config files to write)
  • DNS A records must point both subdomains to the NPM host
  • SSL certificates auto-provisioned via Let's Encrypt
  • WebSocket support: OFF (not needed for the MVP REST API)
  • Custom locations: none

Acceptance Criteria

  • https://dev.baby.bretzfam.com/api/v1/health returns 200
  • https://baby.bretzfam.com/api/v1/health returns 200
  • SSL certificates are valid (no browser warnings)
  • HTTP redirects to HTTPS

Test Cases

  1. Dev routing: curl https://dev.baby.bretzfam.com/api/v1/health returns health response
  2. Prod routing: curl https://baby.bretzfam.com/api/v1/health returns health response
  3. SSL: curl -I https://baby.bretzfam.com shows valid certificate
  4. HTTP redirect: curl -I http://baby.bretzfam.com returns 301 to HTTPS

Component 8: Backup + Rollback Scripts

Backup Script (infra/backup.sh)

  • Runs via cron daily at 3:00 AM
  • pg_dump from the Docker Postgres container
  • Output: /opt/backups/daily-YYYY-MM-DD.sql
  • Retention: 30 days (delete older backups)
  • Logs to /opt/backups/backup.log
  • Exit code 0 on success, 1 on failure

Cron Entry

0 3 * * * /opt/baby-basics/infra/backup.sh >> /opt/backups/backup.log 2>&1
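The dump and retention steps can be sketched as two bash functions. Function names are illustrative, and the database user/name are assumed to match the .env template; `run_backup` is shown but not invoked here.

```shell
BACKUP_DIR=/opt/backups

# Dumps the Dockerized Postgres to a dated file under $BACKUP_DIR.
run_backup() {
  local stamp; stamp=$(date +%F)
  docker compose -f /opt/baby-basics/infra/docker-compose.yml exec -T db \
    pg_dump -U baby baby_basics > "$BACKUP_DIR/daily-$stamp.sql"
}

# Deletes daily dumps in directory $1 older than $2 days (default 30).
prune_old_backups() {
  find "$1" -name 'daily-*.sql' -mtime "+${2:-30}" -delete
}
```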

Rollback Script (infra/rollback.sh)

  • One-command rollback to previous deployment
  • Reads the most recent pre-deploy backup from /opt/backups/
  • Stops containers, restores DB, checks out previous git commit, rebuilds, restarts
  • Verifies health after rollback

Acceptance Criteria

  • backup.sh creates a valid SQL dump file
  • Dump file can be restored successfully: psql < dump.sql
  • Backups older than 30 days are automatically deleted
  • Cron job runs daily at 3 AM
  • rollback.sh reverts to the previous deployment
  • After rollback, health check passes
  • Both scripts log their actions

Test Cases

  1. Backup creates valid dump: Run backup, then restore into a fresh DB and verify data
  2. Retention: Create fake old backups, run backup, verify old ones deleted
  3. Rollback: Deploy new version, run rollback, verify previous version restored
  4. Backup with data: Insert test data, backup, verify dump contains it

Component 9: Uptime Kuma Monitoring

Scope

Set up Uptime Kuma to monitor the production health check endpoint.

Configuration

  • Monitor type: HTTP(s)
  • URL: https://baby.bretzfam.com/api/v1/health
  • Check interval: 30 seconds
  • Retry: 3 attempts before marking down
  • Expected status: 200
  • Alert: Notification on downtime (method TBD - could be email, Telegram, Pushover, etc.)

Notes

  • Uptime Kuma runs as part of the prod Docker Compose stack OR as a separate service (user preference)
  • Only monitors production (dev is not critical)
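If it joins the prod compose stack, the service could look like the following sketch; the image tag is Uptime Kuma's published `1` release line and `/app/data` is its documented data directory, while the host port and volume name are assumptions:

```yaml
# Hypothetical addition to docker-compose.prod.yml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    volumes:
      - kuma-data:/app/data
    ports:
      - "3001:3001"
    restart: unless-stopped

volumes:
  kuma-data:
```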

Acceptance Criteria

  • Uptime Kuma is running and accessible
  • Production health endpoint is monitored every 30 seconds
  • Downtime triggers an alert notification
  • Dashboard shows uptime history

Test Cases

  1. Normal operation: Health check passing, Uptime Kuma shows "Up"
  2. Downtime detection: Stop API container, verify Uptime Kuma detects downtime within 90 seconds
  3. Recovery detection: Restart API container, verify Uptime Kuma shows recovery

Component 10: End-to-End Infrastructure Verification

Scope

Verify the complete infrastructure works together. This is a manual checklist, not automated.

Verification Checklist

  • Push a commit to develop → CI runs → dev server deploys → dev.baby.bretzfam.com/api/v1/health returns 200
  • Push a commit to main → CI runs → prod server deploys → baby.bretzfam.com/api/v1/health returns 200
  • Run backup.sh on prod → valid SQL dump created
  • Run rollback.sh on dev → previous version restored and healthy
  • Uptime Kuma shows prod as "Up"
  • Stop prod API → Uptime Kuma alerts → restart API → Uptime Kuma shows recovery
  • Dev and prod databases contain different data (isolation verified)

Boundaries

  • This spec does NOT cover application features (auth, feedings, etc.)
  • This spec does NOT cover iOS build/deploy (manual TestFlight for MVP)
  • Proxmox host management (updates, firewalls) is out of scope
  • DNS configuration is assumed to be handled separately
  • The health check endpoint is the ONLY application code in this phase

Dependencies

  • Proxmox host must be accessible
  • DNS records for baby.bretzfam.com and dev.baby.bretzfam.com must be configured
  • Nginx Proxy Manager must be running and accessible
  • GitHub account with repo creation permissions