solstice-ci/deploy/podman
Till Wegmueller 4c5a8567a4
Add webhook crate for extensible signature validation and integration
- Introduce a new `webhook` crate to centralize signature validation for GitHub, Hookdeck, and Forgejo webhooks.
- Enable `github-integration` to perform unified webhook signature verification using the `webhook` crate.
- Refactor `github-integration`: replace legacy HMAC verification with the reusable `webhook` structure.
- Extend Podman configuration for Hookdeck webhook signature handling and improve documentation.
- Clean up unused dependencies by migrating to the new implementation.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2026-01-25 22:16:11 +01:00

Solstice CI — Production deployment with Podman Compose + Traefik

This stack deploys Solstice CI services behind Traefik with automatic TLS certificates from Let's Encrypt. It uses upstream official images for system services and multi-stage Rust builds on official Rust/Debian images that rely on container layer caching (no sccache) for fast, reproducible builds.

Prerequisites

  • Podman 4.9+ with podman-compose compatibility (podman compose)
  • Public DNS records for subdomains pointing to the host running this stack
  • Ports 80 and 443 open to the Internet (for ACME HTTP-01), see Rootless note below
  • Email address for ACME registration

Rootless Podman note (ports 80/443)

  • Rootless Podman cannot bind privileged ports (<1024). If you run this stack rootless, set high host ports in .env:
    • TRAEFIK_HTTP_PORT=8080
    • TRAEFIK_HTTPS_PORT=4443
  • With high ports, public HTTPS will be served on 4443 and the ACME HTTP-01 challenge will not work unless you forward external port 80 to host 8080 (e.g., via a firewall/NAT) or place another reverse proxy in front.
  • To use real public certificates with HTTP-01 directly on this host, either:
    • Run Podman as root (rootful) for Traefik only, or
    • Lower the kernel's unprivileged port floor (requires root): run sysctl -w net.ipv4.ip_unprivileged_port_start=80, and add net.ipv4.ip_unprivileged_port_start=80 to /etc/sysctl.conf so the setting persists across reboots.
  • Alternatively, switch Traefik to a DNS-01 challenge (not configured here) if you control DNS.
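
As a rough illustration of the rule above, the following sketch reports which of the ports mentioned here could be bound without root on the current host. The helper name can_bind_unprivileged is hypothetical; the procfs path is the standard Linux location of net.ipv4.ip_unprivileged_port_start, and 1024 is assumed when it cannot be read.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if PORT can be bound by an unprivileged user.
# The second argument overrides the kernel value (useful for what-if checks).
can_bind_unprivileged() {
    port=$1
    start=${2:-$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)}
    [ "$port" -ge "$start" ]
}

for p in 80 443 8080 4443; do
    if can_bind_unprivileged "$p"; then
        echo "port $p: bindable rootless"
    else
        echo "port $p: needs root, a lower ip_unprivileged_port_start, or a high port in .env"
    fi
done
```

Passing a second argument, for example can_bind_unprivileged 80 80, shows what the result would be after the sysctl change without applying it.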

DNS

Create A/AAAA records for the following hostnames under your base domain (no environment in hostname; env separation is logical via DB/vhost/buckets):

  • traefik.svc.DOMAIN
  • api.svc.DOMAIN
  • grpc.svc.DOMAIN
  • runner.svc.DOMAIN
  • forge.svc.DOMAIN (Forge/Forgejo webhooks)
  • github.svc.DOMAIN (GitHub App/webhooks)
  • minio.svc.DOMAIN (console UI)
  • s3.svc.DOMAIN (S3 API, TLS via TCP SNI)
  • mq.svc.DOMAIN (RabbitMQ mgmt UI; AMQP remains internal)
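
Before requesting certificates it can help to confirm that every record resolves. The sketch below expands the list above for a base domain and checks each name with getent. The function names and the example.org domain are placeholders, not part of the stack.

```shell
#!/bin/sh
# Print the hostnames listed above for a given base domain.
solstice_hostnames() {
    domain=$1
    for name in traefik api grpc runner forge github minio s3 mq; do
        echo "$name.svc.$domain"
    done
}

# Report which hostnames resolve through the host's resolver.
check_dns() {
    for host in $(solstice_hostnames "$1"); do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "ok       $host"
        else
            echo "missing  $host"
        fi
    done
}

# Usage: check_dns example.org
```

Note that this checks resolution from the host running the script; Let's Encrypt validates from the public Internet, so a local hosts-file entry can make a name look fine here while it is still missing in public DNS.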

Quick start

  1. Copy env template and edit secrets and settings: cp .env.sample .env

    Edit .env (ENV=staging|prod, DOMAIN, passwords, ACME email)

  2. (Optional) Use the Let's Encrypt staging CA to test issuance without rate limits by setting in .env: TRAEFIK_ACME_CASERVER=https://acme-staging-v02.api.letsencrypt.org/directory
  3. Bring up the stack: podman compose -f compose.yml up -d --build
  4. Monitor logs: podman compose logs -f traefik
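
A small pre-flight check can catch an unedited .env before step 3. This is only a sketch: it validates the two settings named above (ENV and DOMAIN), treats the file as plain KEY=value lines without quoting rules, and uses hypothetical helper names.

```shell
#!/bin/sh
# Read KEY from a KEY=value style file (the last assignment wins).
env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Hypothetical pre-flight check: ENV must be staging or prod, DOMAIN must be set.
check_env_file() {
    file=$1
    case "$(env_value "$file" ENV)" in
        staging|prod) ;;
        *) echo "ENV must be staging or prod" >&2; return 1 ;;
    esac
    [ -n "$(env_value "$file" DOMAIN)" ] || { echo "DOMAIN is empty" >&2; return 1; }
}

# Usage: check_env_file .env && podman compose -f compose.yml up -d --build
```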

Services and routing

  • Traefik dashboard: https://traefik.svc.${DOMAIN} (protect with TRAEFIK_DASHBOARD_AUTH in .env)
  • Orchestrator HTTP: https://api.${ENV}.${DOMAIN}
  • Orchestrator gRPC (h2/TLS via SNI): grpc.${ENV}.${DOMAIN}
  • Forge webhooks: https://forge.${ENV}.${DOMAIN}
  • GitHub webhooks: https://github.${ENV}.${DOMAIN}
  • Runner static server: https://runner.${ENV}.${DOMAIN}
  • MinIO console: https://minio.svc.${DOMAIN}
  • S3 API: s3.svc.${DOMAIN}
  • RabbitMQ management: https://mq.svc.${DOMAIN}

Environment scoping (single infra, logical separation)

  • RabbitMQ: single broker; per-environment vhosts named solstice-${ENV} (staging/prod). Services connect to amqp://.../solstice-${ENV}.
  • Postgres: single cluster; databases solstice_staging and solstice_prod are created by the postgres-setup job. Services use postgres://.../solstice_${ENV}.
  • MinIO: single server; buckets solstice-logs-staging and solstice-logs-prod are created by the minio-setup job. Set S3 bucket per service to the env-appropriate bucket.
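
The naming scheme above can be summarized in a few lines of shell. The function name solstice_scope is hypothetical; the resource names follow the patterns stated in this section.

```shell
#!/bin/sh
# Derive the per-environment resource names described above.
# Note the separators: hyphen for the vhost and bucket, underscore for the database.
solstice_scope() {
    scope_env=$1
    case "$scope_env" in
        staging|prod) ;;
        *) echo "unknown environment: $scope_env" >&2; return 1 ;;
    esac
    echo "vhost=solstice-$scope_env"
    echo "database=solstice_$scope_env"
    echo "bucket=solstice-logs-$scope_env"
}

solstice_scope staging
```

The mixed separators are easy to mistype when writing connection strings by hand, which is the main reason to derive the names from ENV rather than spell them out per service.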

Security notes

  • Secrets are provided via podman compose secrets referencing your environment variables. Do not commit real secrets.
  • Only management UIs are exposed publicly via Traefik. Data planes (Postgres, AMQP, S3 API) terminate TLS at Traefik and route internally. Adjust exposure policy as needed.

Images and builds

  • System services use Chainguard images (postgres, rabbitmq). MinIO uses upstream images.
  • Rust services are built with multi-stage Containerfiles using cgr.dev/chainguard/rust and run on cgr.dev/chainguard/glibc-dynamic.
  • Build caches are mounted in-build for cargo registry/git and the cargo target directory (via ~/.cargo/config target-dir=/cargo/target).

Maintenance

  • Upgrade images by editing tags in compose.yml and rebuilding: podman compose build --pull
  • Renewals are automatic via Traefik ACME. Certificates are stored in the traefik-acme volume.
  • Backups: persist volumes (postgres-data, rabbitmq-data, minio-data, traefik-acme).
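
One possible way to back up the volumes is podman volume export. The sketch below only prints the commands so they can be reviewed first. It assumes the solstice-ci_ volume prefix shown in the tear-down section; the destination directory and the function name are placeholders.

```shell
#!/bin/sh
# Print (not run) one export command per persistent volume.
# Volume names assume the solstice-ci_ project prefix; verify with: podman volume ls
backup_commands() {
    dest=$1
    stamp=$2
    for vol in postgres-data rabbitmq-data minio-data traefik-acme; do
        echo "podman volume export solstice-ci_$vol --output $dest/$vol-$stamp.tar"
    done
}

# Review the output, then pipe it to sh to execute.
backup_commands /var/backups/solstice "$(date +%Y%m%d)"
```

A tar of a live database volume may not be consistent, so stopping the stack (or at least Postgres) before exporting is the conservative choice.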

Tear down

  • Stop: podman compose down
  • Remove volumes (DANGEROUS: destroys data): podman volume rm solstice-ci_traefik-acme solstice-ci_postgres-data solstice-ci_rabbitmq-data solstice-ci_minio-data

Troubleshooting

  • Certificate issues: check the Traefik logs and verify DNS records and ports 80/443. For testing, use the ACME staging server.
  • No routes: verify labels on services and that traefik sees the podman socket.
  • Healthchecks failing: inspect service logs with podman logs <container>.
  • Arch Linux/Podman DNS timeouts (ACME): If Traefik logs show errors like "dial tcp: lookup acme-v02.api.letsencrypt.org on 10.89.0.1:53: i/o timeout", this is typically a Podman network DNS (netavark/aardvark-dns) issue. Fixes:
    • We now set explicit public DNS resolvers for the Traefik container in compose.yml (1.1.1.1, 8.8.8.8, 9.9.9.9). Redeploy: podman compose up -d traefik.
    • Ensure Podman's network backend and DNS are installed and active (Arch): pacman -S netavark aardvark-dns; systemctl enable --now aardvark-dns.socket; verify podman info | grep -i network shows networkBackend: netavark.
    • Alternatively, mount the host resolv.conf into Traefik: add to the traefik service volumes: - /etc/resolv.conf:/etc/resolv.conf:ro
    • Check firewall (nftables): allow UDP/TCP 53 from the Podman bridge (e.g., 10.89.0.0/24) to host 10.89.0.1; allow FORWARD for ESTABLISHED,RELATED.
    • Inspect network: podman network inspect podman; consider creating a custom network with explicit DNS servers: podman network create --dns 1.1.1.1 --dns 8.8.8.8 solstice-net and set networks.core.name to that network in compose.yml.
    • As a last resort, run Traefik with host networking: network_mode: host (then remove ports and ensure only Traefik is exposed), or switch ACME to DNS-01.
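
To tell this DNS problem apart from other ACME failures, the log can be scanned for the error shape quoted above. A minimal sketch, with a hypothetical function name:

```shell
#!/bin/sh
# Succeeds if stdin contains a resolver timeout of the form quoted above:
#   lookup <name> on <ip>:53: i/o timeout
has_dns_timeout() {
    grep -Eq 'lookup [^ ]+ on [0-9.]+:53: i/o timeout'
}

# Usage:
#   podman compose logs traefik 2>&1 | has_dns_timeout && echo "DNS timeout seen"
```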

Ubuntu host setup for libvirt/KVM and image directories

These steps prepare an Ubuntu host so the orchestrator (running in a container) can control KVM/libvirt and manage VM images stored on the host.

  1. Install libvirt/KVM and tools
  • sudo apt update
  • sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients virtinst bridge-utils genisoimage
  • Ensure the libvirt service is running:
    • systemctl status libvirtd
    • If inactive: sudo systemctl enable --now libvirtd
  2. User permissions (KVM and libvirt sockets)
  • Add your deployment user (the one running podman compose) to the required groups:
    • sudo usermod -aG libvirt $USER
    • sudo usermod -aG kvm $USER
  • Log out and back in (or new shell) for group membership to take effect.
  3. Default libvirt network
  • Make sure the default network exists and is active (compose defaults LIBVIRT_NETWORK=default):
    • virsh net-list --all
    • If missing, define it from the stock XML or create a new NAT network.
    • If present but inactive:
      • virsh net-start default
      • virsh net-autostart default
  4. Prepare host directories for images and work data
  • Base images directory (bind-mounted read/write into the orchestrator container):
    • sudo mkdir -p /var/lib/solstice/images
    • sudo chown "$USER":"$USER" /var/lib/solstice/images
  • Orchestrator work directory for overlays and console logs:
    • sudo mkdir -p /var/lib/solstice-ci
    • sudo chown "$USER":"$USER" /var/lib/solstice-ci
  • In deploy/podman/.env(.sample), set:
    • ORCH_IMAGES_DIR=/var/lib/solstice/images
    • ORCH_WORK_DIR=/var/lib/solstice-ci
  5. Map the image list (image map YAML)
  • Point ORCH_IMAGE_MAP_PATH at your production image map on the host (kept in git or ops repo):
    • ORCH_IMAGE_MAP_PATH=/etc/solstice/orchestrator-image-map.yaml
  • The orchestrator looks for /examples/orchestrator-image-map.yaml in the container; compose binds your host file there read-only.
  • Ensure each images[*].local_path in the YAML points inside /var/lib/solstice/images (the in-container path is the same via the bind mount). The provided example already uses that prefix.
  6. Bring up the stack
  • podman compose -f compose.yml up -d --build
  • The orchestrator will, on first start, download missing base images as per the YAML into ORCH_IMAGES_DIR. Subsequent starts reuse the same files.

Notes

  • Hardware acceleration: compose maps /dev/kvm into the container; verify kvm is available on the host: lsmod | grep kvm and that your CPU virtualization features are enabled in BIOS/UEFI.
  • Sockets and configs: compose binds libvirt control sockets and common libvirt directories read-only so the orchestrator can read network definitions and create domains.
  • If you change LIBVIRT_URI or LIBVIRT_NETWORK, update deploy/podman/.env and redeploy.
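
The host requirements from the setup steps and notes above can be gathered into one read-only check. This is a sketch with hypothetical function names; it only reports problems and changes nothing on the host.

```shell
#!/bin/sh
# Succeeds if the space-separated group list in $2 contains group $1.
in_group_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print one line per unmet requirement; print nothing if all checks pass.
preflight() {
    user_groups=$(id -nG)
    for g in libvirt kvm; do
        in_group_list "$g" "$user_groups" || echo "user is not in group $g"
    done
    [ -c /dev/kvm ] || echo "/dev/kvm is missing (check BIOS/UEFI virtualization and lsmod | grep kvm)"
    for d in /var/lib/solstice/images /var/lib/solstice-ci; do
        [ -w "$d" ] || echo "$d is missing or not writable"
    done
    return 0
}

preflight
```

Run it as the deployment user, since group membership and directory ownership are evaluated for whoever executes the script.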

Runner binaries (served by the orchestrator)

  • Purpose: Builder VMs download workflow runner binaries from the orchestrator over HTTP.
  • Host directory: Set RUNNER_DIR_HOST in deploy/podman/.env. This path is bind-mounted read-only into the orchestrator at /runners.
    • Example (prod default in .env): RUNNER_DIR_HOST=/var/lib/solstice/runners
    • Example (dev default in .env.sample): RUNNER_DIR_HOST=../../target/runners
  • URLs: Files are served at http(s)://runner.${ENV}.${DOMAIN}/runners/{filename}
    • Example: https://runner.prod.${DOMAIN}/runners/solstice-runner-linux
  • Orchestrator injection: The orchestrator auto-computes default runner URLs from its HTTP_ADDR and contact address and injects them into cloud-init.
    • You can override via env: SOLSTICE_RUNNER_URL (single) and SOLSTICE_RUNNER_URLS (space-separated list) to point VMs at specific filenames.
  • To build/place binaries:
    • Build the workflow-runner crate for your target(s) and place the resulting artifacts in RUNNER_DIR_HOST with stable filenames (e.g., solstice-runner-linux, solstice-runner-illumos).
    • Ensure file permissions allow read by the orchestrator user (world-readable is fine for static serving).
  • Traefik routing: runner.${ENV}.${DOMAIN} routes to the orchestrator's HTTP port (8081 by default).
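
When overriding the injected URLs, the space-separated SOLSTICE_RUNNER_URLS value can be assembled from the URL format above. A minimal sketch, where the function name and example.org are placeholders:

```shell
#!/bin/sh
# Build a space-separated list of runner URLs in the format documented above.
runner_urls() {
    renv=$1; domain=$2; shift 2
    out=""
    for file in "$@"; do
        out="$out${out:+ }https://runner.$renv.$domain/runners/$file"
    done
    echo "$out"
}

SOLSTICE_RUNNER_URLS=$(runner_urls prod example.org solstice-runner-linux solstice-runner-illumos)
echo "SOLSTICE_RUNNER_URLS=$SOLSTICE_RUNNER_URLS"
```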

Forge integration configuration

  • The forge-integration service will warn if WEBHOOK_SECRET is not set: it will accept webhooks without signature validation (dev mode). Set WEBHOOK_SECRET in deploy/podman/.env to enable HMAC validation.
  • To enable posting commit statuses back to Forgejo/Gitea, set FORGEJO_TOKEN and FORGEJO_BASE_URL in deploy/podman/.env. If they are not set, the service logs a warning (FORGEJO_* not set) and disables the job result consumer that reports statuses.
  • The compose file passes these variables to the container. After editing .env, run: podman compose up -d forge-integration
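
The consequences of missing settings described above can be previewed before deploying. The sketch below reads the three variables from the environment and prints what the service is documented to do without them. The function name is hypothetical, and loading .env by sourcing it assumes the file uses shell-compatible KEY=value syntax.

```shell
#!/bin/sh
# Print the documented consequences of missing Forge settings.
forge_config_report() {
    [ -n "${WEBHOOK_SECRET:-}" ] || echo "WEBHOOK_SECRET unset: webhooks accepted without signature validation"
    if [ -z "${FORGEJO_TOKEN:-}" ] || [ -z "${FORGEJO_BASE_URL:-}" ]; then
        echo "FORGEJO_* unset: commit status reporting disabled"
    fi
    return 0
}

# Usage (subshell keeps the secrets out of the interactive environment):
#   ( set -a; . ./.env; set +a; forge_config_report )
```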

GitHub integration configuration

  • Set GITHUB_WEBHOOK_SECRET in deploy/podman/.env to validate webhook signatures (X-Hub-Signature-256). If unset, webhooks are accepted without validation (dev mode).
  • If you proxy webhooks through Hookdeck, set HOOKDECK_SIGNING_SECRET to validate Hookdeck's signature header (X-Hookdeck-Signature). Either GitHub or Hookdeck signatures can satisfy verification.
  • To enable check runs and workflow fetches, configure a GitHub App and set GITHUB_APP_ID plus either GITHUB_APP_KEY (PEM contents) or GITHUB_APP_KEY_PATH (path inside the container).
  • Optional overrides: GITHUB_API_BASE for GitHub Enterprise and GITHUB_CHECK_NAME to customize the check run title.
  • The compose file passes these variables to the container. After editing .env, run: podman compose up -d github-integration
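
To test signature validation by hand, a matching X-Hub-Signature-256 value can be computed with openssl. GitHub's scheme is the string sha256= followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The function name, the payload, and the request path in the usage comment are assumptions; the service may also expect other GitHub headers.

```shell
#!/bin/sh
# Compute the X-Hub-Signature-256 value for a payload read from stdin.
# Note: the secret is passed on the openssl command line and is briefly
# visible in the process list, so use this on a trusted host only.
github_signature() {
    secret=$1
    printf 'sha256=%s\n' "$(openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')"
}

# Usage (payload and path are placeholders):
#   body='{"zen":"test"}'
#   sig=$(printf '%s' "$body" | github_signature "$GITHUB_WEBHOOK_SECRET")
#   curl -H "X-Hub-Signature-256: $sig" -H 'Content-Type: application/json' \
#        --data-binary "$body" "https://github.${ENV}.${DOMAIN}/"
```

The signature covers the exact bytes sent, which is why the usage example uses printf '%s' and --data-binary: a trailing newline or re-encoded body would produce a different digest.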

Traefik ACME CA server note

  • If you see a warning about TRAEFIK_ACME_CASERVER being unset, it is harmless. The compose file now defaults this value to empty so Traefik uses the production Let's Encrypt endpoint. To test with staging, set TRAEFIK_ACME_CASERVER=https://acme-staging-v02.api.letsencrypt.org/directory in .env and redeploy Traefik.
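
The fallback described above can be pictured as follows. This sketch only illustrates the resulting behavior; the actual defaulting happens in compose.yml and Traefik, and the function name is hypothetical.

```shell
#!/bin/sh
# Resolve the ACME directory URL: an empty or unset value falls back to
# the Let's Encrypt production endpoint.
effective_acme_server() {
    echo "${1:-https://acme-v02.api.letsencrypt.org/directory}"
}

effective_acme_server "${TRAEFIK_ACME_CASERVER:-}"
```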