Solstice CI — Production deployment with Podman Compose + Traefik
This stack deploys Solstice CI services behind Traefik with automatic TLS certificates from Let’s Encrypt. It uses upstream official images for system services, and builds the Rust services with multi-stage builds on official Rust/Debian images, relying on container layer caching (no sccache) for fast, reproducible builds.
Prerequisites
- Podman 4.9+ with Compose support (the podman compose command)
- Public DNS records for each subdomain, pointing to the host that runs this stack
- When Traefik listens on high ports (8080 for HTTP, 4443 for HTTPS), public HTTPS is served on 4443, and the ACME HTTP-01 challenge will fail unless you forward external port 80 to host port 8080 (e.g., via firewall/NAT rules) or put another reverse proxy in front.
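- To obtain real public certificates with HTTP-01 directly on this host, do one of the following:
  - Run Podman as root (rootful) for Traefik only, or
  - Allow unprivileged processes to bind low ports (requires root):
    sysctl -w net.ipv4.ip_unprivileged_port_start=80
    This lets unprivileged processes bind any port from 80 upward, including 443. To persist it, add net.ipv4.ip_unprivileged_port_start=80 to a file under /etc/sysctl.d/ (e.g., /etc/sysctl.d/99-unprivileged-ports.conf); systemd-sysctl on distributions such as Arch does not read /etc/sysctl.conf.
  - Alternatively, switch Traefik to a DNS-01 challenge (not configured here) if you control DNS.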
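To see which of these options you need, check the host's current threshold. The Python sketch below is an illustration, not part of Solstice CI; it assumes a Linux host that exposes the knob under /proc and Python 3.9+. It reports whether a rootless process can bind the ports involved here. The kernel default threshold is 1024, so out of the box only 8080 and 4443 are bindable without root.

```python
from pathlib import Path

# Lowest port an unprivileged (rootless) process may bind; the kernel default is 1024.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/ip_unprivileged_port_start")

# Low ports needed for HTTP-01 on this host, plus the high ports used otherwise.
PORTS = (80, 443, 8080, 4443)


def unprivileged_port_start(path: Path = SYSCTL_PATH) -> int:
    """Read the current threshold from procfs."""
    return int(path.read_text().strip())


def rootless_can_bind(port: int, threshold: int) -> bool:
    """Ports at or above the threshold can be bound without CAP_NET_BIND_SERVICE."""
    return port >= threshold


def report(path: Path = SYSCTL_PATH) -> None:
    """Print the threshold and whether each relevant port is bindable rootless."""
    try:
        start = unprivileged_port_start(path)
    except OSError as exc:  # e.g. not running on a Linux host
        print(f"cannot read {path}: {exc}")
        return
    print(f"ip_unprivileged_port_start = {start}")
    for port in PORTS:
        ok = rootless_can_bind(port, start)
        print(f"port {port}: {'bindable rootless' if ok else 'needs root or the sysctl change'}")


if __name__ == "__main__":
    report()
```

If ports 80 and 443 show as bindable after the sysctl change, Traefik can serve HTTP-01 directly without rootful Podman.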
- RabbitMQ: single broker; per-environment vhosts named solstice-${ENV} (staging/prod). Services connect to amqp://.../solstice-${ENV}.
- Postgres: single cluster; databases solstice_staging and solstice_prod are created by the postgres-setup job. Services use postgres://.../solstice_${ENV}.
- MinIO: single server; buckets solstice-logs-staging and solstice-logs-prod are created by the minio-setup job. Point each service's S3 bucket setting at the bucket for its environment (solstice-logs-${ENV}).
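The per-environment naming above is mechanical, so each service's settings can be derived from ENV alone. A minimal Python 3.9+ sketch; the host names, ports, and credentials in the example authorities (rabbitmq:5672, postgres:5432, user:pass) are hypothetical placeholders, not values taken from compose.yml.

```python
from urllib.parse import quote

ENVIRONMENTS = ("staging", "prod")


def env_resources(env: str) -> dict[str, str]:
    """Names per environment: vhost solstice-ENV, database solstice_ENV, bucket solstice-logs-ENV."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return {
        "amqp_vhost": f"solstice-{env}",
        "postgres_db": f"solstice_{env}",
        "s3_bucket": f"solstice-logs-{env}",
    }


def connection_urls(env: str, *, amqp_authority: str, pg_authority: str) -> dict[str, str]:
    """Build service settings; the *_authority values (user:pass@host:port) are yours to supply."""
    names = env_resources(env)
    # The AMQP vhost is the URL path; percent-encode it in case a name ever contains '/'.
    vhost = quote(names["amqp_vhost"], safe="")
    return {
        "amqp": f"amqp://{amqp_authority}/{vhost}",
        "postgres": f"postgres://{pg_authority}/{names['postgres_db']}",
        "s3_bucket": names["s3_bucket"],
    }


if __name__ == "__main__":
    settings = connection_urls(
        "staging",
        amqp_authority="user:pass@rabbitmq:5672",   # hypothetical
        pg_authority="user:pass@postgres:5432",     # hypothetical
    )
    for key, value in settings.items():
        print(f"{key}: {value}")
```

Rejecting unknown environments early catches typos such as "production" before a service connects to a vhost or database that the setup jobs never created.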
Security notes
- Secrets are supplied as podman compose secrets that reference your environment variables. Do not commit real secrets.
- Only management UIs are exposed publicly via Traefik. Data planes (Postgres, AMQP, S3 API) terminate TLS at Traefik and route internally. Adjust exposure policy as needed.
Images and builds
- System services use Chainguard images (postgres, rabbitmq). MinIO uses upstream images.
- Rust services are built with multi-stage Containerfiles using cgr.dev/chainguard/rust and run on cgr.dev/chainguard/glibc-dynamic.
- Build caches for the Cargo registry/git directories and the Cargo target directory are mounted during the build (the target directory is set via build.target-dir = "/cargo/target" in ~/.cargo/config).
Maintenance
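- Upgrade images by editing their tags in compose.yml, then rebuild and redeploy: podman compose build --pull, followed by podman compose up -d to recreate the containers.
- Certificate renewal is automatic via Traefik's ACME support. Certificates are stored in the traefik-acme volume.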
- Arch Linux/Podman DNS timeouts (ACME): If Traefik logs show errors like "dial tcp: lookup acme-v02.api.letsencrypt.org on 10.89.0.1:53: i/o timeout", this is typically a Podman network DNS (netavark/aardvark-dns) issue. Fixes:
  - compose.yml now sets explicit public DNS resolvers for the Traefik container (1.1.1.1, 8.8.8.8, 9.9.9.9). Redeploy with: podman compose up -d traefik.
  - Ensure Podman’s network backend and DNS server are installed and active (Arch): pacman -S netavark aardvark-dns; systemctl enable --now aardvark-dns.socket; then verify that `podman info | grep -i network` shows networkBackend: netavark.
  - Alternatively, mount the host's resolv.conf into Traefik by adding this entry to the traefik service volumes: - /etc/resolv.conf:/etc/resolv.conf:ro
  - Check the firewall (nftables): allow UDP/TCP port 53 from the Podman bridge subnet (e.g., 10.89.0.0/24) to the host at 10.89.0.1, and allow forwarded traffic in state ESTABLISHED,RELATED.
  - Inspect the network with podman network inspect podman. Consider creating a custom network with explicit DNS servers (podman network create --dns 1.1.1.1 --dns 8.8.8.8 solstice-net), then set networks.core.name to solstice-net in compose.yml and mark it external: true so Compose uses the existing network.
  - As a last resort, run Traefik with host networking (network_mode: host; then remove its ports mapping and make sure only Traefik is exposed), or switch ACME to DNS-01.
- The orchestrator looks for /examples/orchestrator-image-map.yaml in the container; compose binds your host file there read-only.
- Ensure each images[*].local_path in the YAML points inside /var/lib/solstice/images (the in-container path is the same via the bind mount). The provided example already uses that prefix.
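A quick pre-flight check for that rule, as a Python 3.9+ sketch. It assumes you have already parsed the YAML into a dict (for example with PyYAML's yaml.safe_load) and that images is either a list of entries or a name-to-entry mapping; this section does not describe the schema beyond images[*].local_path.

```python
import posixpath

# In-container images directory; the bind mount keeps host and container paths identical.
IMAGES_DIR = "/var/lib/solstice/images"


def _entries(images):
    """Accept images as either a list of entries or a name -> entry mapping (assumption)."""
    return images.values() if isinstance(images, dict) else images


def bad_local_paths(image_map: dict, root: str = IMAGES_DIR) -> list[str]:
    """Return every images[*].local_path that does not resolve to a path under root."""
    bad = []
    for entry in _entries(image_map.get("images") or []):
        raw = entry.get("local_path", "")
        # normpath collapses '..' segments, so '/var/lib/solstice/images/../x' is caught.
        path = posixpath.normpath(raw)
        if not posixpath.isabs(path) or posixpath.commonpath([path, root]) != root:
            bad.append(raw)
    return bad
```

An empty result means every entry stays inside the bind-mounted directory; a missing local_path is reported as an empty string. Comparing with commonpath rather than a plain string prefix also rejects look-alike siblings such as /var/lib/solstice/images2.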
6) Bring up the stack
- podman compose -f compose.yml up -d --build
- On first start, the orchestrator downloads any base images listed in the YAML that are missing from ORCH_IMAGES_DIR. Subsequent starts reuse the downloaded files.
Notes
- Hardware acceleration: compose maps /dev/kvm into the container. Verify that KVM is available on the host (lsmod | grep kvm should list the kvm modules) and that CPU virtualization features (Intel VT-x or AMD-V) are enabled in BIOS/UEFI.
- Sockets and configs: compose binds libvirt control sockets and common libvirt directories read-only so the orchestrator can read network definitions and create domains.
- If you change LIBVIRT_URI or LIBVIRT_NETWORK, update deploy/podman/.env and redeploy.
- Runner URL injection: the orchestrator computes default runner URLs from its HTTP_ADDR and contact address and injects them into cloud-init.
  - To point VMs at specific filenames, override the defaults via the environment: SOLSTICE_RUNNER_URL (a single URL) and SOLSTICE_RUNNER_URLS (a space-separated list).
- To build and place the runner binaries:
  - Build the workflow-runner crate for each target and place the resulting artifacts in RUNNER_DIR_HOST under stable filenames (e.g., solstice-runner-linux, solstice-runner-illumos).
  - Ensure the file permissions allow the orchestrator user to read them (world-readable is fine for static serving).
- Traefik routing: runner.${ENV}.${DOMAIN} routes to the orchestrator’s HTTP port (8081 by default).
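To catch permission mistakes before VMs try to download a runner, you can check the runner directory on the host. This Python 3.9+ sketch tests only the simple case recommended above: each file exists and is world-readable, and the directory is searchable by other users. The default filenames are the examples from the notes; pass your own if you build other targets.

```python
import stat
from pathlib import Path

# Example filenames from the notes above; pass your own list for other targets.
DEFAULT_NAMES = ("solstice-runner-linux", "solstice-runner-illumos")


def runner_problems(runner_dir: str, names=DEFAULT_NAMES) -> list[str]:
    """Report missing or non-world-readable runner files in runner_dir."""
    base = Path(runner_dir)
    if not base.is_dir():
        return [f"{base}: not a directory"]
    problems = []
    # Other users need execute (search) permission on the directory to reach its files.
    if not base.stat().st_mode & stat.S_IXOTH:
        problems.append(f"{base}: directory not searchable by other users")
    for name in names:
        path = base / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif not path.stat().st_mode & stat.S_IROTH:
            problems.append(f"{name}: not world-readable (try chmod a+r)")
    return problems
```

Call it with the directory RUNNER_DIR_HOST points to; an empty list means the files are in place. World-readability is a sufficient condition, not a necessary one: if the orchestrator runs as the files' owner, stricter modes also work.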
- If WEBHOOK_SECRET is not set, the forge-integration service logs a warning and accepts webhooks without signature validation (dev mode). Set WEBHOOK_SECRET in deploy/podman/.env to enable HMAC validation.
- To enable posting commit statuses back to Forgejo/Gitea, set FORGEJO_TOKEN and FORGEJO_BASE_URL in deploy/podman/.env. If they are not set, the service logs a warning (FORGEJO_* not set) and disables the job result consumer that reports statuses.
- The compose file passes these variables to the container. After editing .env, run: podman compose up -d forge-integration
- For the github-integration service, set GITHUB_WEBHOOK_SECRET in deploy/podman/.env to validate webhook signatures (X-Hub-Signature-256). If it is unset, webhooks are accepted without validation (dev mode).
- To enable check runs and workflow fetches, configure a GitHub App and set GITHUB_APP_ID plus either GITHUB_APP_KEY (PEM contents) or GITHUB_APP_KEY_PATH (path inside the container).
- Optional overrides: GITHUB_API_BASE for GitHub Enterprise and GITHUB_CHECK_NAME to customize the check run title.
- The compose file passes these variables to the container. After editing .env, run: podman compose up -d github-integration
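For reference, this is roughly what signature validation involves. GitHub signs the raw request body with HMAC-SHA256 and sends X-Hub-Signature-256: sha256=<hex digest>. For Forgejo/Gitea we assume the bare hex HMAC-SHA256 digest sent in X-Forgejo-Signature (or X-Gitea-Signature); confirm this against your forge's webhook documentation. The Python sketch below illustrates both checks and is not the integration services' actual implementation.

```python
import hashlib
import hmac
from typing import Optional

GITHUB_PREFIX = "sha256="


def expected_signature(secret: str, body: bytes) -> str:
    """Lowercase hex HMAC-SHA256 of the raw request body (computed before any JSON parsing)."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def _matches(received: str, secret: str, body: bytes) -> bool:
    # Constant-time comparison; compare bytes so non-ASCII header input cannot raise.
    return hmac.compare_digest(received.encode(), expected_signature(secret, body).encode())


def verify_github(secret: str, body: bytes, header: Optional[str]) -> bool:
    """Validate an X-Hub-Signature-256 value of the form 'sha256=<hex>'."""
    if not header or not header.startswith(GITHUB_PREFIX):
        return False
    return _matches(header[len(GITHUB_PREFIX):], secret, body)


def verify_forgejo(secret: str, body: bytes, header: Optional[str]) -> bool:
    """Validate a bare-hex signature header (assumed Forgejo/Gitea format)."""
    if not header:
        return False
    return _matches(header, secret, body)
```

The digest must be computed over the exact bytes received; re-serializing parsed JSON changes whitespace and key order and breaks the comparison.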
- A warning that TRAEFIK_ACME_CASERVER is unset is harmless: the compose file defaults it to empty, so Traefik uses the production Let’s Encrypt endpoint. To test against the staging endpoint, set TRAEFIK_ACME_CASERVER=https://acme-staging-v02.api.letsencrypt.org/directory in .env and redeploy Traefik.