solstice-ci/deploy/podman/README.md
Till Wegmueller fe7b4b9ce0
Update Podman deployment for rootless support and DNS fixes
- Document rootless Podman port binding limitations and workarounds in README.
- Update `.env.sample` with notes and default high ports for rootless runs.
- Adjust `compose.yml` for network configuration and privileged port handling.
- Introduce fixes for Traefik DNS timeouts using explicit public resolvers and network tweaks.
- Switch MinIO and MinIO setup to use the latest images for better compatibility.
2025-11-08 21:55:27 +00:00

Solstice CI — Production deployment with Podman Compose + Traefik
This stack deploys Solstice CI services behind Traefik with automatic TLS certificates from Let's Encrypt. It uses upstream official images for system services and multi-stage Rust builds on official Rust/Debian images that rely on container layer caching (no sccache) for fast, reproducible builds.
Prerequisites
- Podman 4.9+ with a Compose provider installed so that `podman compose` works (for example podman-compose)
- Public DNS records for subdomains pointing to the host running this stack
- Ports 80 and 443 open to the Internet (port 80 is required for ACME HTTP-01); see the rootless Podman note below
- Email address for ACME registration
Rootless Podman note (ports 80/443)
- Rootless Podman cannot bind privileged ports (<1024). If you run this stack rootless, set high host ports in .env:
  - TRAEFIK_HTTP_PORT=8080
  - TRAEFIK_HTTPS_PORT=4443
- With high ports, public HTTPS is served on 4443, and the ACME HTTP-01 challenge fails unless you forward external port 80 to host port 8080 (e.g., via a firewall/NAT rule) or place another reverse proxy in front.
- To use real public certificates with HTTP-01 directly on this host, either:
  - Run Podman as root (rootful) for Traefik only, or
  - Lower the kernel's unprivileged port floor (requires root) with `sysctl -w net.ipv4.ip_unprivileged_port_start=80`, and add `net.ipv4.ip_unprivileged_port_start=80` to /etc/sysctl.conf so the setting persists across reboots.
- Alternatively, switch Traefik to a DNS-01 challenge (not configured here) if you control DNS.
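
The port rule above can be checked before starting the stack. The following is a minimal sketch, assuming a Linux host; `can_bind_rootless` and `report_ports` are hypothetical helper names, not part of this repository. A port is usable by a rootless process when it is greater than or equal to `net.ipv4.ip_unprivileged_port_start`.

```shell
# Sketch only: helper names are illustrative and not shipped with this stack.

# can_bind_rootless PORT [START]
# Succeeds when PORT >= START. START defaults to the kernel's current
# ip_unprivileged_port_start, falling back to 1024 if it cannot be read.
can_bind_rootless() {
  local port="$1"
  local start="${2:-}"
  if [ -z "$start" ]; then
    start="$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024)"
  fi
  [ "$port" -ge "$start" ]
}

# report_ports HTTP_PORT HTTPS_PORT [START]
# Prints one line per port saying whether a rootless bind is expected to work.
report_ports() {
  local p
  for p in "$1" "$2"; do
    if can_bind_rootless "$p" "${3:-}"; then
      echo "port $p: ok for rootless"
    else
      echo "port $p: needs root or a lower ip_unprivileged_port_start"
    fi
  done
}
```

For example, `report_ports 8080 4443` checks the suggested high ports against the current kernel setting, and `report_ports 80 443` shows whether the sysctl change has taken effect.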
DNS
Create A/AAAA records for the following hostnames under your base domain (no environment in hostname; env separation is logical via DB/vhost/buckets):
- traefik.svc.DOMAIN
- api.svc.DOMAIN
- grpc.svc.DOMAIN
- runner.svc.DOMAIN
- forge.svc.DOMAIN (Forge/Forgejo webhooks)
- github.svc.DOMAIN (GitHub App/webhooks)
- minio.svc.DOMAIN (console UI)
- s3.svc.DOMAIN (S3 API, TLS via TCP SNI)
- mq.svc.DOMAIN (RabbitMQ mgmt UI; AMQP remains internal)
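
To confirm the records exist before requesting certificates, something like the sketch below may help. It assumes a glibc system with `getent` available; the function names are illustrative only. It builds the nine hostnames listed above from a base domain and reports which ones resolve from the current machine.

```shell
# Sketch only: function names are hypothetical.

# solstice_hostnames DOMAIN
# Prints the hostnames listed above, one per line.
solstice_hostnames() {
  local domain="$1"
  local sub
  for sub in traefik api grpc runner forge github minio s3 mq; do
    echo "${sub}.svc.${domain}"
  done
}

# check_dns DOMAIN
# Reports whether each hostname resolves (A or AAAA) from this machine.
# Returns non-zero if at least one name does not resolve.
check_dns() {
  local host
  local rc=0
  for host in $(solstice_hostnames "$1"); do
    if getent ahosts "$host" >/dev/null 2>&1; then
      echo "ok       $host"
    else
      echo "MISSING  $host"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_dns example.com` (with your own base domain) only tests resolution from the local resolver; it does not prove that the records point at this host.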
Quick start
1. Copy the env template, then edit secrets and settings (ENV=staging|prod, DOMAIN, passwords, ACME email):
   `cp .env.sample .env`
2. (Optional) To test issuance without hitting the production rate limits, use the Let's Encrypt staging CA by setting this in .env (staging certificates are not trusted by browsers):
   `TRAEFIK_ACME_CASERVER=https://acme-staging-v02.api.letsencrypt.org/directory`
3. Bring up the stack:
   `podman compose -f compose.yml up -d --build`
4. Monitor logs:
   `podman compose logs -f traefik`
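
A small preflight check on .env can catch typos before the stack is brought up. This is a sketch under stated assumptions: it only looks at `ENV` and `DOMAIN` (the two keys named above), it expects plain unquoted `KEY=value` lines, and `env_value` and `preflight_env` are hypothetical helpers.

```shell
# Sketch only: assumes simple KEY=value lines without quotes or spaces.

# env_value FILE KEY
# Prints the value of KEY from FILE (last occurrence wins).
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# preflight_env FILE
# Fails if ENV is not staging or prod, or if DOMAIN is empty.
preflight_env() {
  local file="$1"
  local env_name
  local domain
  env_name="$(env_value "$file" ENV)"
  domain="$(env_value "$file" DOMAIN)"
  case "$env_name" in
    staging|prod) ;;
    *) echo "ENV must be staging or prod (got '${env_name}')" >&2; return 1 ;;
  esac
  if [ -z "$domain" ]; then
    echo "DOMAIN is not set" >&2
    return 1
  fi
  echo "ENV=${env_name} DOMAIN=${domain}"
}
```

For instance, `preflight_env .env && podman compose -f compose.yml up -d --build` starts the stack only when both values look sane.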
Services and routing
- Traefik dashboard: https://traefik.svc.${DOMAIN} (protect with TRAEFIK_DASHBOARD_AUTH in .env)
- Orchestrator HTTP: https://api.svc.${DOMAIN}
- Orchestrator gRPC (h2/TLS via SNI): grpc.svc.${DOMAIN}
- Forge webhooks: https://forge.svc.${DOMAIN}
- GitHub webhooks: https://github.svc.${DOMAIN}
- Runner static server: https://runner.svc.${DOMAIN}
- MinIO console: https://minio.svc.${DOMAIN}
- S3 API: s3.svc.${DOMAIN}
- RabbitMQ management: https://mq.svc.${DOMAIN}
Environment scoping (single infra, logical separation)
- RabbitMQ: single broker; per-environment vhosts named solstice-${ENV} (staging/prod). Services connect to amqp://.../solstice-${ENV}.
- Postgres: single cluster; databases solstice_staging and solstice_prod are created by the postgres-setup job. Services use postgres://.../solstice_${ENV}.
- MinIO: single server; buckets solstice-logs-staging and solstice-logs-prod are created by the minio-setup job. Set S3 bucket per service to the env-appropriate bucket.
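
The naming scheme above can be derived mechanically from `ENV`. The sketch below illustrates it; the output labels (`AMQP_VHOST`, `PG_DATABASE`, `S3_BUCKET`) and the function name are made up for this example and are not necessarily the variable names the services read.

```shell
# Sketch only: output labels are illustrative, not the services' real settings.

# scoped_names ENV
# Prints the per-environment vhost, database, and bucket names.
scoped_names() {
  local env_name="$1"
  case "$env_name" in
    staging|prod) ;;
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac
  echo "AMQP_VHOST=solstice-${env_name}"       # RabbitMQ vhost uses a hyphen
  echo "PG_DATABASE=solstice_${env_name}"      # Postgres database uses an underscore
  echo "S3_BUCKET=solstice-logs-${env_name}"   # MinIO log bucket
}
```

Note the separator difference: the vhost and bucket use hyphens while the database name uses an underscore, which is an easy detail to get wrong when filling in connection strings by hand.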
Security notes
- Secrets are provided via podman compose secrets referencing your environment variables. Do not commit real secrets.
- All public traffic enters through Traefik. Besides the management UIs, the routes listed under Services and routing (including the S3 API) are reachable from outside; Postgres and AMQP are not published and stay on the internal network. Adjust exposure policy as needed.
Images and builds
- System services use Chainguard images (postgres, rabbitmq). MinIO uses upstream images.
- Rust services are built with multi-stage Containerfiles using cgr.dev/chainguard/rust and run on cgr.dev/chainguard/glibc-dynamic.
- Build caches are mounted in-build for cargo registry/git and the cargo target directory (via ~/.cargo/config target-dir=/cargo/target).
Maintenance
- Upgrade images by editing tags in compose.yml and rebuilding: podman compose build --pull
- Renewals are automatic via Traefik ACME. Certificates are stored in the traefik-acme volume.
- Backups: persist volumes (postgres-data, rabbitmq-data, minio-data, traefik-acme).
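
One possible way to back up the volumes is `podman volume export`, which writes a volume's contents to a tar archive. The sketch below only prints the commands so they can be reviewed first; the `solstice-ci` prefix is taken from the volume names in the tear-down section, while the function name and destination directory are assumptions. Exporting a database volume while the service is running may produce an inconsistent copy, so consider stopping the stack or using a database-level dump for Postgres.

```shell
# Sketch only: prints commands instead of running them.

# backup_commands PREFIX DEST_DIR
# Prints one `podman volume export` command per persistent volume.
backup_commands() {
  local prefix="$1"
  local dest="$2"
  local vol
  for vol in postgres-data rabbitmq-data minio-data traefik-acme; do
    echo "podman volume export ${prefix}_${vol} --output ${dest}/${vol}.tar"
  done
}
```

For example, `backup_commands solstice-ci /var/backups/solstice` lists the four commands; piping that output to `sh` runs them once the destination directory exists.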
Tear down
- Stop: podman compose down
- Remove volumes (DANGEROUS: destroys data): podman volume rm solstice-ci_traefik-acme solstice-ci_postgres-data solstice-ci_rabbitmq-data solstice-ci_minio-data
Troubleshooting
- Certificate issues: check Traefik logs; verify DNS and ports 80/443. For testing, use ACME staging server.
- No routes: verify labels on services and that traefik sees the podman socket.
- Healthchecks failing: inspect service logs with podman logs <container>.
- Arch Linux/Podman DNS timeouts (ACME): If Traefik logs show errors like "dial tcp: lookup acme-v02.api.letsencrypt.org on 10.89.0.1:53: i/o timeout", the cause is typically Podman's network DNS (netavark/aardvark-dns). Fixes:
  - compose.yml sets explicit public DNS resolvers for the Traefik container (1.1.1.1, 8.8.8.8, 9.9.9.9). Redeploy: podman compose up -d traefik.
  - Ensure Podman's network backend and DNS helper are installed (Arch): pacman -S netavark aardvark-dns. aardvark-dns is started by netavark on demand and does not need a systemd unit. Verify that `podman info | grep -i network` shows networkBackend: netavark.
  - Alternatively, mount the host resolv.conf into Traefik by adding this entry to the traefik service volumes: /etc/resolv.conf:/etc/resolv.conf:ro
  - Check the firewall (nftables): allow UDP/TCP 53 from the Podman bridge (e.g., 10.89.0.0/24) to the host at 10.89.0.1, and allow FORWARD for ESTABLISHED,RELATED.
  - Inspect the network: podman network inspect podman. Consider creating a custom network with explicit DNS servers (podman network create --dns 1.1.1.1 --dns 8.8.8.8 solstice-net) and setting networks.core.name to that network in compose.yml.
  - As a last resort, run Traefik with host networking: network_mode: host (then remove ports and ensure only Traefik is exposed), or switch ACME to DNS-01.
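
To tell quickly whether the DNS timeout described above is what Traefik is hitting, the log can be filtered for that error shape. This is a rough sketch: `dns_timeout_resolvers` is a hypothetical helper, it only recognises IPv4 resolver addresses, and it assumes the message format matches the example quoted above.

```shell
# Sketch only: matches the error format quoted in the troubleshooting entry.

# dns_timeout_resolvers
# Reads log lines on stdin and prints each resolver address that appears in a
# "lookup HOST on IP:53: i/o timeout" error, de-duplicated.
dns_timeout_resolvers() {
  sed -n 's/.*lookup [^ ]* on \([0-9.]*\):53: i\/o timeout.*/\1/p' | sort -u
}
```

For example, `podman compose logs traefik 2>&1 | dns_timeout_resolvers` prints the failing resolver (such as the Podman bridge address) if the error is present, and prints nothing otherwise.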