mirror of
https://codeberg.org/Toasterson/solstice-ci.git
synced 2026-04-10 21:30:41 +00:00
- Document rootless Podman port binding limitations and workarounds in README. - Update `.env.sample` with notes and default high ports for rootless runs. - Adjust `compose.yml` for network configuration and privileged port handling. - Introduce fixes for Traefik DNS timeouts using explicit public resolvers and network tweaks. - Switch MinIO and MinIO setup to use the latest images for better compatibility.
90 lines
6 KiB
Markdown
Solstice CI — Production deployment with Podman Compose + Traefik

This stack deploys Solstice CI services behind Traefik with automatic TLS certificates from Let’s Encrypt. It uses upstream official images for system services and multi-stage Rust builds on official Rust/Debian images that rely on container layer caching (no sccache) for fast, reproducible builds.

Prerequisites
- Podman 4.9+ with podman-compose compatibility (podman compose)
- Public DNS records for subdomains pointing to the host running this stack
- Ports 80 and 443 open to the Internet (for ACME HTTP-01); see the Rootless Podman note below
- Email address for ACME registration

Rootless Podman note (ports 80/443)
- Rootless Podman cannot bind privileged ports (<1024). If you run this stack rootless, set high host ports in .env:
  - TRAEFIK_HTTP_PORT=8080
  - TRAEFIK_HTTPS_PORT=4443
- With high ports, public HTTPS is served on 4443, and the ACME HTTP-01 challenge fails unless you forward external port 80 to host port 8080 (e.g., via a firewall/NAT) or place another reverse proxy in front.
- To use real public certificates with HTTP-01 directly on this host, either:
  - Run Podman as root (rootful) for Traefik only, or
  - Lower the kernel's unprivileged port threshold (requires root):
      sysctl -w net.ipv4.ip_unprivileged_port_start=80
    and add net.ipv4.ip_unprivileged_port_start=80 to /etc/sysctl.conf to persist.
- Alternatively, switch Traefik to a DNS-01 challenge (not configured here) if you control DNS.

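The rootless port rule above can be checked before bringing the stack up. The following is a minimal sketch, assuming a Linux host that exposes the threshold under /proc; the helper names (can_bind_rootless, current_threshold, report_ports) are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether a rootless process may bind a port.

# can_bind_rootless PORT THRESHOLD
# Succeeds when PORT is at or above the unprivileged port threshold.
can_bind_rootless() {
    [ "$1" -ge "$2" ]
}

# Print the current threshold; fall back to the kernel default (1024)
# when the /proc entry cannot be read (e.g., on a non-Linux host).
current_threshold() {
    cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null || echo 1024
}

# report_ports PORT...
# Print one line per port saying whether rootless binding should work.
report_ports() {
    threshold=$(current_threshold)
    for port in "$@"; do
        if can_bind_rootless "$port" "$threshold"; then
            echo "port $port: bindable rootless (threshold $threshold)"
        else
            echo "port $port: needs root, a lower threshold, or port forwarding"
        fi
    done
}
```

For example, `report_ports 80 443 8080 4443` shows which of the default and high ports the current user could bind.
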
DNS
Create A/AAAA records for the following hostnames under your base domain (no environment in hostname; env separation is logical via DB/vhost/buckets):
- traefik.svc.DOMAIN
- api.svc.DOMAIN
- grpc.svc.DOMAIN
- runner.svc.DOMAIN
- forge.svc.DOMAIN (Forge/Forgejo webhooks)
- github.svc.DOMAIN (GitHub App/webhooks)
- minio.svc.DOMAIN (console UI)
- s3.svc.DOMAIN (S3 API, TLS via TCP SNI)
- mq.svc.DOMAIN (RabbitMQ mgmt UI; AMQP remains internal)

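The hostname list above can be generated from the base domain and checked for resolution before requesting certificates. This is a sketch only: solstice_hosts and check_dns are hypothetical helper names, and check_dns assumes getent is available (as on glibc-based Linux hosts).

```shell
#!/bin/sh
# Hypothetical helpers: enumerate and check the hostnames listed above.

# solstice_hosts DOMAIN
# Print one hostname per line, in the order used in this README.
solstice_hosts() {
    domain=$1
    for name in traefik api grpc runner forge github minio s3 mq; do
        echo "$name.svc.$domain"
    done
}

# check_dns DOMAIN
# Report which hostnames resolve on this machine (assumes getent exists).
check_dns() {
    solstice_hosts "$1" | while read -r host; do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "ok       $host"
        else
            echo "missing  $host"
        fi
    done
}
```

Running `check_dns example.org` (with your own base domain) lists any record that still needs to be created. It resolves from the local machine only, so it does not prove the records are visible to Let’s Encrypt.
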
Quick start
1. Copy env template and edit secrets and settings:
   cp .env.sample .env
   # Edit .env (ENV=staging|prod, DOMAIN, passwords, ACME email)
2. (Optional) Use the Let’s Encrypt staging CA to test issuance without hitting the production rate limits by setting in .env:
   TRAEFIK_ACME_CASERVER=https://acme-staging-v02.api.letsencrypt.org/directory
3. Bring up the stack:
   podman compose -f compose.yml up -d --build
4. Monitor logs:
   podman compose logs -f traefik

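Before step 3, it can help to confirm that the edited .env actually assigns the settings mentioned in step 1. A minimal sketch follows; env_missing_keys is a hypothetical helper, and only ENV and DOMAIN are key names confirmed by this README, so extend the list from your own .env.sample.

```shell
#!/bin/sh
# Hypothetical helper: list keys that are absent or empty in an env file.

# env_missing_keys FILE KEY...
# Print each KEY that has no non-empty "KEY=value" line in FILE.
env_missing_keys() {
    file=$1
    shift
    for key in "$@"; do
        grep -Eq "^${key}=.+" "$file" || echo "$key"
    done
}
```

For example, `env_missing_keys .env ENV DOMAIN` prints nothing when both values are set.
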
Services and routing
- Traefik dashboard: https://traefik.svc.${DOMAIN} (protect with TRAEFIK_DASHBOARD_AUTH in .env)
- Orchestrator HTTP: https://api.svc.${DOMAIN}
- Orchestrator gRPC (h2/TLS via SNI): grpc.svc.${DOMAIN}
- Forge webhooks: https://forge.svc.${DOMAIN}
- GitHub webhooks: https://github.svc.${DOMAIN}
- Runner static server: https://runner.svc.${DOMAIN}
- MinIO console: https://minio.svc.${DOMAIN}
- S3 API: s3.svc.${DOMAIN}
- RabbitMQ management: https://mq.svc.${DOMAIN}

Environment scoping (single infra, logical separation)
- RabbitMQ: single broker; per-environment vhosts named solstice-${ENV} (staging/prod). Services connect to amqp://.../solstice-${ENV}.
- Postgres: single cluster; databases solstice_staging and solstice_prod are created by the postgres-setup job. Services use postgres://.../solstice_${ENV}.
- MinIO: single server; buckets solstice-logs-staging and solstice-logs-prod are created by the minio-setup job. Set S3 bucket per service to the env-appropriate bucket.

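The naming scheme above maps one ENV value to three resource names. The sketch below spells that mapping out; solstice_env_names and the output keys (RABBITMQ_VHOST, POSTGRES_DB, S3_BUCKET) are hypothetical labels for illustration and are not variable names taken from .env.sample.

```shell
#!/bin/sh
# Hypothetical helper: derive per-environment resource names from ENV.

# solstice_env_names ENV
# Print the vhost, database, and bucket names for staging or prod.
solstice_env_names() {
    scope=$1
    case "$scope" in
        staging|prod) ;;
        *)
            echo "unknown ENV: $scope (expected staging or prod)" >&2
            return 1
            ;;
    esac
    echo "RABBITMQ_VHOST=solstice-$scope"    # hyphen separator
    echo "POSTGRES_DB=solstice_$scope"       # underscore separator
    echo "S3_BUCKET=solstice-logs-$scope"
}
```

Note the separator difference: the vhost and bucket use hyphens, while the database name uses an underscore.
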
Security notes
- Secrets are provided via podman compose secrets referencing your environment variables. Do not commit real secrets.
- Only management UIs are exposed publicly via Traefik. Data planes (Postgres, AMQP, S3 API) terminate TLS at Traefik and route internally. Adjust exposure policy as needed.

Images and builds
- System services use Chainguard images (postgres, rabbitmq). MinIO uses upstream images.
- Rust services are built with multi-stage Containerfiles using cgr.dev/chainguard/rust and run on cgr.dev/chainguard/glibc-dynamic.
- Build caches are mounted in-build for cargo registry/git and the cargo target directory (via ~/.cargo/config target-dir=/cargo/target).

Maintenance
- Upgrade images by editing tags in compose.yml and rebuilding: podman compose build --pull
- Renewals are automatic via Traefik ACME. Certificates are stored in the traefik-acme volume.
- Backups: back up the persistent volumes (postgres-data, rabbitmq-data, minio-data, traefik-acme).

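One possible way to back up the volumes listed above is podman volume export, which writes a volume's contents to a tar archive. This is a sketch under assumptions: solstice_volumes and backup_volumes are hypothetical helpers, the solstice-ci_ prefix is taken from the volume names in the Tear down section, and the stack should be stopped first so that database files are not copied mid-write.

```shell
#!/bin/sh
# Hypothetical helpers: export the stack's named volumes to tar archives.

# solstice_volumes [PREFIX]
# Print the full volume names; PREFIX defaults to the compose project name.
solstice_volumes() {
    prefix=${1:-solstice-ci}
    for vol in traefik-acme postgres-data rabbitmq-data minio-data; do
        echo "${prefix}_${vol}"
    done
}

# backup_volumes DEST_DIR [PREFIX]
# Export each volume to DEST_DIR/<volume>.tar using podman volume export.
backup_volumes() {
    dest=$1
    prefix=${2:-solstice-ci}
    mkdir -p "$dest" || return 1
    solstice_volumes "$prefix" | while read -r vol; do
        podman volume export --output "$dest/$vol.tar" "$vol" \
            || echo "failed to export $vol" >&2
    done
}
```

A typical run would be `podman compose down` followed by `backup_volumes /var/backups/solstice`, where the destination path is only an example.
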
Tear down
- Stop: podman compose down
- Remove volumes (DANGEROUS: destroys data): podman volume rm solstice-ci_traefik-acme solstice-ci_postgres-data solstice-ci_rabbitmq-data solstice-ci_minio-data

Troubleshooting
- Certificate issues: check Traefik logs; verify DNS and ports 80/443. For testing, use the ACME staging server.
- No routes: verify labels on services and that Traefik sees the Podman socket.
- Healthchecks failing: inspect service logs with podman logs <container>.
- Arch Linux/Podman DNS timeouts (ACME): If Traefik logs show errors like "dial tcp: lookup acme-v02.api.letsencrypt.org on 10.89.0.1:53: i/o timeout", this is typically a Podman network DNS (netavark/aardvark-dns) issue. Fixes:
  - compose.yml sets explicit public DNS resolvers for the Traefik container (1.1.1.1, 8.8.8.8, 9.9.9.9). Redeploy: podman compose up -d traefik.
  - Ensure Podman’s network backend and DNS are installed and active (Arch): pacman -S netavark aardvark-dns; systemctl enable --now aardvark-dns.socket; verify `podman info | grep -i network` shows networkBackend: netavark.
  - Alternatively, mount the host resolv.conf into Traefik: add to the traefik service volumes: - /etc/resolv.conf:/etc/resolv.conf:ro
  - Check firewall (nftables): allow UDP/TCP 53 from the Podman bridge (e.g., 10.89.0.0/24) to host 10.89.0.1; allow FORWARD for ESTABLISHED,RELATED.
  - Inspect network: podman network inspect podman; consider creating a custom network with explicit DNS servers: podman network create --dns 1.1.1.1 --dns 8.8.8.8 solstice-net and set networks.core.name to that network in compose.yml.
  - As a last resort, run Traefik with host networking: network_mode: host (then remove ports and ensure only Traefik is exposed), or switch ACME to DNS-01.
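
To confirm that the DNS timeout described above is what Traefik is hitting, the log output can be filtered for the resolver address that timed out. A minimal sketch, assuming the log lines follow the "lookup HOST on IP:53: i/o timeout" shape quoted above; acme_dns_timeouts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: extract resolvers that timed out from log lines.

# acme_dns_timeouts
# Read log lines on stdin and print each distinct "IP:53" resolver that
# appears in a "lookup ... on IP:53: i/o timeout" message.
acme_dns_timeouts() {
    grep 'i/o timeout' \
        | sed -n 's/.*lookup [^ ]* on \([0-9.]*:53\).*/\1/p' \
        | sort -u
}
```

For example, `podman compose logs traefik 2>&1 | acme_dns_timeouts` prints the failing resolver addresses. An address inside the Podman bridge range (such as 10.89.0.1) points at the container network's DNS rather than at an upstream resolver.
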