Plans:
- 001: vm-manager migration (completed)
- 002: runner-only architecture (active)

Decision records (ADRs):
- 001: Runner-only architecture (retire webhooks + logs service)
- 002: Direct QEMU over libvirt
- 003: Ephemeral SSH keys with opt-in debug access
- 004: User-mode (SLIRP) networking for VMs
ADR-004: User-Mode (SLIRP) Networking for VMs
Date: 2026-04-07
Status: Accepted
Deciders: Till Wegmueller
Context
The orchestrator needs network access to VMs for SSH (uploading runner binary, executing commands). Two options:
- TAP with bridge: the VM gets a real IP on a bridge network (e.g., virbr0). Requires the NET_ADMIN capability, host bridge access, and TAP device creation. IP discovery relies on ARP or DHCP lease parsing.
- User-mode (SLIRP): QEMU provides NAT via user-space networking. The VM gets a private IP (10.0.2.x). SSH access goes through host port forwarding (hostfwd=tcp::{port}-:22). No special capabilities are needed.
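The user-mode option can be sketched as a small helper that builds the relevant QEMU arguments. This is an illustration only, not the orchestrator's actual code: the function name `slirp_netdev_args`, the netdev id `net0`, and the choice of a `virtio-net-pci` device are assumptions. Only the `hostfwd=tcp::{port}-:22` rule comes from the text above.

```python
def slirp_netdev_args(host_port: int, netdev_id: str = "net0") -> list[str]:
    """Build QEMU arguments for user-mode networking with SSH forwarding.

    Hypothetical helper: the netdev id and the virtio-net-pci device are
    assumptions; the hostfwd rule mirrors the one quoted in this ADR.
    """
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid host port: {host_port}")
    return [
        # User-mode (SLIRP) backend; forward host_port on the host to guest port 22.
        "-netdev", f"user,id={netdev_id},hostfwd=tcp::{host_port}-:22",
        # Guest-side NIC attached to that backend.
        "-device", f"virtio-net-pci,netdev={netdev_id}",
    ]


if __name__ == "__main__":
    print(" ".join(slirp_netdev_args(10022)))
```

The returned list can be appended to the rest of a QEMU command line. Nothing in it needs NET_ADMIN or a host bridge.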
Decision
Use user-mode (SLIRP) networking with deterministic SSH port forwarding.
Port assignment: 10022 + (hash(vm_name) % 100), giving the range 10022-10121 (100 ports).
From the orchestrator's perspective, the guest is always reachable at 127.0.0.1 on its assigned port.
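A minimal sketch of the port formula follows. The ADR does not say which hash function is used, so CRC32 stands in here as an assumption; the names `SSH_PORT_BASE`, `SSH_PORT_COUNT`, and `ssh_port_for` are hypothetical. Python's built-in `hash()` is deliberately avoided because string hashes are randomized per process by default, which would break the deterministic mapping.

```python
import zlib

SSH_PORT_BASE = 10022   # first forwarded port, from the ADR
SSH_PORT_COUNT = 100    # size of the port range, from the ADR


def ssh_port_for(vm_name: str) -> int:
    """Map a VM name to a host SSH port in a stable, repeatable way.

    CRC32 is an assumed stand-in for the unspecified hash function.
    """
    digest = zlib.crc32(vm_name.encode("utf-8"))
    return SSH_PORT_BASE + (digest % SSH_PORT_COUNT)


if __name__ == "__main__":
    name = "vm-example"  # hypothetical VM name
    print(f"ssh -p {ssh_port_for(name)} 127.0.0.1")
```

Because the remainder of a division by 100 is at most 99, the highest port this can produce is 10121.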
Consequences
Positive
- Container-friendly: no NET_ADMIN, no bridge access, no host configuration
- Trivial IP discovery: always 127.0.0.1 with a known port
- No host bridge dependency: works on any host with just /dev/kvm
- Network isolation: VMs cannot reach each other or the host network directly
Negative
- Port collision risk: with 100 ports and concurrent VMs, hash collisions are possible (mitigated by UUID-based VM names having good hash distribution)
- No inbound connections: external services cannot reach the VM directly (not needed for CI)
- SLIRP performance: slightly slower than TAP for network-heavy workloads (acceptable for CI)
- No VM-to-VM communication: VMs are fully isolated (acceptable for CI)
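The port collision risk listed above can be estimated with the standard birthday-problem calculation. The sketch below assumes the hash spreads VM names uniformly over the 100 ports, which is the best case; the function name `collision_probability` is hypothetical.

```python
def collision_probability(concurrent_vms: int, ports: int = 100) -> float:
    """Probability that at least two concurrent VMs map to the same port.

    Assumes each VM lands on a port uniformly and independently.
    """
    if concurrent_vms < 0 or ports <= 0:
        raise ValueError("concurrent_vms must be >= 0 and ports must be > 0")
    if concurrent_vms > ports:
        return 1.0  # pigeonhole: more VMs than ports guarantees a collision
    p_all_unique = 1.0
    for already_taken in range(concurrent_vms):
        # The next VM must miss every port that is already taken.
        p_all_unique *= (ports - already_taken) / ports
    return 1.0 - p_all_unique


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, round(collision_probability(n), 3))
```

Under this assumption, two concurrent VMs collide with probability 0.01, and ten concurrent VMs collide with probability of about 0.37. An even hash distribution removes bias but does not prevent collisions as concurrency grows.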