solstice-ci/docs/ai/decisions/002-qemu-over-libvirt.md
Till Wegmueller d6c2c3662c Add architecture plans and decision records
Plans:
- 001: vm-manager migration (completed)
- 002: runner-only architecture (active)

Decision records (ADRs):
- 001: Runner-only architecture — retire webhooks + logs service
- 002: Direct QEMU over libvirt
- 003: Ephemeral SSH keys with opt-in debug access
- 004: User-mode (SLIRP) networking for VMs
2026-04-09 22:03:12 +02:00


# ADR-002: Direct QEMU Over Libvirt
**Date:** 2026-04-07
**Status:** Accepted
**Deciders:** Till Wegmueller
## Context
The orchestrator used libvirt (via the Rust `virt` crate) for VM lifecycle management. Libvirt provided domain XML generation, network management (`virbr0` + dnsmasq), IP discovery (`virsh domifaddr`), and graceful shutdown. However, it required the libvirt daemon on the host, socket mounts into containers, and complex host configuration.
The `vm-manager` library manages QEMU processes directly via QMP (QEMU Machine Protocol), eliminating the libvirt middleman.
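To make "directly via QMP" concrete, here is a minimal sketch of a QMP exchange over a Unix socket. It is not vm-manager's actual code: the names `qmp_powerdown`, `QmpLine`, and `classify` are hypothetical, and it assumes QEMU was started with a QMP socket such as `-qmp unix:/path/to/qmp.sock,server=on,wait=off`. QMP sends a greeting on connect, expects `qmp_capabilities` before any other command, and answers with one JSON object per line. To stay dependency-free, the sketch matches line prefixes instead of parsing JSON.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

// QMP commands are JSON objects, one per line.
const QMP_CAPABILITIES: &str = "{\"execute\":\"qmp_capabilities\"}";
const QMP_POWERDOWN: &str = "{\"execute\":\"system_powerdown\"}";

/// Coarse classification of a line received from QEMU (hypothetical helper).
#[derive(Debug, PartialEq)]
enum QmpLine {
    Return, // successful reply to a command
    Error,  // error reply to a command
    Other,  // greeting or asynchronous event
}

/// Classify a line by its prefix. A real client should parse the JSON.
fn classify(line: &str) -> QmpLine {
    let line = line.trim_start();
    if line.starts_with("{\"return\"") {
        QmpLine::Return
    } else if line.starts_with("{\"error\"") {
        QmpLine::Error
    } else {
        QmpLine::Other
    }
}

/// Negotiate capabilities, then ask the guest to power down via ACPI.
fn qmp_powerdown(socket: &Path) -> io::Result<()> {
    let mut writer = UnixStream::connect(socket)?;
    let mut reader = BufReader::new(writer.try_clone()?);
    let mut line = String::new();

    // The first line is the greeting banner; its content is ignored here.
    reader.read_line(&mut line)?;

    for cmd in [QMP_CAPABILITIES, QMP_POWERDOWN] {
        writer.write_all(cmd.as_bytes())?;
        writer.write_all(b"\n")?;
        // Skip asynchronous events until the command's reply arrives.
        loop {
            line.clear();
            if reader.read_line(&mut line)? == 0 {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
            match classify(&line) {
                QmpLine::Return => break,
                QmpLine::Error => {
                    return Err(io::Error::new(
                        io::ErrorKind::Other,
                        line.trim().to_string(),
                    ))
                }
                QmpLine::Other => continue,
            }
        }
    }
    Ok(())
}
```

`system_powerdown` only requests an ACPI shutdown, so a caller would still wait for the QEMU process to exit and fall back to killing it after a timeout.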
## Decision
Replace libvirt with direct QEMU process management via vm-manager. Use user-mode (SLIRP) networking with SSH port forwarding instead of libvirt's bridged networking.
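As an illustration of user-mode networking with SSH port forwarding, the sketch below builds the relevant QEMU arguments. The helper names `free_loopback_port` and `slirp_ssh_args`, the `net0` id, and the choice of `virtio-net-pci` are assumptions for this example, not taken from vm-manager. The `-netdev user,...,hostfwd=` option is standard QEMU syntax.

```rust
use std::net::TcpListener;

/// Ask the OS for an unused loopback port (hypothetical helper).
/// The port is released when the listener drops, so another process
/// could claim it before QEMU binds it; callers should retry on failure.
fn free_loopback_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    Ok(listener.local_addr()?.port())
}

/// Build QEMU arguments for SLIRP networking, forwarding
/// 127.0.0.1:<host_port> on the host to port 22 in the guest.
fn slirp_ssh_args(host_port: u16) -> Vec<String> {
    vec![
        "-netdev".to_string(),
        format!("user,id=net0,hostfwd=tcp:127.0.0.1:{host_port}-:22"),
        "-device".to_string(),
        "virtio-net-pci,netdev=net0".to_string(),
    ]
}
```

With these arguments the orchestrator would reach the guest with `ssh -p <host_port> user@127.0.0.1`, so no guest IP address needs to be discovered in this mode.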
## Consequences
### Positive
- **Containerization simplified**: only `/dev/kvm` device needed, no daemon sockets
- **712 lines of libvirt code removed** from orchestrator
- **No libvirt daemon dependency** on the host
- **Simpler networking**: user-mode SLIRP needs no bridge, no NET_ADMIN, no TAP devices
- **Pure-Rust cloud-init ISO**: no genisoimage/mkisofs required (optional `pure-iso` feature)
### Negative
- **No libvirt network management**: must use existing bridge or user-mode networking
- **VM IP discovery changes**: `ip neigh` parsing instead of `virsh domifaddr`
- **QEMU process management**: must handle PID tracking, graceful shutdown via QMP
- **Cross-workspace dependency**: vm-manager inherits its dependencies from its own workspace, so it must be pulled in as a git dependency with a local patch override
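The `ip neigh` parsing mentioned above could look roughly like the following sketch. It presumably only matters when the VM is attached to an existing bridge, since SLIRP guests are reached through a forwarded port. The function name `ipv4_for_mac` is hypothetical, and it assumes the caller already knows the VM's MAC address and has captured the output of `ip neigh`.

```rust
use std::net::Ipv4Addr;

/// Find the IPv4 address that the neighbour table associates with `mac`.
/// Expects lines such as:
///   192.168.122.45 dev br0 lladdr 52:54:00:12:34:56 REACHABLE
fn ipv4_for_mac(neigh_output: &str, mac: &str) -> Option<Ipv4Addr> {
    neigh_output.lines().find_map(|line| {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // Entries without a resolved link-layer address (e.g. FAILED)
        // have no "lladdr" field and are skipped.
        let pos = fields.iter().position(|f| *f == "lladdr")?;
        let line_mac = fields.get(pos + 1)?;
        if !line_mac.eq_ignore_ascii_case(mac) {
            return None;
        }
        // IPv6 entries fail to parse as Ipv4Addr and are skipped.
        fields.first()?.parse().ok()
    })
}
```

A freshly booted guest may not appear in the neighbour table until it has sent traffic, so a caller would typically poll this with a timeout.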
### Lessons learned
- IDE CDROM must be used for cloud-init seed ISO — virtio-blk for both disk and seed confuses Ubuntu's root device detection
- VmHandle must preserve vcpus/memory from prepare step — vm-manager's start() reads them from the handle
- SFTP upload needs explicit chmod 0755 for executable files
- Console tailer must stop before SSH execution begins to prevent log duplication
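The first lesson can be sketched as QEMU drive arguments: the root disk goes on virtio-blk, while the cloud-init seed ISO is attached as an IDE CD-ROM. This is an illustration under assumptions: the helper name `storage_args` is hypothetical, the disk is assumed to be qcow2, and `if=ide` assumes an x86 machine type such as `pc` or `q35`.

```rust
/// Build QEMU storage arguments: root disk on virtio-blk, cloud-init
/// seed ISO on an IDE CD-ROM so the guest does not see two virtio disks.
/// Note: paths containing commas must be escaped as ",," for QEMU.
fn storage_args(disk: &str, seed_iso: &str) -> Vec<String> {
    vec![
        "-drive".to_string(),
        format!("file={disk},if=virtio,format=qcow2"),
        "-drive".to_string(),
        format!("file={seed_iso},if=ide,media=cdrom,format=raw,readonly=on"),
    ]
}
```

Keeping the seed on a different bus means the guest enumerates exactly one virtio block device, which matches the root-device issue described in the lesson.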