solstice-ci/docs/ai/decisions/002-qemu-over-libvirt.md
Till Wegmueller d6c2c3662c Add architecture plans and decision records
2026-04-09 22:03:12 +02:00

ADR-002: Direct QEMU Over Libvirt

Date: 2026-04-07
Status: Accepted
Deciders: Till Wegmueller

Context

The orchestrator used libvirt (via the Rust virt crate) for VM lifecycle management. Libvirt provided domain XML generation, network management (virbr0 + dnsmasq), IP discovery (domifaddr), and graceful shutdown. However, it required the libvirt daemon running on the host, its socket mounted into containers, and complex host configuration.

The vm-manager library manages QEMU processes directly via QMP (QEMU Machine Protocol), eliminating the libvirt middleman.
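For reference, QMP is line-delimited JSON over a socket, typically a UNIX socket opened with QEMU's -qmp unix:PATH,server=on,wait=off option. QEMU sends a greeting, the client must send qmp_capabilities before any other command, and command replies (return or error) can be preceded by asynchronous event messages. The following is a minimal sketch of that exchange, not vm-manager's API: QmpSession and its methods are hypothetical names, it runs over any reader/writer pair so it can be exercised without QEMU, and it classifies messages by string matching where a real client would use a JSON parser. Graceful shutdown then amounts to system_powerdown, which presses the virtual ACPI power button.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical minimal QMP client (not vm-manager's API). In production the
/// reader/writer would wrap a UnixStream connected to QEMU's QMP socket.
pub struct QmpSession<R: BufRead, W: Write> {
    reader: R,
    writer: W,
}

impl<R: BufRead, W: Write> QmpSession<R, W> {
    /// Consumes the server greeting, then negotiates capabilities.
    /// QEMU rejects other commands until qmp_capabilities succeeds.
    pub fn connect(reader: R, writer: W) -> io::Result<Self> {
        let mut session = QmpSession { reader, writer };
        let greeting = session.read_message()?;
        if !greeting.contains("\"QMP\"") {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "missing QMP greeting"));
        }
        session.execute("qmp_capabilities")?;
        Ok(session)
    }

    /// Sends {"execute": cmd} and returns the reply line.
    /// Asynchronous events (e.g. POWERDOWN, SHUTDOWN) arriving first are skipped.
    pub fn execute(&mut self, cmd: &str) -> io::Result<String> {
        writeln!(self.writer, "{{\"execute\": \"{}\"}}", cmd)?;
        self.writer.flush()?;
        loop {
            let msg = self.read_message()?;
            // Crude classification by key; a real client would parse the JSON.
            if msg.contains("\"event\":") {
                continue;
            }
            if msg.contains("\"error\":") {
                return Err(io::Error::new(io::ErrorKind::Other, msg));
            }
            return Ok(msg);
        }
    }

    /// Requests an ACPI shutdown; the guest OS decides how to honor it.
    pub fn powerdown(&mut self) -> io::Result<String> {
        self.execute("system_powerdown")
    }

    /// Reads one newline-terminated JSON message.
    fn read_message(&mut self) -> io::Result<String> {
        let mut line = String::new();
        if self.reader.read_line(&mut line)? == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "QMP socket closed"));
        }
        Ok(line.trim_end().to_string())
    }

    /// Returns the writer, e.g. to inspect what was sent.
    pub fn into_writer(self) -> W {
        self.writer
    }
}
```

A caller would wrap a std::os::unix::net::UnixStream in a BufReader for the reader and pass a try_clone() of the stream as the writer. A common pattern is to fall back to killing the tracked QEMU PID if the guest ignores the powerdown request within a timeout.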

Decision

Replace libvirt with direct QEMU process management via vm-manager. Use user-mode (SLIRP) networking with SSH port forwarding instead of libvirt's bridged networking.
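On the QEMU command line, this decision comes down to a user-mode netdev with a host port forwarded to the guest's SSH port. A hypothetical helper producing those arguments is sketched below; the function name, the netdev id net0, and binding the forward to 127.0.0.1 (so the guest's SSH server is reachable only from the host) are assumptions rather than vm-manager's actual choices.

```rust
/// Hypothetical helper: QEMU arguments for user-mode (SLIRP) networking with
/// host port `ssh_port` on 127.0.0.1 forwarded to port 22 in the guest.
/// No bridge, TAP device, or NET_ADMIN capability is involved.
pub fn slirp_ssh_args(ssh_port: u16) -> Vec<String> {
    vec![
        "-netdev".to_string(),
        // hostfwd syntax: tcp:[hostaddr]:hostport-[guestaddr]:guestport
        format!("user,id=net0,hostfwd=tcp:127.0.0.1:{ssh_port}-:22"),
        "-device".to_string(),
        // Guest NIC attached to the user-mode backend above.
        "virtio-net-pci,netdev=net0".to_string(),
    ]
}
```

SSH then targets 127.0.0.1 on the forwarded port (for example ssh -p 2222 user@127.0.0.1) instead of a guest IP, which sidesteps guest IP discovery in this mode.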

Consequences

Positive

  • Simpler containerization: only the /dev/kvm device must be passed through, and no daemon sockets are mounted
  • 712 lines of libvirt code removed from the orchestrator
  • No libvirt daemon dependency on the host
  • Simpler networking: user-mode SLIRP needs no bridge, no NET_ADMIN capability, and no TAP devices
  • Pure-Rust cloud-init ISO generation (optional pure-iso feature): no genisoimage/mkisofs required

Negative

  • No libvirt network management: VMs must use an existing bridge or user-mode networking
  • VM IP discovery changes: parse ip neigh output instead of calling virsh domifaddr
  • QEMU process management: PID tracking and graceful shutdown via QMP must be handled directly instead of being delegated to libvirt
  • Cross-workspace dependency: vm-manager uses workspace-inherited dependencies, which requires a git dependency plus a local patch override
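On an existing bridge, the guest's address has to be recovered from the host's neighbour table. Below is a sketch of that lookup, assuming the default ip neigh show line format (address, dev, lladdr, state) and that the caller knows the MAC address assigned to the guest NIC; ip_for_mac is a hypothetical name and the orchestrator's real parser may differ.

```rust
use std::net::IpAddr;

/// Hypothetical helper: find the IP address that `ip neigh show` reports for
/// `mac`. Lines look like:
///   192.168.1.42 dev br0 lladdr 52:54:00:12:34:56 REACHABLE
/// Entries without an lladdr field or in the FAILED/INCOMPLETE states are ignored.
pub fn ip_for_mac(neigh_output: &str, mac: &str) -> Option<IpAddr> {
    neigh_output.lines().find_map(|line| {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // First field is the neighbour's address (IPv4 or IPv6).
        let ip: IpAddr = fields.first()?.parse().ok()?;
        // The MAC follows the "lladdr" keyword.
        let lladdr_pos = fields.iter().position(|f| *f == "lladdr")?;
        let entry_mac = fields.get(lladdr_pos + 1)?;
        // The NUD state is the last field.
        let state = fields.last()?;
        let usable = !matches!(*state, "FAILED" | "INCOMPLETE");
        (usable && entry_mac.eq_ignore_ascii_case(mac)).then_some(ip)
    })
}
```

In use, neigh_output would be the captured stdout of std::process::Command::new("ip").args(["neigh", "show"]).output(). A freshly booted guest may not appear in the table yet, so the lookup would need to be retried until a timeout.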

Lessons learned

  • The cloud-init seed ISO must be attached as an IDE CD-ROM: using virtio-blk for both the disk and the seed confuses Ubuntu's root device detection
  • VmHandle must preserve vcpus and memory from the prepare step, because vm-manager's start() reads them from the handle
  • SFTP uploads of executable files need an explicit chmod 0755
  • The console tailer must stop before SSH execution begins; otherwise the log output is duplicated
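The first two lessons can be made concrete with a hypothetical handle type. This is not vm-manager's VmHandle: the field names, start_args, the qcow2 disk format, and the file names are assumptions. The sketch derives the start-time QEMU arguments only from the handle, so vcpus and memory chosen during prepare must be carried on it, and it attaches the seed ISO as an IDE CD-ROM while the root disk stays on virtio-blk.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for vm-manager's handle. Everything the start step
/// needs, including vcpus and memory, lives here; if the prepare step drops
/// them, start has nothing to read.
#[derive(Debug, Clone)]
pub struct VmHandle {
    pub disk: PathBuf,
    pub seed_iso: PathBuf,
    pub vcpus: u32,
    pub memory_mib: u64,
}

/// Builds QEMU arguments purely from the handle.
pub fn start_args(h: &VmHandle) -> Vec<String> {
    vec![
        "-smp".to_string(),
        h.vcpus.to_string(),
        "-m".to_string(),
        h.memory_mib.to_string(), // -m without a suffix is in MiB
        // Root disk on virtio-blk (assuming a qcow2 image).
        "-drive".to_string(),
        format!("file={},if=virtio,format=qcow2", h.disk.display()),
        // cloud-init seed as an IDE CD-ROM rather than a second virtio-blk
        // disk, so Ubuntu's root device detection is not confused.
        "-drive".to_string(),
        format!("file={},if=ide,media=cdrom,format=raw", h.seed_iso.display()),
    ]
}
```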