Till Wegmueller d6c2c3662c Add architecture plans and decision records
Plans:
- 001: vm-manager migration (completed)
- 002: runner-only architecture (active)

Decision records (ADRs):
- 001: Runner-only architecture — retire webhooks + logs service
- 002: Direct QEMU over libvirt
- 003: Ephemeral SSH keys with opt-in debug access
- 004: User-mode (SLIRP) networking for VMs
2026-04-09 22:03:12 +02:00


Plan: Migrate Orchestrator to vm-manager + Containerize

Status: Completed (2026-04-07)
Planner ID: 5fc6f5f5-33c1-4e3d-9201-c4c9c4fc43df

Summary

Replace the orchestrator's built-in libvirt hypervisor code with the vm-manager library crate, then containerize the orchestrator. This eliminates the libvirt dependency and makes deployment straightforward (only /dev/kvm needed).

Motivation

The orchestrator used libvirt (via the virt crate), which required:

  • Libvirt daemon on the host
  • Libvirt sockets mounted into containers
  • KVM device access
  • Host-level libvirt configuration and networking

This made containerization painful, so the orchestrator ran as a systemd service on the host instead.

Approach

  1. Extended vm-manager with console log tailing (console module)
  2. Chose user-mode (SLIRP) networking over TAP for container simplicity
  3. Created vm_adapter.rs, bridging the orchestrator's Hypervisor trait to vm-manager
  4. Replaced the scheduler's SSH, IP-discovery, and console code with vm-manager APIs
  5. Replaced the image download code with vm-manager's ImageManager
  6. Removed 712 lines of libvirt-specific code
  7. Updated the Containerfile, replacing the libvirt packages with QEMU and qemu-utils
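The adapter step (3) can be sketched as follows. This is a minimal illustration only, assuming a Hypervisor trait with start/stop methods; the request type JobVmRequest, the struct VmManagerAdapter, and every field on VmSpec and VmHandle are hypothetical stand-ins, not the actual vm_adapter.rs or vm-manager API. It shows the shape of the bridge: the scheduler keeps calling its own trait, while the adapter converts each request into a backend spec and tracks the handle that comes back.

```rust
use std::collections::HashMap;

/// Orchestrator-side VM request (hypothetical fields).
#[derive(Debug, Clone)]
pub struct JobVmRequest {
    pub name: String,
    pub image: String,
    pub memory_mb: u32,
    pub cpus: u32,
}

/// Stand-in for vm-manager's VmSpec; fields are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct VmSpec {
    pub name: String,
    pub disk_image: String,
    pub memory_mb: u32,
    pub vcpus: u32,
    pub ssh_host_port: u16,
}

/// Stand-in for vm-manager's VmHandle; fields are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct VmHandle {
    pub name: String,
    pub ssh_host_port: u16,
}

/// The orchestrator's Hypervisor trait as the scheduler sees it (method set assumed).
pub trait Hypervisor {
    fn start(&mut self, req: &JobVmRequest) -> Result<VmHandle, String>;
    fn stop(&mut self, name: &str) -> Result<(), String>;
}

/// Hypothetical adapter: converts requests into VmSpec values and tracks
/// running VMs. A real adapter would hand the spec to vm-manager (which
/// spawns QEMU); this sketch only records the handle it would get back.
pub struct VmManagerAdapter {
    next_port: u16,
    running: HashMap<String, VmHandle>,
}

impl VmManagerAdapter {
    pub fn new(first_port: u16) -> Self {
        Self { next_port: first_port, running: HashMap::new() }
    }

    /// Pure conversion step: orchestrator request -> backend spec.
    pub fn to_spec(&self, req: &JobVmRequest) -> VmSpec {
        VmSpec {
            name: req.name.clone(),
            disk_image: req.image.clone(),
            memory_mb: req.memory_mb,
            vcpus: req.cpus,
            ssh_host_port: self.next_port,
        }
    }
}

impl Hypervisor for VmManagerAdapter {
    fn start(&mut self, req: &JobVmRequest) -> Result<VmHandle, String> {
        if self.running.contains_key(&req.name) {
            return Err(format!("VM {} already running", req.name));
        }
        let spec = self.to_spec(req);
        // Each VM gets its own forwarded SSH port on the host.
        let handle = VmHandle { name: spec.name, ssh_host_port: spec.ssh_host_port };
        self.next_port += 1;
        self.running.insert(handle.name.clone(), handle.clone());
        Ok(handle)
    }

    fn stop(&mut self, name: &str) -> Result<(), String> {
        self.running
            .remove(name)
            .map(|_| ())
            .ok_or_else(|| format!("VM {} not found", name))
    }
}
```

Keeping the conversion in a pure function (to_spec in this sketch) lets it be tested without starting QEMU, which is one reason to isolate the backend behind an adapter rather than calling it from the scheduler directly.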

Tasks completed

| # | Task | Summary |
|---|------|---------|
| 1 | Add serial console tailing to vm-manager | ConsoleTailer for async Unix socket streaming |
| 2 | Verify networking | User-mode SLIRP chosen; no bridge needed |
| 3 | Add vm-manager adapter layer | vm_adapter.rs with VmSpec/VmHandle conversion |
| 4 | Update scheduler SSH + console | vm-manager SSH/connect_with_retry/upload/exec |
| 5 | Update image config | vm-manager ImageManager::download() |
| 6 | Remove libvirt dependencies | -712 lines; removed virt/ssh2/zstd crates |
| 7 | Update Containerfile | Ubuntu 24.04 runtime, QEMU direct, no libvirt |
| 8 | Integration test | End-to-end job via containerized orchestrator |
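Task 4 relies on retrying SSH connections, since a freshly booted guest may not have sshd listening yet. A minimal sketch of a generic retry helper with exponential backoff follows; retry_with_backoff and its parameters are hypothetical and do not reproduce vm-manager's connect_with_retry signature.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry helper (not vm-manager's API). Calls `op` up to
/// `max_attempts` times, passing the 1-based attempt number. After each
/// failure except the last, it sleeps and then doubles the delay, capped
/// at `max_delay`. Returns the first success or the last error.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                // The guest is probably still booting; wait, then back off.
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

In practice, `op` might wrap `std::net::TcpStream::connect(("127.0.0.1", port))` against the forwarded SSH port. Capping the delay keeps the worst-case wait bounded while still avoiding a tight reconnect loop.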

Key decisions

  • QEMU direct over libvirt: vm-manager spawns QEMU processes directly and manages them through a QMP socket. This is simpler and removes the daemon dependency.
  • User-mode networking: SSH reaches the guest through host port forwarding (hostfwd=tcp::{port}-:22). No bridge, no NET_ADMIN capability, and no TAP device creation are needed.
  • IDE CDROM for seed ISO: Ubuntu cloud images expect the root disk to be the first virtio device, so the seed ISO is attached as an IDE CD-ROM to avoid device-ordering conflicts.
  • Pre-built binary Containerfile: vm-manager uses workspace-inherited dependencies, which makes cross-workspace path dependencies difficult, so the Containerfile packages a pre-built binary. CI uses a Git dependency; local development uses a local patch.
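The networking, seed-ISO, and QMP decisions above all surface in the QEMU command line. The sketch below assembles one plausible argument list under stated assumptions: the QemuVm struct, the qemu_args function, the qcow2 root-disk format, and the socket paths are hypothetical illustrations, not vm-manager's actual invocation. The option spellings themselves (-netdev user with hostfwd, -drive if=ide,media=cdrom, -qmp unix:...,server=on,wait=off) follow the QEMU documentation.

```rust
use std::path::Path;

/// Hypothetical per-VM settings; not vm-manager's VmSpec.
pub struct QemuVm<'a> {
    pub disk: &'a Path,
    pub seed_iso: &'a Path,
    pub qmp_socket: &'a Path,
    pub console_socket: &'a Path,
    pub ssh_host_port: u16,
    pub memory_mb: u32,
    pub cpus: u32,
}

/// Builds a QEMU argument list reflecting the key decisions: KVM
/// acceleration, a virtio root disk, the cloud-init seed ISO on an IDE
/// CD-ROM, SLIRP networking with an SSH port forward, a QMP control
/// socket, and the serial console on a Unix socket.
pub fn qemu_args(vm: &QemuVm) -> Vec<String> {
    vec![
        "-enable-kvm".into(),
        "-m".into(), vm.memory_mb.to_string(),
        "-smp".into(), vm.cpus.to_string(),
        "-display".into(), "none".into(),
        // Root disk as a virtio block device (qcow2 format assumed).
        "-drive".into(),
        format!("file={},if=virtio,format=qcow2", vm.disk.display()),
        // Seed ISO on IDE, so it does not take a virtio slot.
        "-drive".into(),
        format!("file={},if=ide,media=cdrom,format=raw", vm.seed_iso.display()),
        // User-mode networking: no bridge, TAP device, or NET_ADMIN.
        "-netdev".into(),
        format!("user,id=net0,hostfwd=tcp::{}-:22", vm.ssh_host_port),
        "-device".into(), "virtio-net-pci,netdev=net0".into(),
        // QMP for lifecycle control; QEMU listens and does not block.
        "-qmp".into(),
        format!("unix:{},server=on,wait=off", vm.qmp_socket.display()),
        // Serial console exposed on a Unix socket for log tailing.
        "-serial".into(),
        format!("unix:{},server=on,wait=off", vm.console_socket.display()),
    ]
}
```

A caller could then spawn the VM with `std::process::Command::new("qemu-system-x86_64").args(qemu_args(&vm))`, connect to the QMP socket to control it, and reach SSH on the forwarded host port. Running everything as user-mode QEMU processes is what leaves /dev/kvm as the only device the container needs.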