# Plan: Migrate Orchestrator to vm-manager + Containerize
**Status:** Completed (2026-04-07)

**Planner ID:** `5fc6f5f5-33c1-4e3d-9201-c4c9c4fc43df`
## Summary
Replace the orchestrator's built-in libvirt hypervisor code with the `vm-manager` library crate, then containerize the orchestrator. This eliminates the libvirt dependency and makes deployment straightforward (only `/dev/kvm` needed).
## Motivation
The orchestrator used libvirt (via the `virt` crate), which required:
- Libvirt daemon on the host
- Libvirt sockets mounted into containers
- KVM device access
- Host-level libvirt configuration and networking

This made containerization painful, so the orchestrator ran as a systemd service on the host.
## Approach
1. Extended vm-manager with console log tailing (`console` module)
2. Chose user-mode (SLIRP) networking over TAP for container simplicity
3. Created `vm_adapter.rs`, bridging the orchestrator's `Hypervisor` trait to vm-manager
4. Replaced scheduler's SSH/IP-discovery/console code with vm-manager APIs
5. Replaced image download with vm-manager's `ImageManager`
6. Removed 712 lines of libvirt-specific code
7. Updated Containerfile: libvirt packages replaced with QEMU + qemu-utils
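
The adapter in step 3 can be pictured roughly as follows. This is a minimal sketch, not the contents of `vm_adapter.rs`: the fields of `VmSpec` and `VmHandle`, the `JobVm` type, the `VmBackend` trait, and the method names on `Hypervisor` are all assumptions made for illustration, and `FakeBackend` is an in-memory stand-in for vm-manager.

```rust
use std::collections::HashMap;

/// Hypothetical orchestrator-side description of the VM a job needs.
#[derive(Debug, Clone)]
pub struct JobVm {
    pub job_id: String,
    pub cpus: u32,
    pub memory_mib: u64,
    pub image: String,
}

/// Stand-in for the library's VM spec; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct VmSpec {
    pub name: String,
    pub vcpus: u32,
    pub memory_mib: u64,
    pub disk_image: String,
}

/// Stand-in for the library's VM handle; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct VmHandle {
    pub name: String,
    pub ssh_port: u16,
}

/// Orchestrator-facing trait (method names are hypothetical).
pub trait Hypervisor {
    fn start(&mut self, job: &JobVm) -> Result<VmHandle, String>;
    fn destroy(&mut self, handle: &VmHandle) -> Result<(), String>;
}

/// Hypothetical trait standing in for the VM library's API surface.
pub trait VmBackend {
    fn create(&mut self, spec: VmSpec) -> Result<VmHandle, String>;
    fn remove(&mut self, name: &str) -> Result<(), String>;
}

/// Conversion from the orchestrator's job type to the backend's spec.
impl From<&JobVm> for VmSpec {
    fn from(job: &JobVm) -> Self {
        VmSpec {
            name: format!("job-{}", job.job_id),
            vcpus: job.cpus,
            memory_mib: job.memory_mib,
            disk_image: job.image.clone(),
        }
    }
}

/// The adapter: implements the orchestrator trait by delegating to a backend.
pub struct VmAdapter<B: VmBackend> {
    pub backend: B,
}

impl<B: VmBackend> Hypervisor for VmAdapter<B> {
    fn start(&mut self, job: &JobVm) -> Result<VmHandle, String> {
        self.backend.create(VmSpec::from(job))
    }

    fn destroy(&mut self, handle: &VmHandle) -> Result<(), String> {
        self.backend.remove(&handle.name)
    }
}

/// In-memory backend used only to exercise the adapter in this sketch.
#[derive(Default)]
pub struct FakeBackend {
    pub vms: HashMap<String, VmSpec>,
}

impl VmBackend for FakeBackend {
    fn create(&mut self, spec: VmSpec) -> Result<VmHandle, String> {
        if self.vms.contains_key(&spec.name) {
            return Err(format!("vm {} already exists", spec.name));
        }
        // Naive port assignment; a real backend would track free ports.
        let ssh_port = 2222 + self.vms.len() as u16;
        let name = spec.name.clone();
        self.vms.insert(name.clone(), spec);
        Ok(VmHandle { name, ssh_port })
    }

    fn remove(&mut self, name: &str) -> Result<(), String> {
        self.vms
            .remove(name)
            .map(|_| ())
            .ok_or_else(|| format!("vm {} not found", name))
    }
}
```

Keeping the scheduler coded against a trait like `Hypervisor` means a change of backend only touches the adapter and its conversion impls.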
## Tasks completed
| # | Task | Summary |
|---|------|---------|
| 1 | Add serial console tailing to vm-manager | `ConsoleTailer` for async Unix socket streaming |
| 2 | Verify networking | User-mode SLIRP chosen — no bridge needed |
| 3 | Add vm-manager adapter layer | `vm_adapter.rs` with VmSpec/VmHandle conversion |
| 4 | Update scheduler SSH + console | vm-manager SSH helpers (`connect_with_retry`, upload, exec) |
| 5 | Update image config | vm-manager `ImageManager::download()` |
| 6 | Remove libvirt dependencies | Removed 712 lines and the `virt`, `ssh2`, and `zstd` crates |
| 7 | Update Containerfile | Ubuntu 24.04 runtime, QEMU direct, no libvirt |
| 8 | Integration test | End-to-end job via containerized orchestrator |
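
To illustrate the idea behind task 1, here is a simplified, blocking sketch of tailing a serial console exposed on a Unix socket. It is not the `ConsoleTailer` implementation, which the table describes as async. The function names `tail_console` and `tail_unix_socket` are hypothetical, and only the standard library is used.

```rust
use std::io::{self, BufRead, BufReader, Read};
use std::path::Path;

#[cfg(unix)]
use std::os::unix::net::UnixStream;

/// Reads `source` until EOF, passing each line (without its trailing
/// CR/LF) to `on_line`. Returns the number of lines seen.
/// Hypothetical helper, not part of vm-manager.
pub fn tail_console<R: Read>(source: R, mut on_line: impl FnMut(&str)) -> io::Result<usize> {
    let mut reader = BufReader::new(source);
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        buf.clear();
        // read_until keeps the delimiter; 0 bytes read means EOF.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        // Serial output is not guaranteed to be valid UTF-8.
        let line = String::from_utf8_lossy(&buf);
        on_line(line.trim_end_matches(|c| c == '\r' || c == '\n'));
        count += 1;
    }
    Ok(count)
}

/// Connects to a console socket at `path` and tails it until the peer
/// closes the connection. The socket path is whatever the VM was
/// started with; nothing here is specific to vm-manager.
#[cfg(unix)]
pub fn tail_unix_socket(path: &Path, on_line: impl FnMut(&str)) -> io::Result<usize> {
    let stream = UnixStream::connect(path)?;
    tail_console(stream, on_line)
}
```

Because `tail_console` accepts any `Read`, it can be exercised with an in-memory byte slice instead of a live VM socket.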
## Key decisions
- **QEMU direct over libvirt**: vm-manager spawns QEMU processes directly and manages them over a QMP socket. This is simpler and needs no daemon.
- **User-mode networking**: SSH reaches the guest through port forwarding (`hostfwd=tcp::{port}-:22`), so no bridge, `NET_ADMIN` capability, or TAP device creation is needed.
- **IDE CDROM for seed ISO**: Ubuntu cloud images expect the root disk to be the first virtio device, so the seed ISO is attached as an IDE CDROM to avoid device-ordering conflicts.
- **Pre-built binary Containerfile**: vm-manager uses workspace-inherited dependencies, which makes cross-workspace path dependencies difficult. CI uses a Git dependency; local development uses a local patch.
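
The first three decisions above map onto a QEMU command line along these lines. This is a sketch under stated assumptions: the `QemuPlan` struct and `qemu_args` function are hypothetical, and the exact flags vm-manager passes are not documented here. The flags themselves are standard QEMU options.

```rust
use std::path::Path;

/// Hypothetical bundle of the inputs needed to launch one VM.
pub struct QemuPlan<'a> {
    pub root_disk: &'a Path,
    pub seed_iso: &'a Path,
    pub qmp_socket: &'a Path,
    pub ssh_host_port: u16,
    pub memory_mib: u32,
    pub vcpus: u32,
}

/// Builds an argument vector for a `qemu-system-*` binary.
pub fn qemu_args(p: &QemuPlan) -> Vec<String> {
    vec![
        // Hardware acceleration; this is why /dev/kvm must be available.
        "-enable-kvm".into(),
        "-m".into(),
        p.memory_mib.to_string(),
        "-smp".into(),
        p.vcpus.to_string(),
        "-display".into(),
        "none".into(),
        // Root disk is the only virtio block device, so it comes first.
        "-drive".into(),
        format!("file={},if=virtio,format=qcow2", p.root_disk.display()),
        // Seed ISO goes on the IDE bus as a CD-ROM, outside virtio ordering.
        "-drive".into(),
        format!("file={},if=ide,media=cdrom,readonly=on", p.seed_iso.display()),
        // User-mode (SLIRP) networking: forward a host port to guest port 22.
        "-netdev".into(),
        format!("user,id=net0,hostfwd=tcp::{}-:22", p.ssh_host_port),
        "-device".into(),
        "virtio-net-pci,netdev=net0".into(),
        // QMP control socket for managing the running VM.
        "-qmp".into(),
        format!("unix:{},server=on,wait=off", p.qmp_socket.display()),
    ]
}
```

The resulting vector can be handed to `std::process::Command::new("qemu-system-x86_64").args(...)`. The binary name is also an assumption, since it depends on the target architecture.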