### Solstice CI — Orchestrator Scheduling, Image Map Config, and Libvirt/Zones Backends (MVP)

This document summarizes the initial implementation of Orchestrator scheduling, a YAML-based image map configuration, cloud image preparation, and a hypervisor abstraction with Linux/KVM (libvirt) and illumos zones scaffolding.

#### What’s included (MVP)

- Scheduler and capacity
  - Global max concurrency (`MAX_CONCURRENCY`, default 2) with backpressure by aligning AMQP `prefetch` to concurrency.
  - Optional per-label capacity via `CAPACITY_MAP` (e.g., `illumos-latest=2,ubuntu-22.04=4`).
  - Ack-on-accept: AMQP message is acked after basic validation and enqueue to scheduler; errors during provisioning are handled internally.
- YAML image map configuration
  - Loaded at startup from `--config` / `ORCH_CONFIG`; defaults to `examples/orchestrator-image-map.yaml`.
  - Keys: `default_label`, `aliases`, optional `sizes` presets, and `images` map with backend (`zones` or `libvirt`), `source` URL, `local_path`, `decompress` (`zstd` or none), `nocloud` (bool), and per-image default resources.
  - Default mapping provided:
    - `default_label: illumos-latest`
    - Alias: `illumos-latest → openindiana-hipster`
    - `openindiana-hipster` image points to current OI cloud image: `https://dlc.openindiana.org/isos/hipster/20250402/OI-hipster-cloudimage.img.zstd`, marked `nocloud: true` and `backend: zones`.
  - Size presets (not yet consumed directly by jobs): `small` (1 CPU, 1 GiB), `medium` (2 CPU, 2 GiB), `large` (4 CPU, 4 GiB).
- Image preparation (downloader)
  - On startup, the orchestrator ensures each configured image exists at `local_path`.
  - If missing, downloads from `source` and optionally decompresses with Zstd into the target path.
- Hypervisor abstraction
  - `Hypervisor` trait and `RouterHypervisor` dispatcher.
  - Backends:
    - `libvirt` (Linux/KVM): skeleton that connects to libvirt in `prepare`; domain XML/overlay/NoCloud seed wiring to follow.
    - `zones` (illumos/bhyve): stub scaffold (not yet functional); will integrate with `zone` crate + ZFS clones in a follow-up.
  - `NoopHypervisor` for development on hosts without privileges.
- Orchestrator MQ wiring
  - Consumes `JobRequest` messages and builds `VmSpec` from resolved label and image defaults.
  - Injects minimal cloud-init user-data content (NoCloud) into the spec for future seeding.

#### Configuration (CLI/env)

- `--config`, `ORCH_CONFIG` — path to YAML image map (default `examples/orchestrator-image-map.yaml`).
- `--max-concurrency`, `MAX_CONCURRENCY` — global VM concurrency (default 2).
- `--capacity-map`, `CAPACITY_MAP` — per-label capacity (e.g., `illumos-latest=2,ubuntu-22.04=4`).
- AMQP: `AMQP_URL`, `AMQP_EXCHANGE`, `AMQP_QUEUE`, `AMQP_ROUTING_KEY`, `AMQP_PREFETCH` (defaulted to `MAX_CONCURRENCY`).
- Libvirt (Linux): `LIBVIRT_URI` (default `qemu:///system`), `LIBVIRT_NETWORK` (default `default`).

#### Local usage (dev)

1. Ensure RabbitMQ is running (docker-compose service `rabbitmq`).
2. Start the Orchestrator:

   ```bash
   cargo run -p orchestrator -- \
     --config examples/orchestrator-image-map.yaml \
     --max-concurrency 2
   ```

   On first run, the OI cloud image will be downloaded and decompressed to the configured `local_path`.
3. In another terminal, enqueue a job (Forge Integration webhook or CLI `enqueue`). The orchestrator will resolve `runs_on` (or default label) and schedule a VM using the configured backend.

Note: The current libvirt/zones backends are partial; actual VM boot is a follow-up. The scheduler and config wiring are complete and ready for backend integration.

#### What’s next (planned)

- Libvirt backend:
  - Create qcow2 overlays, generate domain XML (virtio devices), attach NoCloud ISO seed, define, start, shutdown, and destroy.
  - Ensure libvirt default network is active at startup if necessary.
- Illumos zones backend:
  - Integrate `oxidecomputer/zone` and ZFS clone workflow; set bhyve attributes (`vcpus`, `ram`, `bootdisk`), networking, and SMF.
- Lifecycle and runner coordination:
  - gRPC Orchestrator↔Runner for logs/status, job completion handling, and cleanup.
- Persistence and recovery:
  - Store job/VM state in Postgres; graceful recovery on restart.
- Tests and docs:
  - Unit tests for config parsing and scheduler; feature-gated libvirt smoke test; expand docs.
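
#### Illustrative sketch: capacity map and label resolution

To make the configuration described earlier more concrete, here is a minimal sketch of how a `CAPACITY_MAP` value such as `illumos-latest=2,ubuntu-22.04=4` might be parsed, and how a job's `runs_on` label might fall back to `default_label` and then follow an alias. The function names `parse_capacity_map` and `resolve_label`, the error type, and the single-hop alias lookup are assumptions made for illustration. They are not taken from the orchestrator's source, which may handle these steps differently.

```rust
use std::collections::HashMap;

/// Parse a string like "illumos-latest=2,ubuntu-22.04=4" into per-label limits.
/// Hypothetical helper: the orchestrator's real parser may differ.
fn parse_capacity_map(raw: &str) -> Result<HashMap<String, usize>, String> {
    let mut map = HashMap::new();
    // Split on commas, tolerate surrounding whitespace, and skip empty entries.
    for entry in raw.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        let (label, count) = entry
            .split_once('=')
            .ok_or_else(|| format!("expected `label=count`, got `{}`", entry))?;
        let count: usize = count
            .trim()
            .parse()
            .map_err(|e| format!("invalid capacity for `{}`: {}", label.trim(), e))?;
        map.insert(label.trim().to_string(), count);
    }
    Ok(map)
}

/// Pick the job's label, or the default when the job has none, then follow
/// at most one alias hop (assumption: aliases are not chained).
fn resolve_label<'a>(
    requested: Option<&'a str>,
    default_label: &'a str,
    aliases: &'a HashMap<String, String>,
) -> &'a str {
    let label = requested.unwrap_or(default_label);
    aliases.get(label).map(String::as_str).unwrap_or(label)
}

fn main() {
    let caps = parse_capacity_map("illumos-latest=2,ubuntu-22.04=4")
        .expect("capacity map should parse");
    println!("illumos-latest capacity: {:?}", caps.get("illumos-latest"));

    // Mirrors the default mapping in this document: illumos-latest → openindiana-hipster.
    let mut aliases = HashMap::new();
    aliases.insert("illumos-latest".to_string(), "openindiana-hipster".to_string());
    println!("resolved: {}", resolve_label(None, "illumos-latest", &aliases));
}
```

The sketch leaves open whether per-label capacity is keyed by the requested label or by the alias target, since the document does not say.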