mirror of
https://codeberg.org/Toasterson/solstice-ci.git
synced 2026-04-10 13:20:41 +00:00
Solstice CI — Orchestrator Scheduling, Image Map Config, and Libvirt Lifecycle (MVP)
This document reflects the current implementation of the Orchestrator: scheduling/capacity, a YAML-based image map, cloud image preparation, and a hypervisor abstraction with a working Linux/KVM (libvirt) backend and an illumos zones scaffold.
What’s included (current status)
- Scheduler and capacity
  - Global max concurrency (`MAX_CONCURRENCY`, default 2) with backpressure by aligning AMQP `prefetch` to concurrency.
  - Optional per-label capacity via `CAPACITY_MAP` (e.g., `illumos-latest=2,ubuntu-22.04=4`).
  - Ack-on-accept: the AMQP message is acked after basic validation and enqueue to the scheduler; errors during provisioning are handled internally.
- YAML image map configuration (backend-agnostic images)
  - Loaded at startup from `--config` / `ORCH_CONFIG`; defaults to `examples/orchestrator-image-map.yaml`.
  - Keys: `default_label`, `aliases`, optional `sizes` presets, and an `images` map with `source` URL, `local_path`, `decompress` (`zstd` or none), `nocloud` (bool), and per-image default resources.
  - Default mapping provided:
    - `default_label: illumos-latest`
    - Alias: `illumos-latest → openindiana-hipster`
    - The `openindiana-hipster` image points to the current OI cloud image, `https://dlc.openindiana.org/isos/hipster/20250402/OI-hipster-cloudimage.img.zstd`, and is marked `nocloud: true`.
  - Size presets (operator convenience): `small` (1 CPU, 1 GiB), `medium` (2 CPU, 2 GiB), `large` (4 CPU, 4 GiB).
- Image preparation (downloader)
  - On startup, the orchestrator ensures each configured image exists at `local_path`.
  - If missing, downloads from `source` and optionally decompresses with Zstd into the target path.
- Hypervisor abstraction
  - `Hypervisor` trait and `RouterHypervisor` dispatcher.
  - Backends:
    - `libvirt` (Linux/KVM): IMPLEMENTED. Creates qcow2 overlays (`qemu-img`), generates domain XML with virtio devices, builds and attaches a NoCloud seed ISO (`mkisofs`/`genisoimage`), defines and starts the domain, shuts down via ACPI with a timeout, and forces destroy if needed. Ensures the libvirt network (`default` by default) is active and autostarted.
    - `zones` (illumos/bhyve): scaffold (not yet functional); will integrate with the `zone` crate + ZFS clones in a follow-up.
  - `NoopHypervisor` for development on hosts without privileges.
- Orchestrator MQ wiring
  - Consumes `JobRequest` messages and builds `VmSpec` from the resolved label and image defaults.
  - Injects minimal cloud-init user-data content (NoCloud) into the spec for seeding.
- Graceful shutdown
  - On SIGINT/SIGTERM the consumer is stopped, the scheduler is allowed to drain, and active VMs are asked to shut down gracefully before being destroyed.
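
The global and per-label limits listed under "Scheduler and capacity" can be pictured with a small accounting structure. The following is a minimal sketch only, assuming a single-threaded tracker; the `Capacity` type and its `try_acquire`/`release` methods are hypothetical names chosen for illustration and are not taken from the orchestrator's source.

```rust
use std::collections::HashMap;

/// Hypothetical capacity tracker: one global limit plus optional per-label limits.
#[derive(Debug)]
pub struct Capacity {
    max_concurrency: usize,
    per_label: HashMap<String, usize>,
    running_total: usize,
    running_by_label: HashMap<String, usize>,
}

impl Capacity {
    pub fn new(max_concurrency: usize, per_label: HashMap<String, usize>) -> Self {
        Self {
            max_concurrency,
            per_label,
            running_total: 0,
            running_by_label: HashMap::new(),
        }
    }

    /// Try to reserve a slot for `label`. Returns false when the global limit,
    /// or the label's own limit (if one is configured), is already reached.
    pub fn try_acquire(&mut self, label: &str) -> bool {
        if self.running_total >= self.max_concurrency {
            return false;
        }
        let in_use = self.running_by_label.get(label).copied().unwrap_or(0);
        if let Some(&limit) = self.per_label.get(label) {
            if in_use >= limit {
                return false;
            }
        }
        self.running_total += 1;
        self.running_by_label.insert(label.to_string(), in_use + 1);
        true
    }

    /// Release a slot previously reserved for `label`; a no-op if none is held.
    pub fn release(&mut self, label: &str) {
        if let Some(count) = self.running_by_label.get_mut(label) {
            if *count > 0 {
                *count -= 1;
                self.running_total -= 1;
            }
        }
    }
}
```

A label without an entry in the per-label map is bounded only by the global limit in this sketch, which matches the "optional" wording above; how the real scheduler queues jobs that cannot be admitted is not shown here.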
Configuration (CLI/env)
- `--config`, `ORCH_CONFIG`: path to the YAML image map (default `examples/orchestrator-image-map.yaml`).
- `--max-concurrency`, `MAX_CONCURRENCY`: global VM concurrency (default 2).
- `--capacity-map`, `CAPACITY_MAP`: per-label capacity (e.g., `illumos-latest=2,ubuntu-22.04=4`).
- AMQP: `AMQP_URL`, `AMQP_EXCHANGE`, `AMQP_QUEUE`, `AMQP_ROUTING_KEY`, `AMQP_PREFETCH` (defaults to `MAX_CONCURRENCY`).
- Libvirt (Linux): `LIBVIRT_URI` (default `qemu:///system`), `LIBVIRT_NETWORK` (default `default`).
- Requirements for libvirt lifecycle on Linux: `libvirtd` running, `qemu-img`, and `mkisofs` (or `genisoimage`) available on PATH.
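
To make the `CAPACITY_MAP` format and the `AMQP_PREFETCH` default concrete, here is a small sketch using only the standard library. The function names `parse_capacity_map` and `effective_prefetch` are hypothetical, and the error handling (a plain `String`) is an assumption for brevity; the real orchestrator may parse these values differently.

```rust
use std::collections::HashMap;

/// Parse a comma-separated `label=count` list such as
/// `illumos-latest=2,ubuntu-22.04=4` into a map. Empty entries are skipped.
pub fn parse_capacity_map(raw: &str) -> Result<HashMap<String, usize>, String> {
    let mut map = HashMap::new();
    for entry in raw.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        let (label, count) = entry
            .split_once('=')
            .ok_or_else(|| format!("expected label=count, got `{}`", entry))?;
        let count: usize = count
            .trim()
            .parse()
            .map_err(|e| format!("invalid capacity for `{}`: {}", label, e))?;
        map.insert(label.trim().to_string(), count);
    }
    Ok(map)
}

/// When no explicit prefetch is given, fall back to the concurrency limit,
/// so the broker never hands out more messages than VMs that may run.
pub fn effective_prefetch(amqp_prefetch: Option<u16>, max_concurrency: u16) -> u16 {
    amqp_prefetch.unwrap_or(max_concurrency)
}
```

In practice the raw string would come from `--capacity-map` or `std::env::var("CAPACITY_MAP")`, and the optional prefetch from `AMQP_PREFETCH`.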
Local usage (dev)
- Ensure RabbitMQ is running (docker-compose service `rabbitmq`).
- Start the Orchestrator:
  ```bash
  cargo run -p orchestrator -- \
    --config examples/orchestrator-image-map.yaml \
    --max-concurrency 2
  ```
  On first run, the OI cloud image will be downloaded and decompressed to the configured `local_path`.
- In another terminal, enqueue a job (Forge Integration webhook or CLI `enqueue`). On Linux with libvirt enabled, the orchestrator will resolve `runs_on` (or the default label), prepare an overlay and seed ISO, define `job-<uuid>`, and start it.
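
The label resolution in the last step can be sketched as follows: fall back to `default_label` when the job has no `runs_on`, follow an alias if one exists, then look up the image. This is an illustrative sketch under assumptions; the `ImageMap` struct, its fields as Rust types, and the `resolve` and `domain_name` helpers are hypothetical and only mirror the YAML keys and the `job-<uuid>` naming described in this document.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of the label-related keys of the image map.
pub struct ImageMap {
    pub default_label: String,
    pub aliases: HashMap<String, String>,
    /// Names of the configured images (the keys of the `images` map).
    pub images: Vec<String>,
}

impl ImageMap {
    /// Resolve a job's optional `runs_on` label to a configured image name.
    /// Returns None when the label (after alias lookup) matches no image.
    pub fn resolve(&self, runs_on: Option<&str>) -> Option<&str> {
        // No label on the job: use the configured default.
        let label = runs_on.unwrap_or(&self.default_label);
        // Follow a single alias hop if the label is an alias.
        let target = self.aliases.get(label).map(String::as_str).unwrap_or(label);
        self.images
            .iter()
            .map(String::as_str)
            .find(|name| *name == target)
    }
}

/// Build the libvirt domain name for a job, following the `job-<uuid>` pattern.
pub fn domain_name(job_id: &str) -> String {
    format!("job-{}", job_id)
}
```

With the default mapping described earlier, a job without `runs_on` and a job with `runs_on: illumos-latest` would both resolve to `openindiana-hipster` in this sketch. Only one alias hop is followed; whether the real implementation supports chained aliases is not stated here.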
What’s next (planned)
- Illumos zones backend:
  - Integrate `oxidecomputer/zone` and ZFS clone workflow; set bhyve attributes (`vcpus`, `ram`, `bootdisk`), networking, and SMF.
- Lifecycle and runner coordination:
  - gRPC Orchestrator↔Runner for logs/status, job completion handling, and cleanup.
- Persistence and recovery:
  - Store job/VM state in Postgres; reconcile on restart.
- Tests and docs:
  - Unit tests for config parsing, scheduler capacity accounting, and cloud-init seed creation; an opt-in libvirt smoke test; expand docs.