mirror of
https://codeberg.org/Toasterson/solstice-ci.git
synced 2026-04-10 13:20:41 +00:00
Solstice CI — Architecture Overview (KDL Jobs + Multi‑Host Orchestrator)
This document updates the earlier blueprint to reflect the current direction of Solstice CI:
- The project name is Solstice CI (not Helios CI).
- Workflows are defined in KDL (KDL:0) instead of YAML.
- The Orchestrator is designed to run on multiple hosts behind a shared queue for horizontal scale.
- A small set of crates provides clean separation of concerns: orchestrator, forge-integration, github-integration, workflow-runner, common, ciadm, and cidev.
Core Components
- Forge Integration Layer (crates/forge-integration and crates/github-integration)
  - Receives webhooks from Forgejo or GitHub.
  - Normalizes events and publishes job requests to the Orchestrator (direct API or message queue; see multi‑host section).
  - Reports status back to the forge (Checks API for GitHub; Commit Status API for Forgejo).
- Orchestrator (crates/orchestrator)
  - Provisions ephemeral VMs via bhyve branded zones on illumos hosts and manages their lifecycle using ZFS clones.
  - Streams logs and results between the VM‑resident runner and the Integration Layer.
  - Multi‑host aware: multiple Orchestrator instances can run on different illumos hosts and share work (see below).
- Workflow Runner (crates/workflow-runner)
  - Minimal agent binary pre‑installed in the base VM image.
  - Fetches the job definition from the Orchestrator, executes steps, streams logs, and returns final status.
- Common (crates/common)
  - DRY utilities used by all binaries: tracing/log initialization, KDL job parsing, and future shared abstractions.
- Admin CLI (crates/ciadm)
  - Operator utility to trigger jobs, check status, etc., against the Orchestrator.
- Dev CLI (crates/cidev)
  - Developer utility to validate KDL files locally, inspect jobs and steps, and debug CI issues without needing the full system.
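The status-reporting duty of the Integration Layer can be illustrated with a small mapping from an internal job status to the values each forge expects. This is a minimal sketch, not the project's code: the JobStatus and Forge enums and the forge_status function are hypothetical names. The string values assume GitHub's check-run fields (status plus conclusion) and the Gitea/Forgejo commit status states.

```rust
/// Internal job outcome as the Orchestrator might report it (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Running,
    Succeeded,
    Failed,
}

/// Which forge the originating webhook came from (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Forge {
    GitHub,
    Forgejo,
}

/// Returns (status or state, optional conclusion) to send back to the forge.
/// GitHub check runs carry a status and, once completed, a conclusion;
/// Forgejo commit statuses carry a single state, so the second field is None.
pub fn forge_status(forge: Forge, status: JobStatus) -> (&'static str, Option<&'static str>) {
    match (forge, status) {
        (Forge::GitHub, JobStatus::Queued) => ("queued", None),
        (Forge::GitHub, JobStatus::Running) => ("in_progress", None),
        (Forge::GitHub, JobStatus::Succeeded) => ("completed", Some("success")),
        (Forge::GitHub, JobStatus::Failed) => ("completed", Some("failure")),
        (Forge::Forgejo, JobStatus::Queued | JobStatus::Running) => ("pending", None),
        (Forge::Forgejo, JobStatus::Succeeded) => ("success", None),
        (Forge::Forgejo, JobStatus::Failed) => ("failure", None),
    }
}
```

Keeping the mapping in one function means the Orchestrator only ever deals with the internal status, and forge-specific vocabulary stays inside the integration crates.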
Multi‑Host Orchestration
To support multiple hosts, Solstice CI uses a shared queue (e.g., RabbitMQ) between the Integration Layer and Orchestrators:
- The Integration Layer publishes job requests into a durable queue.
- Any healthy Orchestrator node can consume a job, subject to capacity constraints.
- Nodes coordinate through the queue and an internal state store (e.g., Postgres) for job status.
- Each node manages ZFS clones and bhyve zones locally; failure isolation is per‑node.
- This model scales horizontally: adding illumos hosts with Orchestrator instances adds job capacity.
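The "subject to capacity constraints" rule above could look roughly like the following. This is a sketch under assumptions: the Node type, its max_vms limit, and the try_accept and finish methods are hypothetical, and the real queue interaction (acknowledging or requeueing a message) is left out.

```rust
/// One Orchestrator node with a fixed number of VM slots (hypothetical model).
#[derive(Debug)]
pub struct Node {
    pub name: String,
    pub max_vms: usize,
    /// IDs of jobs whose VMs are currently running on this node.
    pub running: Vec<String>,
}

impl Node {
    pub fn new(name: &str, max_vms: usize) -> Self {
        Node { name: name.to_string(), max_vms, running: Vec::new() }
    }

    /// Accept the job only if a VM slot is free.
    /// A `false` result means the caller should leave the job on the queue
    /// so that another node can pick it up.
    pub fn try_accept(&mut self, job_id: &str) -> bool {
        if self.running.len() >= self.max_vms {
            return false;
        }
        self.running.push(job_id.to_string());
        true
    }

    /// Free the slot once the job's VM has been torn down.
    pub fn finish(&mut self, job_id: &str) {
        self.running.retain(|id| id.as_str() != job_id);
    }
}
```

For example, a node created with `Node::new("host-a", 1)` accepts one job, refuses a second, and accepts again after `finish` is called for the first.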
KDL Workflow Definition
Solstice CI adopts a simple, explicit KDL schema for workflows. Example:
    workflow name="Solstice CI" {
        job id="build" runs_on="illumos-stable" {
            step name="Format" run="cargo fmt --check"
            step name="Clippy" run="cargo clippy -- -D warnings"
            step name="Test" run="cargo test --workspace"
        }
        job id="lint" runs_on="ubuntu-22.04" {
            step name="Lint" run="ruff check ."
        }
    }
Key points:
- workflow is the root node; name is optional.
- One or more job nodes define independent VMs. Each job can have a runs_on hint to select a base image.
- Each job contains one or more step nodes with a run command and optional name.
The current parser lives in crates/common/src/job.rs and performs strict, typed parsing using the kdl crate.
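To make the schema rules concrete, here is a minimal sketch of what typed workflow data and a strictness check might look like. The type names Workflow, Job, and Step come from this document, but the field layout and the validate function are assumptions for illustration and are not copied from crates/common/src/job.rs. The sketch deliberately skips the KDL parsing itself and only models the result.

```rust
/// A single command to run inside the job VM; `name` is optional.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub name: Option<String>,
    pub run: String,
}

/// One independent VM; `runs_on` is a hint used to select a base image.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: String,
    pub runs_on: Option<String>,
    pub steps: Vec<Step>,
}

/// Root node of a workflow file; `name` is optional.
#[derive(Debug, Clone, PartialEq)]
pub struct Workflow {
    pub name: Option<String>,
    pub jobs: Vec<Job>,
}

/// Enforce the "one or more" rules from the key points (hypothetical helper).
pub fn validate(wf: &Workflow) -> Result<(), String> {
    if wf.jobs.is_empty() {
        return Err("workflow must contain at least one job".to_string());
    }
    for job in &wf.jobs {
        if job.steps.is_empty() {
            return Err(format!("job '{}' must contain at least one step", job.id));
        }
        if job.steps.iter().any(|s| s.run.trim().is_empty()) {
            return Err(format!("job '{}' has a step with an empty run command", job.id));
        }
    }
    Ok(())
}
```

Modelling optional attributes as Option fields keeps the distinction between "absent" and "empty" visible to callers such as cidev.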
Execution Flow (High‑Level)
- A forge sends a webhook to the Integration Layer.
- Integration validates/authenticates and publishes a job request to the queue (or calls the Orchestrator API in single‑node setups).
- An Orchestrator node accepts the job, creates a ZFS clone of a golden VM image, builds a bhyve zone config, and boots the VM.
- The Runner starts in the VM, obtains the job definition (including parsed KDL steps), then executes each step, streaming logs back.
- On completion or failure, the Orchestrator halts the zone and destroys the ZFS clone, then finalizes status via the Integration Layer.
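The provision and teardown steps in the flow above can be sketched as the host commands an Orchestrator node might issue. This is an illustration only: VmPlan and plan_vm are hypothetical names, the dataset and zone naming scheme is an assumption, and the zone configuration and installation steps (zonecfg, zoneadm install) that must precede boot are omitted. The sketch only builds argument vectors and does not execute anything.

```rust
/// Commands for provisioning and tearing down one job VM, as argv vectors.
#[derive(Debug)]
pub struct VmPlan {
    pub provision: Vec<Vec<String>>,
    pub teardown: Vec<Vec<String>>,
}

/// Convert a slice of string slices into an owned argv vector.
fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// Build the command plan for one job.
/// `golden_snapshot` is a ZFS snapshot of the golden image (e.g. "pool/img@snap"),
/// `jobs_dataset` is the parent dataset under which per-job clones are created.
pub fn plan_vm(golden_snapshot: &str, jobs_dataset: &str, job_id: &str) -> VmPlan {
    let clone = format!("{}/{}", jobs_dataset, job_id);
    let zone = format!("job-{}", job_id);
    VmPlan {
        provision: vec![
            // Copy-on-write clone of the golden image for this job.
            argv(&["zfs", "clone", golden_snapshot, clone.as_str()]),
            // Boot the (already configured) zone that hosts the VM.
            argv(&["zoneadm", "-z", zone.as_str(), "boot"]),
        ],
        teardown: vec![
            // Halt first so nothing holds the clone open, then destroy it.
            argv(&["zoneadm", "-z", zone.as_str(), "halt"]),
            argv(&["zfs", "destroy", clone.as_str()]),
        ],
    }
}
```

Separating planning from execution makes the lifecycle easy to unit-test on machines without ZFS or zones; a real node would hand each argv to std::process::Command and check its exit status.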
Security & Observability Notes
- Secrets should be injected via a secrets backend (e.g., Vault) and masked in logs.
- Tracing/logs are initialized consistently via crates/common and can be wired to OTLP later.
- Network isolation defaults to an isolated VNIC and restricted egress.
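Masking secrets in logs, as mentioned in the notes above, can be as simple as replacing known secret values before a line is streamed. The following is a minimal sketch, assuming the runner already holds the plaintext secrets for the job; mask_secrets and the "***" placeholder are hypothetical choices, not the project's API.

```rust
/// Replace every occurrence of each known secret in a log line with "***".
pub fn mask_secrets(line: &str, secrets: &[&str]) -> String {
    let mut masked = line.to_string();
    for &secret in secrets {
        // Skip empty values: replacing "" would insert the mask between every character.
        if secret.is_empty() {
            continue;
        }
        masked = masked.replace(secret, "***");
    }
    masked
}
```

For example, `mask_secrets("token=hunter2 ok", &["hunter2"])` yields `token=*** ok`. This approach only catches exact matches, so encoded or split secrets would need additional handling.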
Current Repository Skeleton
- Tracing/log initialization is provided by common::init_tracing (console only for now).
- KDL job parsing types: Workflow, Job, Step and helpers in crates/common/src/job.rs.
- Binaries provide Clap‑based CLIs with environment variable support.
- cidev validates and inspects KDL locally; ciadm is oriented to operator interactions with the Orchestrator.
Next Steps
- Wire the Integration Layer to a real message queue and define the internal job request schema.
- Implement Orchestrator capacity management and host selection.
- Add gRPC service definitions for Orchestrator <-> Runner streaming logs and control.
- Add GitHub App authentication (octocrab) and Forgejo (Gitea) client for status updates.
- Implement secure secrets injection and masking.