mirror of https://codeberg.org/Toasterson/solstice-ci.git
synced 2026-04-10 13:20:41 +00:00
### Solstice CI — Architecture Overview (KDL Jobs + Multi‑Host Orchestrator)

This document updates the earlier blueprint to reflect the current direction of Solstice CI:

- The project name is Solstice CI (not Helios CI).
- Workflows are defined in KDL instead of YAML.
- The Orchestrator is designed to run on multiple hosts behind a shared queue for horizontal scale.
- A small set of crates provides clean separation of concerns: `orchestrator`, `forge-integration`, `github-integration`, `workflow-runner`, `common`, `ciadm`, and `cidev`.
#### Core Components

- Forge Integration Layer (`crates/forge-integration` and `crates/github-integration`)
  - Receives webhooks from Forgejo or GitHub.
  - Normalizes events and publishes job requests to the Orchestrator (direct API or message queue; see the multi‑host section).
  - Reports status back to the forge (Checks API for GitHub; Commit Status API for Forgejo).
- Orchestrator (`crates/orchestrator`)
  - Provisions ephemeral VMs via bhyve branded zones on illumos hosts and manages their lifecycle using ZFS clones.
  - Streams logs and results between the VM‑resident runner and the Integration Layer.
  - Multi‑host aware: multiple Orchestrator instances can run on different illumos hosts and share work (see below).
- Workflow Runner (`crates/workflow-runner`)
  - Minimal agent binary pre‑installed in the base VM image.
  - Fetches the job definition from the Orchestrator, executes steps, streams logs, and returns the final status.
- Common (`crates/common`)
  - DRY utilities used by all binaries: tracing/log initialization, KDL job parsing, and future shared abstractions.
- Admin CLI (`crates/ciadm`)
  - Operator utility to trigger jobs, check status, etc., against the Orchestrator.
- Dev CLI (`crates/cidev`)
  - Developer utility to validate KDL files locally, inspect jobs and steps, and debug CI issues without needing the full system.
#### Multi‑Host Orchestration

To support multiple hosts, Solstice CI uses a shared queue (e.g., RabbitMQ) between the Integration Layer and Orchestrators:

- The Integration Layer publishes job requests into a durable queue.
- Any healthy Orchestrator node can consume a job, subject to capacity constraints.
- Nodes coordinate through the queue and an internal state store (e.g., Postgres) for job status.
- Each node manages ZFS clones and bhyve zones locally; failure isolation is per‑node.
- This model scales linearly by adding illumos hosts with Orchestrator instances.
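The competing-consumers pattern above can be sketched with in-process primitives. This is a minimal, hypothetical illustration that uses a Mutex-guarded channel in place of a durable broker like RabbitMQ; the names `JobRequest` and `run_cluster` are invented for the example and do not exist in the codebase.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical job request as it might travel over the shared queue.
struct JobRequest {
    id: u32,
    runs_on: String,
}

/// Spawn `num_nodes` stand-in Orchestrator nodes that compete for jobs on a
/// shared queue, publish `num_jobs` requests, and return how many were handled.
fn run_cluster(num_nodes: u32, num_jobs: u32) -> u32 {
    let (tx, rx) = mpsc::channel::<JobRequest>();
    // A durable broker plays this role in the real system; a Mutex-guarded
    // receiver stands in for "any healthy node may consume the next job".
    let rx = Arc::new(Mutex::new(rx));

    let mut nodes = Vec::new();
    for node_id in 0..num_nodes {
        let rx = Arc::clone(&rx);
        nodes.push(thread::spawn(move || {
            let mut handled = 0u32;
            loop {
                // Take the next job off the shared queue, if any remains.
                let msg = rx.lock().unwrap().recv();
                match msg {
                    Ok(job) => {
                        println!("node {node_id}: job {} on {}", job.id, job.runs_on);
                        handled += 1;
                    }
                    Err(_) => break, // queue closed: no more work
                }
            }
            handled
        }));
    }

    // The Integration Layer publishes job requests into the queue.
    for id in 0..num_jobs {
        tx.send(JobRequest { id, runs_on: "illumos-stable".into() }).unwrap();
    }
    drop(tx); // close the queue so consumer nodes exit

    nodes.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("jobs handled: {}", run_cluster(2, 4));
}
```

Note that every job is handled exactly once regardless of how many nodes are running, which is the property the durable queue provides across hosts.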
#### KDL Workflow Definition

Solstice CI adopts a simple, explicit KDL schema for workflows. Example:

```kdl
workflow name="Solstice CI" {
    job id="build" runs_on="illumos-stable" {
        step name="Format" run="cargo fmt --check"
        step name="Clippy" run="cargo clippy -- -D warnings"
        step name="Test" run="cargo test --workspace"
    }

    job id="lint" runs_on="ubuntu-22.04" {
        step name="Lint" run="ruff check ."
    }
}
```
Key points:

- `workflow` is the root node; `name` is optional.
- One or more `job` nodes define independent VMs. Each job can have a `runs_on` hint to select a base image.
- Each `job` contains one or more `step` nodes with a `run` command and optional `name`.

The current parser lives in `crates/common/src/job.rs` and performs strict, typed parsing using the `kdl` crate.
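For illustration, the typed model such a parser might produce could look like the following. These struct shapes are assumptions for the sketch; the authoritative `Workflow`, `Job`, and `Step` definitions live in `crates/common/src/job.rs` and may differ in detail.

```rust
/// Sketch of the typed workflow model (fields are assumed, not authoritative).
#[derive(Debug, Clone, PartialEq)]
pub struct Workflow {
    pub name: Option<String>, // the `workflow` name is optional
    pub jobs: Vec<Job>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: String,
    pub runs_on: Option<String>, // base-image hint, e.g. "illumos-stable"
    pub steps: Vec<Step>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub name: Option<String>,
    pub run: String,
}

fn main() {
    // The "build" job from the example above, expressed in the typed model.
    let wf = Workflow {
        name: Some("Solstice CI".into()),
        jobs: vec![Job {
            id: "build".into(),
            runs_on: Some("illumos-stable".into()),
            steps: vec![
                Step { name: Some("Format".into()), run: "cargo fmt --check".into() },
                Step { name: Some("Test".into()), run: "cargo test --workspace".into() },
            ],
        }],
    };
    println!("{} job(s), {} step(s)", wf.jobs.len(), wf.jobs[0].steps.len());
}
```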
#### Execution Flow (High‑Level)

1. A forge sends a webhook to the Integration Layer.
2. The Integration Layer validates/authenticates it and publishes a job request to the queue (or calls the Orchestrator API in single‑node setups).
3. An Orchestrator node accepts the job, creates a ZFS clone of a golden VM image, builds a bhyve zone config, and boots the VM.
4. The Runner starts in the VM, obtains the job definition (including parsed KDL steps), then executes each step, streaming logs back.
5. On completion or failure, the Orchestrator halts the zone and destroys the ZFS clone, then finalizes status via the Integration Layer.
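Step 4, sequential step execution with fail-fast semantics, can be sketched as follows. The `run_steps` helper and the shell-out convention are assumptions for illustration, not the actual Runner implementation.

```rust
use std::process::Command;

/// Execute each step's `run` command in order, stopping at the first failure
/// (assumed fail-fast semantics; log streaming is omitted from the sketch).
fn run_steps(steps: &[&str]) -> Result<(), String> {
    for step in steps {
        // Hand each `run` string to a shell, as CI runners commonly do.
        let status = Command::new("sh")
            .arg("-c")
            .arg(step)
            .status()
            .map_err(|e| format!("failed to spawn `{step}`: {e}"))?;
        if !status.success() {
            return Err(format!("step `{step}` exited with {status}"));
        }
    }
    Ok(())
}

fn main() {
    match run_steps(&["echo hello", "true"]) {
        Ok(()) => println!("job succeeded"),
        Err(e) => println!("job failed: {e}"),
    }
}
```

A step's non-zero exit short-circuits the job, which is what lets the Orchestrator tear the VM down promptly in step 5.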
#### Security & Observability Notes

- Secrets should be injected via a secrets backend (e.g., Vault) and masked in logs.
- Tracing/logs are initialized consistently via `crates/common` and can be wired to OTLP later.
- Network isolation defaults to an isolated VNIC and restricted egress.
#### Current Repository Skeleton

- Tracing/log initialization is provided by `common::init_tracing` (console only for now).
- KDL job parsing types: `Workflow`, `Job`, `Step` and helpers in `crates/common/src/job.rs`.
- Binaries provide Clap‑based CLIs with environment variable support.
- `cidev` validates and inspects KDL locally; `ciadm` is oriented to operator interactions with the Orchestrator.
#### Next Steps

- Wire the Integration Layer to a real message queue and define the internal job request schema.
- Implement Orchestrator capacity management and host selection.
- Add gRPC service definitions for Orchestrator <-> Runner log streaming and control.
- Add GitHub App authentication (via octocrab) and a Forgejo (Gitea) client for status updates.
- Implement secure secrets injection and masking.