mirror of
https://codeberg.org/Toasterson/solstice-ci.git
synced 2026-04-10 13:20:41 +00:00
124 lines
6.2 KiB
Markdown
# Plan: Runner-Only Architecture

**Status:** Active
**Created:** 2026-04-09
**Planner ID:** `5ea54391-2b17-4790-9f6a-27afcc410fa6`

## Summary

Simplify Solstice CI from 7+ services to 3 by acting exclusively as a native runner for GitHub and Forgejo. All logs, artifacts, and status flow through the platform's native UI. Our unique value is VM orchestration for non-Linux OSes (illumos, OmniOS, OpenIndiana).

## Motivation

The current architecture has Solstice CI reimplementing functionality that GitHub and Forgejo already provide:
- **Webhook ingestion**: both platforms have runner protocols that deliver jobs directly to registered runners
- **Log storage and viewing**: both platforms display logs in their own UI
- **Artifact storage**: both platforms have artifact APIs
- **Status reporting**: both platforms show build status natively

We are building and maintaining four extra services (forge-integration, github-integration, logs-service, custom dashboards) that provide a worse user experience than the native platform UI.

## Architecture Change

### Before (7+ services)
```
Forgejo webhooks --> forge-integration  --> RabbitMQ --> orchestrator --> VMs
GitHub webhooks  --> github-integration --> RabbitMQ /
                     logs-service       <-- orchestrator
                     runner-integration --> Forgejo
```

### After (3 services)
```
Forgejo <--> forgejo-runner <--> RabbitMQ <--> orchestrator <--> VMs
GitHub  <--> github-runner  <--> RabbitMQ /
```

### Services retained
| Service | Role |
|---------|------|
| **forgejo-runner** (runner-integration) | Sole Forgejo interface via connect-rpc |
| **github-runner** (NEW) | Sole GitHub interface via Actions runner protocol |
| **orchestrator** | VM provisioning via vm-manager/QEMU |

### Services retired
| Service | Replacement |
|---------|-------------|
| forge-integration | forgejo-runner (runner protocol replaces webhooks) |
| github-integration | github-runner (runner protocol replaces GitHub App) |
| logs-service | Platform UI (logs sent via runner protocol) |

### Infrastructure retained
- **RabbitMQ**: job buffer between runners and orchestrator
- **PostgreSQL**: job state persistence in orchestrator
- **vm-manager**: QEMU VM lifecycle management

## Tasks

| # | Task | Priority | Effort | Depends on | Status |
|---|------|----------|--------|------------|--------|
| 1 | Evolve workflow-runner to execute Actions YAML run steps | 100 | M | none | pending |
| 2 | Orchestrator: accept step commands via JobRequest | 95 | M | none | pending |
| 3 | Clean up Forgejo runner as sole interface | 90 | L | 1 | pending |
| 4 | Implement GitHub Actions runner integration | 80 | XL | 1 | pending |
| 5 | Security: ephemeral SSH keys + opt-in debug SSH | 60 | M | 7 | pending |
| 6 | Documentation: image catalog + illumos guides | 50 | L | none | pending |
| 7 | Retire forge-integration, github-integration, logs-service | 40 | M | 3, 4 | pending |

### Task details

#### 1. Evolve workflow-runner to execute Actions YAML run steps
The workflow-runner currently parses `.solstice/workflow.kdl`. It also needs to execute standard GitHub Actions YAML `run` steps passed via `job.yaml`. The runner integrations translate Actions YAML into step commands before publishing to RabbitMQ. KDL support is kept as a superset for users who want setup scripts and multi-OS abstractions.

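
As a rough illustration of what executing `run` steps could involve, the sketch below runs each step through `sh -c` in order and stops at the first failure. The `StepCommand` fields follow the ones named in this plan; `JobOutcome`, `run_steps`, and the choice of `sh` as the shell are assumptions made for the example, not the workflow-runner's actual design.

```rust
use std::collections::HashMap;
use std::process::Command;

/// One `run` step with the fields named in this plan.
#[derive(Debug, Clone)]
pub struct StepCommand {
    pub name: String,
    pub run: String,
    pub env: Option<HashMap<String, String>>,
}

/// Hypothetical result type: all steps passed, or the first failing step.
#[derive(Debug, PartialEq)]
pub enum JobOutcome {
    Success,
    Failed { step: String, exit_code: Option<i32> },
}

/// Runs the steps sequentially and stops at the first non-zero exit.
/// Assumption: a POSIX `sh` is available on PATH inside the guest image.
pub fn run_steps(steps: &[StepCommand]) -> std::io::Result<JobOutcome> {
    for step in steps {
        let mut cmd = Command::new("sh");
        cmd.arg("-c").arg(&step.run);
        if let Some(env) = &step.env {
            // Per-step variables are added on top of the inherited environment.
            cmd.envs(env);
        }
        let status = cmd.status()?;
        if !status.success() {
            return Ok(JobOutcome::Failed {
                step: step.name.clone(),
                exit_code: status.code(),
            });
        }
    }
    Ok(JobOutcome::Success)
}
```

A real runner would also stream stdout and stderr back to the platform's log view; the sketch only covers ordering and failure handling.
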
#### 2. Orchestrator: accept step commands via JobRequest
Add `steps: Option<Vec<StepCommand>>` to `JobRequest` (common/src/messages.rs). Each `StepCommand` has `name`, `run`, and optional `env`. The orchestrator writes these to `job.yaml` so the workflow-runner can execute them directly. If `steps` is `None`, workflow-runner falls back to `.solstice/workflow.kdl`.

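
The message change and the fallback rule might look roughly like the following. Only the `steps` field and the `StepCommand` fields come from this plan; the other `JobRequest` fields are omitted, and `WorkflowSource` and `workflow_source` are hypothetical names introduced for illustration. Serialization derives are left out to keep the sketch dependency-free.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct StepCommand {
    pub name: String,
    pub run: String,
    pub env: Option<HashMap<String, String>>,
}

/// Only the new field is shown; the existing `JobRequest` fields in
/// common/src/messages.rs are omitted from this sketch.
#[derive(Debug, Clone, Default)]
pub struct JobRequest {
    pub steps: Option<Vec<StepCommand>>,
}

/// Hypothetical helper type: where the workflow-runner takes its commands from.
#[derive(Debug, PartialEq)]
pub enum WorkflowSource<'a> {
    /// Steps translated from Actions YAML by a runner integration.
    Steps(&'a [StepCommand]),
    /// Path of the KDL workflow file inside the checked-out repository.
    KdlFile(&'static str),
}

/// Applies the fallback rule: explicit steps win, otherwise use the KDL file.
/// Note that `Some(vec![])` is treated as "explicit, but empty".
pub fn workflow_source(req: &JobRequest) -> WorkflowSource<'_> {
    match &req.steps {
        Some(steps) => WorkflowSource::Steps(steps.as_slice()),
        None => WorkflowSource::KdlFile(".solstice/workflow.kdl"),
    }
}
```

Keeping `steps` optional means jobs already queued without the field would still resolve to the KDL path, assuming the wire format tolerates the missing key.
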
#### 3. Clean up Forgejo runner as sole interface
Remove the tier-1 KDL workflow fetch from `translator.rs`. Actions YAML `run` steps become the primary translation path. Handle matrix builds by expanding into separate `JobRequest`s. Report unsupported `uses:` steps with clear errors. Remove dependency on `FORGEJO_BASE_URL`/`FORGEJO_TOKEN` for fetching workflow files.

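
Two of these translation rules lend themselves to a short sketch: rejecting `uses:` steps with an error that names the offending step, and expanding a matrix into one combination per job. The types and function names below (`RawStep`, `TranslateError`, `translate_steps`, `expand_matrix`) are hypothetical and do not reflect the current contents of `translator.rs`; the matrix is assumed to be a plain map of axis name to values, without `include`/`exclude` handling.

```rust
use std::collections::BTreeMap;

/// A step as it might look after YAML parsing (hypothetical shape).
#[derive(Debug, Clone)]
pub struct RawStep {
    pub name: Option<String>,
    pub run: Option<String>,
    pub uses: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum TranslateError {
    /// The step references an action, which this runner cannot execute.
    UnsupportedUses { index: usize, action: String },
    /// The step has neither `run` nor `uses`.
    EmptyStep { index: usize },
}

/// Returns (name, run) pairs, or an error for the first untranslatable step.
pub fn translate_steps(steps: &[RawStep]) -> Result<Vec<(String, String)>, TranslateError> {
    steps
        .iter()
        .enumerate()
        .map(|(index, step)| {
            if let Some(action) = &step.uses {
                return Err(TranslateError::UnsupportedUses { index, action: action.clone() });
            }
            let run = step.run.clone().ok_or(TranslateError::EmptyStep { index })?;
            let name = step.name.clone().unwrap_or_else(|| format!("step-{}", index + 1));
            Ok((name, run))
        })
        .collect()
}

/// Cartesian product of the matrix axes. Each returned map would become
/// its own `JobRequest`. An empty matrix yields a single empty combination.
pub fn expand_matrix(matrix: &BTreeMap<String, Vec<String>>) -> Vec<BTreeMap<String, String>> {
    let mut combos = vec![BTreeMap::new()];
    for (key, values) in matrix {
        let mut next = Vec::with_capacity(combos.len() * values.len());
        for combo in &combos {
            for value in values {
                let mut extended = combo.clone();
                extended.insert(key.clone(), value.clone());
                next.push(extended);
            }
        }
        combos = next;
    }
    combos
}
```

Using a `BTreeMap` keeps the expansion order deterministic, which helps when job names are derived from the combination.
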
#### 4. Implement GitHub Actions runner integration
New crate implementing the GitHub Actions self-hosted runner protocol (REST + JSON). Significantly more complex than Forgejo's connect-rpc: RSA JWT authentication, OAuth bearer tokens, 50-second long-poll, per-job heartbeat every 60 seconds, encrypted job delivery. Same internal pattern as runner-integration (poller + reporter + state).

#### 5. Security: ephemeral SSH keys + opt-in debug SSH
Stop persisting SSH keys to the database. Generate them in memory, inject them via cloud-init, and forget them after the VM is destroyed. For failed builds with the opt-in debug flag set: keep the VM alive for 30 minutes, expose SSH connection info in the build log, and rate-limit to 1 debug session per project.

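
The debug-session limit could be enforced with a small in-memory registry, sketched below. It reads "1 debug session per project" as one live session at a time, which is an interpretation rather than something this plan spells out; `DebugSessions`, `try_open`, and `close` are hypothetical names.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Keep-alive window for the VM of a failed build, per this plan.
pub const DEBUG_WINDOW: Duration = Duration::from_secs(30 * 60);

/// Tracks at most one live debug session per project, in memory only.
#[derive(Default)]
pub struct DebugSessions {
    expires_at: HashMap<String, Instant>,
}

impl DebugSessions {
    /// Tries to open a session at time `now`. Returns the expiry on success,
    /// or `None` if the project already holds an unexpired session.
    pub fn try_open(&mut self, project: &str, now: Instant) -> Option<Instant> {
        if let Some(expiry) = self.expires_at.get(project) {
            if *expiry > now {
                return None;
            }
        }
        let expiry = now + DEBUG_WINDOW;
        self.expires_at.insert(project.to_string(), expiry);
        Some(expiry)
    }

    /// Frees the slot early, for example when the debug VM is destroyed.
    pub fn close(&mut self, project: &str) {
        self.expires_at.remove(project);
    }
}
```

Passing `now` in explicitly keeps the expiry logic testable without sleeping. Like the SSH keys, the registry lives only in memory, so a restart forgets open sessions.
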
#### 6. Documentation
User-facing docs for FLOSS projects: getting started guide, image catalog (`runs-on` labels), illumos/OmniOS-specific guide (pkg, tar, CA certs), FAQ (supported features, limitations).

#### 7. Retire old services
Remove forge-integration, github-integration, and logs-service from compose.yml. Clean up environment variables, Traefik routes, and database tables. Keep source code for reference but mark deprecated.

## Workflow format after migration

Users write **standard GitHub Actions YAML**. No custom format needed:

```yaml
name: CI
on: [push, pull_request]
jobs:
  build:
    runs-on: omnios-bloody  # Solstice CI label
    steps:
      - run: pkg install developer/gcc13
      - run: cargo build --release
      - run: cargo test
```

Our documentation only needs to cover:
- Available `runs-on` labels (our OS images)
- What's pre-installed in each image
- OS-specific tips (illumos package managers, tar variants, etc.)

## Security model after migration

Most security concerns are **solved by delegation to the platform**:

| Concern | Solution |
|---------|----------|
| Log access control | Platform handles it (GitHub/Forgejo UI) |
| Webhook secrets | Platform handles per-repo secrets |
| Artifact storage | Platform handles it |
| User authentication | Platform handles it |
| SSH key storage | Ephemeral: destroyed with VM |
| Compute abuse | Per-runner concurrency limits + platform rate limiting |

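
The per-runner concurrency limit in the last row is the one control that stays on our side. A minimal sketch of how a runner might cap the number of jobs in flight is shown below; `ConcurrencyLimit` and `JobSlot` are hypothetical names, and the real limit may just as well be enforced through the RabbitMQ consumer prefetch setting or inside the orchestrator.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Caps the number of jobs a single runner processes at once.
#[derive(Clone)]
pub struct ConcurrencyLimit {
    max: usize,
    running: Arc<AtomicUsize>,
}

/// Held for the duration of a job; dropping it frees the slot.
pub struct JobSlot {
    running: Arc<AtomicUsize>,
}

impl ConcurrencyLimit {
    pub fn new(max: usize) -> Self {
        Self { max, running: Arc::new(AtomicUsize::new(0)) }
    }

    /// Returns a slot if the runner is below its limit, otherwise `None`.
    pub fn try_acquire(&self) -> Option<JobSlot> {
        let mut current = self.running.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return None;
            }
            // Retry if another thread changed the counter in the meantime.
            match self.running.compare_exchange(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(JobSlot { running: Arc::clone(&self.running) }),
                Err(actual) => current = actual,
            }
        }
    }

    /// Number of jobs currently holding a slot.
    pub fn running(&self) -> usize {
        self.running.load(Ordering::Acquire)
    }
}

impl Drop for JobSlot {
    fn drop(&mut self) {
        self.running.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Tying the release to `Drop` means a job that panics or returns early still gives its slot back.
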