
mirror of https://codeberg.org/Toasterson/solstice-ci.git synced 2026-04-10 13:20:41 +00:00

Till Wegmueller d6c2c3662c Add architecture plans and decision records

Plans:
- 001: vm-manager migration (completed)
- 002: runner-only architecture (active)

Decision records (ADRs):
- 001: Runner-only architecture — retire webhooks + logs service
- 002: Direct QEMU over libvirt
- 003: Ephemeral SSH keys with opt-in debug access
- 004: User-mode (SLIRP) networking for VMs

2026-04-09 22:03:12 +02:00


ADR-003: Ephemeral SSH Keys with Opt-In Debug Access

Date: 2026-04-09
Status: Accepted
Deciders: Till Wegmueller

Context

The orchestrator generates an Ed25519 SSH keypair per job for authenticating to the provisioned VM. Currently, both the public and private keys are persisted to PostgreSQL in plaintext (job_ssh_keys table). This is a security risk: a database breach exposes every stored private key.

The keys are only needed during the VM's lifetime: from provisioning (cloud-init injects the public key) through SSH execution (orchestrator authenticates with the private key) to VM destruction.

Decision

Make SSH keys fully ephemeral:

Generate the keypair in memory
Inject the public key via cloud-init
Use the private key for the SSH connection
Discard both keys when the VM is destroyed
Never persist either key to the database
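
The lifecycle above can be sketched in a few lines. This is a minimal illustration, not the orchestrator's actual code: `EphemeralKeypair`, `render_cloud_init`, `job_keypair`, and the `generate` callback are hypothetical names, and real Ed25519 generation (which needs a crypto library) is left to the caller. Zeroing the buffer is best-effort only, since the runtime may hold other copies.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical signature: returns (private_key_bytes, public_key_openssh_line).
KeyGenerator = Callable[[], Tuple[bytes, str]]


class EphemeralKeypair:
    """Holds key material in memory only; nothing here touches disk or a database."""

    def __init__(self, private_key: bytes, public_key: str) -> None:
        # bytearray is mutable, so the bytes can be overwritten on discard.
        self._private: Optional[bytearray] = bytearray(private_key)
        self.public_key = public_key

    def private_bytes(self) -> bytes:
        if self._private is None:
            raise RuntimeError("keypair already discarded")
        return bytes(self._private)

    def discard(self) -> None:
        # Best-effort wipe: Python may still hold copies elsewhere in memory.
        if self._private is not None:
            for i in range(len(self._private)):
                self._private[i] = 0
            self._private = None
        self.public_key = ""

    def __repr__(self) -> str:
        # Keep key material out of logs and tracebacks.
        return "EphemeralKeypair(<redacted>)"


def render_cloud_init(user: str, public_key: str) -> str:
    """Render cloud-config user-data that authorizes the public key."""
    return (
        "#cloud-config\n"
        "users:\n"
        f"  - name: {user}\n"
        "    ssh_authorized_keys:\n"
        f"      - {public_key}\n"
    )


@contextmanager
def job_keypair(generate: KeyGenerator) -> Iterator[EphemeralKeypair]:
    """Scope the keypair to the VM's lifetime; discard it on the way out."""
    private_key, public_key = generate()
    keypair = EphemeralKeypair(private_key, public_key)
    try:
        yield keypair
    finally:
        # Runs on success and on failure, mirroring VM destruction.
        keypair.discard()
```

Wrapping provisioning, SSH execution, and VM teardown in one `with job_keypair(...)` block ties the key's lifetime to the VM's, so there is no code path that could write it to the job_ssh_keys table.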

Exception: opt-in debug SSH for failed builds. When a job fails and the user has explicitly opted in (e.g., via a workflow annotation or label), keep the VM and its keypair alive for a fixed TTL (30 minutes) and expose the SSH connection info in the build log so the user can debug inside the target OS.

Consequences

Positive

Zero persistent key storage: no database breach risk for SSH keys
Simpler persistence layer: job_ssh_keys table can be removed
Debug SSH feature: valuable for OS-specific debugging (illumos quirks, package issues) — similar to CircleCI's "rerun with SSH" and Buildkite's debug feature

Negative

No retroactive access: once the VM is destroyed and its key discarded, there is no way back in
Debug SSH adds complexity: TTL management, rate limiting, VM lifecycle exception path
Debug SSH is a security surface: must ensure TTL is enforced and connection info only goes to the authenticated user via the platform's log channel

Design for debug SSH

Rate limit: at most 1 concurrent debug session per project
TTL: 30 minutes, non-renewable, force-destroy on expiry
Connection info: printed as a build log step (platform controls access)
Opt-in: explicit flag required (never default)
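
The rules above can be sketched as a small session manager. This is an illustrative sketch under assumptions: `DebugSessionManager`, `try_start`, and `reap_expired` are hypothetical names, time is passed in as a plain number of seconds so the logic is easy to test, and actually destroying the VM is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DEBUG_TTL_SECONDS = 30 * 60  # 30 minutes, per the design above


@dataclass(frozen=True)
class DebugSession:
    project: str
    vm_id: str
    expires_at: float  # fixed at creation; there is no renew operation


class DebugSessionManager:
    def __init__(self) -> None:
        # At most one entry per project enforces the concurrency limit.
        self._active: Dict[str, DebugSession] = {}

    def try_start(
        self, project: str, vm_id: str, job_failed: bool, opted_in: bool, now: float
    ) -> Optional[DebugSession]:
        # Opt-in is never the default: both conditions must hold.
        if not (job_failed and opted_in):
            return None
        # A session that has expired but not yet been reaped still blocks
        # a new one; this errs on the conservative side.
        if project in self._active:
            return None
        session = DebugSession(project, vm_id, now + DEBUG_TTL_SECONDS)
        self._active[project] = session
        return session

    def reap_expired(self, now: float) -> List[DebugSession]:
        """Remove and return sessions past their TTL; the caller force-destroys the VMs."""
        expired = [s for s in self._active.values() if now >= s.expires_at]
        for s in expired:
            del self._active[s.project]
        return expired
```

Making `DebugSession` frozen and offering no renew method keeps the TTL non-renewable by construction rather than by convention.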
