# ADR-003: Ephemeral SSH Keys with Opt-In Debug Access

**Date:** 2026-04-09
**Status:** Accepted
**Deciders:** Till Wegmueller
## Context

The orchestrator generates an Ed25519 SSH keypair per job to authenticate to the provisioned VM. Currently, both the public and the private key are persisted to PostgreSQL in plaintext (`job_ssh_keys` table). This is a security risk: a database breach exposes all SSH keys.

The keys are only needed during the VM's lifetime: from provisioning (cloud-init injects the public key) through SSH execution (the orchestrator authenticates with the private key) to VM destruction.
## Decision

Make SSH keys fully ephemeral:
1. Generate the keypair in memory
2. Inject the public key via cloud-init
3. Use the private key for the SSH connection
4. Forget both keys when the VM is destroyed
5. Never persist either key to the database

Exception: **opt-in debug SSH for failed builds**. When a job fails and the user has opted in (e.g., via a workflow annotation or label), keep the VM alive for a fixed TTL (30 minutes) and expose the SSH connection info in the build log so the user can debug inside the target OS.
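The five numbered steps above can be sketched in Python as a scope that owns the key material for exactly one job. This is only an illustration under stated assumptions, not the orchestrator's actual code: `EphemeralKeypair`, `render_user_data`, `job_keypair`, the `ci` username, and the placeholder key strings are all hypothetical names, and the real Ed25519 generator is passed in as a callable rather than implemented here. The debug SSH exception is not modeled.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class EphemeralKeypair:
    """Per-job key material that lives only in process memory (hypothetical type)."""
    public_openssh: str     # e.g. "ssh-ed25519 AAAA... job-42"
    private_key: bytearray  # mutable so it can be overwritten on teardown


def render_user_data(username: str, public_openssh: str) -> str:
    """Build cloud-init user-data that authorizes the job's public key."""
    return (
        "#cloud-config\n"
        "users:\n"
        f"  - name: {username}\n"
        "    ssh_authorized_keys:\n"
        f"      - {public_openssh}\n"
    )


@contextmanager
def job_keypair(generate: Callable[[], EphemeralKeypair]) -> Iterator[EphemeralKeypair]:
    """Scope a keypair to one job; nothing in here writes to a database."""
    keypair = generate()  # step 1: generate in memory
    try:
        yield keypair     # steps 2-3: cloud-init injection and SSH use
    finally:
        # Step 4: best-effort wipe once the VM is gone. Python cannot
        # guarantee that no other copies exist, so treat this as hygiene,
        # not as a hard security guarantee.
        for i in range(len(keypair.private_key)):
            keypair.private_key[i] = 0
        keypair.public_openssh = ""


def fake_generate() -> EphemeralKeypair:
    # Stand-in for a real Ed25519 generator (assumption for this sketch).
    return EphemeralKeypair("ssh-ed25519 AAAAexample job-42", bytearray(b"not-a-real-key"))


with job_keypair(fake_generate) as keypair:
    user_data = render_user_data("ci", keypair.public_openssh)
    # ... provision the VM with user_data, run the job over SSH, destroy the VM ...
```

Tying the wipe to the end of the `with` block means the key cannot outlive the job, even when provisioning or SSH execution raises.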
## Consequences

### Positive
- **Zero persistent key storage**: a database breach can no longer expose SSH keys
- **Simpler persistence layer**: the `job_ssh_keys` table can be removed
- **Debug SSH feature**: valuable for OS-specific debugging (illumos quirks, package issues), similar to CircleCI's "rerun with SSH" and Buildkite's debug feature
### Negative
- **Cannot retroactively access a destroyed VM**: if the key is gone, there's no way back in
- **Debug SSH adds complexity**: TTL management, rate limiting, VM lifecycle exception path
- **Debug SSH is a security surface**: must ensure TTL is enforced and connection info only goes to the authenticated user via the platform's log channel
### Design for debug SSH
- Rate limit: at most one concurrent debug session per project
- TTL: 30 minutes, non-renewable, force-destroy on expiry
- Connection info: printed as a build log step (platform controls access)
- Opt-in: explicit flag required (never default)
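The four design rules above could be enforced by a small in-memory policy object along these lines. This is a minimal sketch, assuming a single orchestrator process and a periodic reaper that calls `expire`; `DebugSessions`, `try_start`, `expire`, and the project names are hypothetical and do not come from the codebase.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DEBUG_TTL_SECONDS = 30 * 60  # 30 minutes, as stated in the design


@dataclass
class DebugSessions:
    """Tracks at most one live debug session per project (hypothetical type)."""
    # project -> monotonic deadline after which the VM must be destroyed
    deadlines: Dict[str, float] = field(default_factory=dict)

    def try_start(self, project: str, job_failed: bool, opted_in: bool,
                  now: Optional[float] = None) -> Optional[float]:
        """Return the session deadline, or None if no session may start."""
        now = time.monotonic() if now is None else now
        if not (job_failed and opted_in):   # opt-in: explicit flag, failed jobs only
            return None
        if project in self.deadlines:       # rate limit: one session per project
            return None
        deadline = now + DEBUG_TTL_SECONDS  # fixed at creation; no renewal API exists
        self.deadlines[project] = deadline
        return deadline

    def expire(self, now: float) -> List[str]:
        """Drop sessions past their deadline; return projects whose VM to force-destroy."""
        expired = [p for p, d in self.deadlines.items() if now >= d]
        for project in expired:
            del self.deadlines[project]
        return expired


sessions = DebugSessions()
first = sessions.try_start("project-a", job_failed=True, opted_in=True, now=0.0)
second = sessions.try_start("project-a", job_failed=True, opted_in=True, now=60.0)
# first is the deadline (1800.0); second is None because a session is already live.
to_destroy = sessions.expire(now=1800.0)  # ["project-a"]
```

Because the deadline is computed once and there is no method to extend it, the non-renewable property holds by construction. A session that has passed its deadline still blocks new ones until the reaper has returned it for destruction, which keeps the one-VM-per-project limit honest.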