# ADR-003: Ephemeral SSH Keys with Opt-In Debug Access

**Date:** 2026-04-09

**Status:** Accepted

**Deciders:** Till Wegmueller

## Context

The orchestrator generates an Ed25519 SSH keypair per job for authenticating to the provisioned VM. Currently, both public and private keys are persisted to PostgreSQL in plaintext (`job_ssh_keys` table). This creates a security risk: a database breach exposes all SSH keys.

The keys are only needed during the VM's lifetime: from provisioning (cloud-init injects the public key) through SSH execution (orchestrator authenticates with the private key) to VM destruction.

## Decision

Make SSH keys fully ephemeral:

1. Generate keypair in-memory
2. Inject public key via cloud-init
3. Use private key for SSH connection
4. Forget both keys when VM is destroyed
5. Never persist to database

Exception: **opt-in debug SSH for failed builds**. When a job fails and the user has opted in (e.g., via a workflow annotation or label), keep the VM alive for a TTL (30 minutes) and expose SSH connection info in the build log so the user can debug inside the target OS.

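
The five steps of the decision can be sketched as a scoped lifetime around the key material. This is a minimal illustration, not the orchestrator's implementation: `EphemeralKeypair`, `ephemeral_keypair`, `cloud_init_user_data`, the `ci` user name, and `placeholder_generate` are all hypothetical names, and the placeholder generator returns random bytes rather than a real Ed25519 keypair (a real one would come from an SSH or crypto library). The overwrite on exit is best-effort, since a managed runtime may still hold copies of the bytes elsewhere.

```python
import secrets
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple


@dataclass
class EphemeralKeypair:
    """Key material that lives only in process memory, never in the database."""
    public_key: str         # OpenSSH-format public key line handed to cloud-init
    private_key: bytearray  # mutable so it can be overwritten on teardown

    def wipe(self) -> None:
        # Best-effort overwrite; the runtime may still hold other copies.
        for i in range(len(self.private_key)):
            self.private_key[i] = 0
        self.public_key = ""


def placeholder_generate() -> Tuple[str, bytes]:
    """Hypothetical stand-in: random bytes, NOT a real Ed25519 keypair."""
    return "ssh-ed25519 PLACEHOLDER job-key", secrets.token_bytes(32)


@contextmanager
def ephemeral_keypair(
    generate: Callable[[], Tuple[str, bytes]] = placeholder_generate,
) -> Iterator[EphemeralKeypair]:
    public_key, private_key = generate()  # step 1: generated in memory
    keypair = EphemeralKeypair(public_key, bytearray(private_key))
    try:
        yield keypair                     # steps 2-3 happen in the caller
    finally:
        keypair.wipe()                    # step 4: forgotten with the VM
    # step 5: nothing in this path writes to persistent storage


def cloud_init_user_data(public_key: str, user: str = "ci") -> str:
    """Render a minimal cloud-config that authorizes the public key (step 2)."""
    return (
        "#cloud-config\n"
        "users:\n"
        f"  - name: {user}\n"
        "    ssh_authorized_keys:\n"
        f"      - {public_key}\n"
    )
```

A job would wrap provisioning, SSH execution, and VM destruction in a single `with ephemeral_keypair() as kp:` block, so the key cannot outlive the VM on any exit path, including exceptions.
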
## Consequences

### Positive

- **Zero persistent key storage**: no database breach risk for SSH keys
- **Simpler persistence layer**: `job_ssh_keys` table can be removed
- **Debug SSH feature**: valuable for OS-specific debugging (illumos quirks, package issues), similar to CircleCI's "rerun with SSH" and Buildkite's debug feature

### Negative

- **Cannot retroactively access a destroyed VM**: if the key is gone, there's no way back in
- **Debug SSH adds complexity**: TTL management, rate limiting, VM lifecycle exception path
- **Debug SSH is a security surface**: must ensure TTL is enforced and connection info only goes to the authenticated user via the platform's log channel

### Design for debug SSH

- Rate limit: max 1 debug session per project concurrently
- TTL: 30 minutes, non-renewable, force-destroy on expiry
- Connection info: printed as a build log step (platform controls access)
- Opt-in: explicit flag required (never default)
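
The debug SSH rules above (explicit opt-in, one concurrent session per project, a fixed 30-minute TTL, force-destroy on expiry) could be enforced by a small in-memory manager along these lines. This is a sketch under assumptions: `DebugSessionManager`, `DebugSession`, `try_start`, `reap_expired`, and the `destroy_vm` callback are hypothetical names, and the orchestrator is assumed to call `reap_expired` periodically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

DEBUG_TTL_SECONDS = 30 * 60  # 30 minutes, as stated in the design


@dataclass(frozen=True)
class DebugSession:
    project: str
    vm_id: str
    expires_at: float  # fixed at creation; there is deliberately no renew


class DebugSessionManager:
    def __init__(
        self,
        destroy_vm: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._destroy_vm = destroy_vm  # hypothetical hook into VM teardown
        self._clock = clock
        self._active: Dict[str, DebugSession] = {}  # project -> session

    def try_start(
        self, project: str, vm_id: str, job_failed: bool, opted_in: bool
    ) -> Optional[DebugSession]:
        """Return a session, or None if the VM should be destroyed as usual."""
        self.reap_expired()
        if not (job_failed and opted_in):
            return None  # opt-in is explicit and only applies to failed jobs
        if project in self._active:
            return None  # at most one concurrent debug session per project
        session = DebugSession(project, vm_id, self._clock() + DEBUG_TTL_SECONDS)
        self._active[project] = session
        return session

    def reap_expired(self) -> List[str]:
        """Force-destroy every VM whose TTL has elapsed; return their ids."""
        now = self._clock()
        expired = [s for s in self._active.values() if s.expires_at <= now]
        for session in expired:
            del self._active[session.project]
            self._destroy_vm(session.vm_id)
        return [s.vm_id for s in expired]
```

Keeping `DebugSession` frozen and offering no renew method makes the non-renewable TTL a structural property rather than a policy check. Injecting the clock keeps expiry testable without waiting 30 minutes.
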