mirror of
https://github.com/CloudNebulaProject/reddwarf.git
synced 2026-04-10 13:20:40 +00:00
Three high-priority reliability features that close gaps identified in AUDIT.md: 1. Periodic reconciliation: PodController now runs reconcile_all() every 30s via a tokio::time::interval branch in the select! loop, detecting zone crashes between events. 2. Node health checker: New NodeHealthChecker polls node heartbeats every 15s and marks nodes with stale heartbeats (>40s) as NotReady with reason NodeStatusUnknown, preserving last_transition_time correctly. 3. Graceful pod termination: DELETE sets deletion_timestamp and phase=Terminating instead of immediate removal. Controller drives a state machine (shutdown → halt on grace expiry → deprovision → finalize) with periodic reconcile advancing it. New POST .../finalize endpoint performs actual storage removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| reddwarf | ||
| reddwarf-apiserver | ||
| reddwarf-core | ||
| reddwarf-runtime | ||
| reddwarf-scheduler | ||
| reddwarf-storage | ||
| reddwarf-versioning | ||