reddwarf/crates/reddwarf-apiserver/src
Till Wegmueller 58171c7555
Add periodic reconciliation, node health checker, and graceful pod termination
Three high-priority reliability features that close gaps identified in AUDIT.md:

1. Periodic reconciliation: PodController now runs reconcile_all() every 30s
   via a tokio::time::interval branch in the select! loop, detecting zone
   crashes between events.

2. Node health checker: A new NodeHealthChecker polls node heartbeats every 15s
   and marks nodes whose last heartbeat is older than 40s as NotReady with
   reason NodeStatusUnknown, updating last_transition_time only when the Ready
   status actually changes.

3. Graceful pod termination: DELETE sets deletion_timestamp and phase=Terminating
   instead of removing the pod immediately. The controller drives a state machine
   (shutdown → halt on grace expiry → deprovision → finalize), which the periodic
   reconcile advances. A new POST .../finalize endpoint performs the actual
   storage removal.
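Item 1 pairs an event stream with a timer inside `select!`. The sketch below shows the same loop shape using only the standard library: `std::sync::mpsc::Receiver::recv_timeout` stands in for tokio's `select!` over a channel and a `tokio::time::interval`, so it is an analogue, not reddwarf's actual `PodController`. The function name `run_controller_loop` and its two callbacks are hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical std-only analogue of a controller loop that selects between
/// incoming events and a periodic reconcile tick.
fn run_controller_loop<E>(
    rx: Receiver<E>,
    interval: Duration,
    mut on_event: impl FnMut(E),
    mut reconcile_all: impl FnMut(),
) {
    // The first tick fires immediately, like the first tick of tokio's `interval`.
    let mut next_tick = Instant::now();
    loop {
        let now = Instant::now();
        // Check the tick before receiving so a busy event stream cannot
        // postpone the periodic full reconcile.
        if now >= next_tick {
            reconcile_all();
            // Schedule relative to when the tick was handled.
            next_tick = now + interval;
            continue;
        }
        match rx.recv_timeout(next_tick - now) {
            Ok(event) => on_event(event),
            // Deadline reached: the check at the top of the loop reconciles.
            Err(RecvTimeoutError::Timeout) => {}
            // Every sender was dropped: stop the controller.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}
```

Checking the deadline first is a deliberate choice in this sketch: without it, a steady stream of queued events could keep the loop in the receive branch and delay `reconcile_all`, which is exactly the pass meant to catch zone crashes that produced no event.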
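The staleness rule in item 2 can be expressed as a small pure function. This is a sketch under assumptions: `ReadyCondition` and `check_node_heartbeat` are hypothetical names, the node condition is reduced to a boolean plus a reason string, and the 15s polling loop that would call it is left out. The 40s threshold and the `NodeStatusUnknown` reason come from the commit message.

```rust
use std::time::{Duration, SystemTime};

/// Heartbeats older than this are treated as stale (">40s" in the commit).
const STALE_AFTER: Duration = Duration::from_secs(40);

/// Hypothetical, simplified Ready condition of a node.
#[derive(Debug, Clone, PartialEq)]
struct ReadyCondition {
    ready: bool,
    reason: String,
    last_heartbeat_time: SystemTime,
    last_transition_time: SystemTime,
}

/// One health-check pass for a single node (run periodically, e.g. every 15s).
/// Returns true if the condition changed and should be written back.
fn check_node_heartbeat(cond: &mut ReadyCondition, now: SystemTime, stale_after: Duration) -> bool {
    // A heartbeat that appears to be in the future (clock skew) counts as fresh.
    let age = now
        .duration_since(cond.last_heartbeat_time)
        .unwrap_or(Duration::ZERO);
    if age <= stale_after {
        return false;
    }
    let mut changed = false;
    if cond.ready {
        // Only an actual Ready -> NotReady flip moves last_transition_time.
        cond.ready = false;
        cond.last_transition_time = now;
        changed = true;
    }
    if cond.reason != "NodeStatusUnknown" {
        cond.reason = "NodeStatusUnknown".to_string();
        changed = true;
    }
    changed
}
```

Calling the check again on a node that is already NotReady leaves last_transition_time alone, so the timestamp keeps recording when the node first went stale rather than when it was last polled.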
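Item 3's state machine can be sketched as a function that each periodic reconcile calls once for a pod carrying a deletion timestamp, returning the next step to take. All names here (`TerminatingPod`, `Action`, `next_action`) are hypothetical, and a real controller would presumably read these flags from the zone runtime and the stored pod rather than from a plain struct. The 30s grace period in the usage is only an example value.

```rust
use std::time::Duration;

/// Hypothetical snapshot of a pod that is being terminated.
struct TerminatingPod {
    zone_running: bool,
    shutdown_requested: bool,
    deprovisioned: bool,
    /// Time elapsed since deletion_timestamp was set.
    since_deletion: Duration,
    grace_period: Duration,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Shutdown,    // ask the zone to shut down gracefully
    Wait,        // shutdown requested, grace period not yet expired
    Halt,        // grace period expired: stop the zone forcibly
    Deprovision, // zone stopped: release its resources
    Finalize,    // remove the pod from storage (e.g. via POST .../finalize)
}

/// Decide the next step from current state only.
fn next_action(p: &TerminatingPod) -> Action {
    if p.zone_running {
        if p.since_deletion >= p.grace_period {
            Action::Halt
        } else if !p.shutdown_requested {
            Action::Shutdown
        } else {
            Action::Wait
        }
    } else if !p.deprovisioned {
        Action::Deprovision
    } else {
        Action::Finalize
    }
}
```

Because each pass derives the action from current state alone, a missed reconcile tick only delays the next step; it never skips one.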

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 20:39:36 +01:00
| Name | Last commit | Date |
|---|---|---|
| handlers | Add periodic reconciliation, node health checker, and graceful pod termination | 2026-02-14 20:39:36 +01:00 |
| error.rs | Implement phase 4 | 2026-01-28 23:06:06 +01:00 |
| event_bus.rs | Close the control loop: versioned bind, event-driven controller, graceful shutdown | 2026-02-08 23:21:53 +01:00 |
| lib.rs | Add optional TLS support and SMF service integration | 2026-02-14 18:45:20 +01:00 |
| response.rs | Implement phase 4 | 2026-01-28 23:06:06 +01:00 |
| server.rs | Add periodic reconciliation, node health checker, and graceful pod termination | 2026-02-14 20:39:36 +01:00 |
| state.rs | Add event bus and reddwarf-runtime crate | 2026-02-08 21:29:17 +01:00 |
| tls.rs | Add optional TLS support and SMF service integration | 2026-02-14 18:45:20 +01:00 |
| validation.rs | Implement phase 4 | 2026-01-28 23:06:06 +01:00 |
| watch.rs | Close the control loop: versioned bind, event-driven controller, graceful shutdown | 2026-02-08 23:21:53 +01:00 |