reddwarf/crates/reddwarf-runtime/src/lib.rs
Till Wegmueller d79f8ce011
Add health probes (liveness/readiness/startup) with exec, HTTP, and TCP checks
Implement Kubernetes-style health probes that run during the reconcile loop
to detect unhealthy applications inside running zones. Previously the pod
controller only checked zone liveness via get_zone_state(), missing cases
where the zone is running but the application inside has crashed.

- Add exec_in_zone() to ZoneRuntime trait, implemented via zlogin on illumos
  and with configurable mock results for testing
- Add probe type system (ProbeKind, ProbeAction, ContainerProbeConfig) that
  decouples from k8s_openapi and extracts probes from pod container specs
  with proper k8s defaults (period=10s, timeout=1s, failure=3, success=1)
- Add ProbeExecutor for exec/HTTP/TCP checks with tokio timeout support
  (HTTPS falls back to TCP-only with warning)
- Add ProbeTracker state machine that tracks per-pod/container/probe-kind
  state, respects initial delays and periods, gates liveness on startup
  probes, and aggregates results into PodProbeStatus
- Integrate into PodController reconcile loop: on liveness failure set
  phase=Failed with reason LivenessProbeFailure; on readiness failure set
  Ready=False; on all-pass restore Ready=True
- Add ProbeFailed error variant with miette diagnostic
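
The threshold handling and startup gating described in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from this commit: `ProbeSettings`, `Health`, `ThresholdCounter`, and `liveness_enabled` are hypothetical names, and only the default values (period=10s, timeout=1s, failure=3, success=1) come from the commit message.

```rust
use std::time::Duration;

/// Hypothetical per-probe settings; only the default values are taken
/// from the commit message, the field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProbeSettings {
    pub period: Duration,
    pub timeout: Duration,
    pub failure_threshold: u32,
    pub success_threshold: u32,
}

impl Default for ProbeSettings {
    fn default() -> Self {
        Self {
            period: Duration::from_secs(10),
            timeout: Duration::from_secs(1),
            failure_threshold: 3,
            success_threshold: 1,
        }
    }
}

/// Aggregated health of a single probe (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Unknown,
    Healthy,
    Unhealthy,
}

/// Counts consecutive results and flips state only once a threshold is met.
#[derive(Debug)]
pub struct ThresholdCounter {
    settings: ProbeSettings,
    consecutive_ok: u32,
    consecutive_fail: u32,
    health: Health,
}

impl ThresholdCounter {
    pub fn new(settings: ProbeSettings) -> Self {
        Self {
            settings,
            consecutive_ok: 0,
            consecutive_fail: 0,
            health: Health::Unknown,
        }
    }

    /// Record one probe result and return the resulting health.
    pub fn record(&mut self, ok: bool) -> Health {
        if ok {
            // A success resets the failure streak.
            self.consecutive_fail = 0;
            self.consecutive_ok = self.consecutive_ok.saturating_add(1);
            if self.consecutive_ok >= self.settings.success_threshold {
                self.health = Health::Healthy;
            }
        } else {
            // A failure resets the success streak.
            self.consecutive_ok = 0;
            self.consecutive_fail = self.consecutive_fail.saturating_add(1);
            if self.consecutive_fail >= self.settings.failure_threshold {
                self.health = Health::Unhealthy;
            }
        }
        self.health
    }
}

/// Liveness checks run only when no startup probe is configured (`None`)
/// or the startup probe has already succeeded.
pub fn liveness_enabled(startup: Option<Health>) -> bool {
    matches!(startup, None | Some(Health::Healthy))
}
```

With the defaults, two failed checks leave the state at `Unknown`, the third moves it to `Unhealthy`, and a single success afterwards returns it to `Healthy`.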

Known v1 limitation: probes execute at reconcile cadence (~30s), not at
their configured periodSeconds.
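
To make the limitation concrete, the following sketch simulates when a probe would actually run if it is only evaluated on reconcile ticks. It assumes the tracker runs a probe at the first tick after the initial delay has elapsed and at least one period has passed since the previous run; `run_times` and its parameters are hypothetical and not part of the crate.

```rust
/// Simulate probe execution times (in seconds since container start) when
/// probes are evaluated only on reconcile ticks. Hypothetical helper.
pub fn run_times(initial_delay: u64, period: u64, tick: u64, horizon: u64) -> Vec<u64> {
    assert!(tick > 0, "reconcile interval must be non-zero");
    let mut runs = Vec::new();
    let mut last: Option<u64> = None;
    let mut now = 0;
    while now <= horizon {
        let past_delay = now >= initial_delay;
        // Due if the probe has never run or a full period has elapsed.
        let due = match last {
            None => true,
            Some(t) => now - t >= period,
        };
        if past_delay && due {
            runs.push(now);
            last = Some(now);
        }
        now += tick;
    }
    runs
}
```

Under these assumptions, a probe with the default 10s period and a 30s reconcile interval runs every 30 seconds, and a probe with a 45s period runs every 60 seconds, since it can only fire on a tick.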

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 22:41:30 +01:00


// Allow unused assignments for diagnostic fields - they're used by the thiserror/miette macros
#![allow(unused_assignments)]
pub mod api_client;
pub mod brand;
pub mod command;
pub mod controller;
pub mod error;
#[cfg(target_os = "illumos")]
pub mod illumos;
pub mod mock;
pub mod network;
pub mod node_agent;
pub mod probes;
pub mod node_health;
pub mod storage;
pub mod sysinfo;
pub mod traits;
pub mod types;
pub mod zone;
// Re-export primary types
pub use error::{Result, RuntimeError};
pub use mock::MockRuntime;
pub use network::{CidrConfig, IpAllocation, Ipam};
pub use traits::ZoneRuntime;
pub use types::{
    ContainerProcess, DirectNicConfig, EtherstubConfig, FsMount, NetworkMode, StoragePoolConfig,
    ZoneBrand, ZoneConfig, ZoneInfo, ZoneState, ZoneStorageOpts,
};
// Re-export storage types
#[cfg(target_os = "illumos")]
pub use storage::ZfsStorageEngine;
pub use storage::{MockStorageEngine, StorageEngine, VolumeInfo};
// Re-export controller and agent types
pub use api_client::ApiClient;
pub use controller::{PodController, PodControllerConfig};
pub use node_agent::{NodeAgent, NodeAgentConfig};
pub use node_health::{NodeHealthChecker, NodeHealthCheckerConfig};
pub use probes::{ProbeExecutor, ProbeTracker};
// Conditionally re-export illumos runtime
#[cfg(target_os = "illumos")]
pub use illumos::IllumosRuntime;