Commit graph

6 commits

Author SHA1 Message Date
Claude
d8425ad85d
Add service networking, bhyve brand, ipadm IP config, and zone state reporting
Service networking:
- ClusterIP IPAM allocation on service create/delete via reusable Ipam with_prefix()
- ServiceController watches Pod/Service events + periodic reconcile to track endpoints
- NatManager generates ipnat rdr rules for ClusterIP -> pod IP forwarding
- Embedded DNS server resolves {svc}.{ns}.svc.cluster.local to ClusterIP
- New CLI flags: --service-cidr (default 10.96.0.0/12), --cluster-dns (default 0.0.0.0:10053)

Quick wins:
- ipadm IP assignment: configure_zone_ip() runs ipadm/route inside zone via zlogin after boot
- Node heartbeat zone state reporting: reddwarf.io/zone-count and zone-summary annotations
- bhyve brand support: ZoneBrand::Bhyve, install args, zonecfg device generation, controller integration

189 tests passing, clippy clean.

https://claude.ai/code/session_016QLFjAyYGzMPbBjEGMe75j
2026-03-19 20:28:40 +00:00
Till Wegmueller
58171c7555
Add periodic reconciliation, node health checker, and graceful pod termination
Three high-priority reliability features that close gaps identified in AUDIT.md:

1. Periodic reconciliation: PodController now runs reconcile_all() every 30s
   via a tokio::time::interval branch in the select! loop, detecting zone
   crashes between events.

2. Node health checker: New NodeHealthChecker polls node heartbeats every 15s
   and marks nodes with stale heartbeats (>40s) as NotReady with reason
   NodeStatusUnknown, preserving last_transition_time correctly.

3. Graceful pod termination: DELETE sets deletion_timestamp and phase=Terminating
   instead of immediate removal. Controller drives a state machine (shutdown →
   halt on grace expiry → deprovision → finalize) with periodic reconcile
   advancing it. New POST .../finalize endpoint performs actual storage removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 20:39:36 +01:00
Till Wegmueller
cb6ca8cd3c
Add optional TLS support and SMF service integration
Enable the API server to optionally serve HTTPS (disabled by default).
When --tls is passed without explicit cert/key paths, a self-signed CA
and server certificate are auto-generated via rcgen and persisted to
disk for reuse across restarts. The internal ApiClient learns to trust
the self-signed CA so controller/agent components work seamlessly over
TLS.

Also adds SIGTERM signal handling (alongside SIGINT) and graceful
shutdown via CancellationToken for both `serve` and `agent` modes,
plus an SMF manifest and method script so reddwarf can run as
svc:/system/reddwarf:default on illumos.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 18:45:20 +01:00
Till Wegmueller
a47784797b
Add event bus and reddwarf-runtime crate
Implement an in-process broadcast event bus for resource mutations
(ADDED/MODIFIED/DELETED) with SSE watch endpoints on all list handlers,
following the Kubernetes watch protocol. Add the reddwarf-runtime crate
with a trait-based zone runtime abstraction targeting illumos zones,
including LX and custom reddwarf brand support, etherstub/direct VNIC
networking, ZFS dataset management, and a MockRuntime for testing on
non-illumos platforms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 21:29:17 +01:00
Till Wegmueller
149321f092
Implement phase 4
Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2026-01-28 23:06:06 +01:00
Till Wegmueller
3a03400c1f
Implement first 3 phases of implementation plan
Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2026-01-28 22:51:26 +01:00