Replace the hardcoded memory (8Gi) and pod limit (110) in the node agent
with real values. CPU and memory are detected from the host via the
sys-info crate once, at NodeAgent construction, and reused on every
heartbeat. Capacity reports the raw hardware values, while allocatable
subtracts the configurable reservations (--system-reserved-cpu,
--system-reserved-memory); the pod limit is set by --max-pods. This
gives the scheduler accurate data for filtering and scoring.
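The capacity/allocatable split can be sketched as plain arithmetic. This is a minimal sketch, not the NodeAgent code: `NodeResources`, `Reservations`, `capacity` and `allocatable` are hypothetical names, quantities are assumed to be CPU millicores and memory bytes, the pod count is assumed to come straight from --max-pods, and the sys-info detection step is left out (the detected values are passed in as arguments).

```rust
/// Resource quantities reported for a node (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeResources {
    /// CPU in millicores (1 core = 1000m).
    pub cpu_millis: u64,
    /// Memory in bytes.
    pub memory_bytes: u64,
    /// Maximum number of pods.
    pub pods: u64,
}

/// Values that would come from the reservation flags (assumed units).
#[derive(Debug, Clone, Copy)]
pub struct Reservations {
    pub system_reserved_cpu_millis: u64,
    pub system_reserved_memory_bytes: u64,
    pub max_pods: u64,
}

/// Capacity: raw detected hardware values. In the agent these would be
/// detected once at construction and reused on every heartbeat.
/// Assumption: the pod count is taken directly from --max-pods.
pub fn capacity(cpu_cores: u64, memory_bytes: u64, max_pods: u64) -> NodeResources {
    NodeResources {
        cpu_millis: cpu_cores * 1000,
        memory_bytes,
        pods: max_pods,
    }
}

/// Allocatable: capacity minus system reservations, saturating at zero so a
/// reservation larger than the host cannot underflow.
pub fn allocatable(cap: NodeResources, res: Reservations) -> NodeResources {
    NodeResources {
        cpu_millis: cap.cpu_millis.saturating_sub(res.system_reserved_cpu_millis),
        memory_bytes: cap
            .memory_bytes
            .saturating_sub(res.system_reserved_memory_bytes),
        pods: res.max_pods,
    }
}
```

Saturating subtraction keeps an oversized reservation from wrapping around into a huge allocatable value that the scheduler would happily fill.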
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable the API server to optionally serve HTTPS (disabled by default).
When --tls is passed without explicit cert/key paths, a self-signed CA
and server certificate are auto-generated via rcgen and persisted to
disk for reuse across restarts. The internal ApiClient now trusts the
self-signed CA, so the controller and node agent can talk to the API
server over TLS without extra configuration.
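The reuse-across-restarts behaviour amounts to a load-or-generate step. A minimal std-only sketch, assuming hypothetical file names `server.crt`/`server.key` and a caller-supplied `generate` closure standing in for the rcgen call; the CA files and the ApiClient trust setup are not shown.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// PEM-encoded certificate and private key (hypothetical type).
pub struct TlsMaterial {
    pub cert_pem: String,
    pub key_pem: String,
}

/// Reuse the cert/key in `dir` if both files exist; otherwise call
/// `generate` (rcgen in the real code) and persist the result so the next
/// start finds it. A real implementation should also restrict the key
/// file's permissions (e.g. 0600), which this sketch does not do.
pub fn load_or_generate<F>(dir: &Path, generate: F) -> io::Result<TlsMaterial>
where
    F: FnOnce() -> TlsMaterial,
{
    let cert_path = dir.join("server.crt");
    let key_path = dir.join("server.key");

    if cert_path.exists() && key_path.exists() {
        return Ok(TlsMaterial {
            cert_pem: fs::read_to_string(&cert_path)?,
            key_pem: fs::read_to_string(&key_path)?,
        });
    }

    let material = generate();
    fs::create_dir_all(dir)?;
    fs::write(&cert_path, &material.cert_pem)?;
    fs::write(&key_path, &material.key_pem)?;
    Ok(material)
}
```

Persisting the generated material matters for clients: if the CA changed on every restart, any component that had already loaded it would stop trusting the server.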
Also add SIGTERM handling (alongside SIGINT) and graceful shutdown via
CancellationToken for both the `serve` and `agent` modes, plus an SMF
manifest and method script so that reddwarf can run as the
svc:/system/reddwarf:default service on illumos.
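The shutdown pattern can be sketched without tokio. This hedged sketch uses an `Arc<AtomicBool>` wrapper (the hypothetical `ShutdownToken`) as a stand-in for tokio_util's `CancellationToken`, and plain threads in place of async tasks. Signal handling is omitted; `shutdown` marks where the SIGINT/SIGTERM handler would cancel the token and wait for every component to finish.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Minimal stand-in for tokio_util::sync::CancellationToken.
#[derive(Clone, Default)]
pub struct ShutdownToken(Arc<AtomicBool>);

impl ShutdownToken {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Spawn a component loop (e.g. heartbeat or reconcile) that runs one unit
/// of work per tick until cancelled, then returns how many ticks it did.
/// Any cleanup would go after the loop, before returning.
pub fn spawn_component(token: ShutdownToken, tick: Duration) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut ticks = 0;
        while !token.is_cancelled() {
            ticks += 1;
            thread::sleep(tick);
        }
        ticks
    })
}

/// Graceful shutdown: signal every component, then wait for all of them,
/// instead of aborting tasks mid-operation.
pub fn shutdown(token: &ShutdownToken, handles: Vec<thread::JoinHandle<u64>>) -> Vec<u64> {
    token.cancel();
    handles
        .into_iter()
        .map(|h| h.join().expect("component panicked"))
        .collect()
}
```

With the real `CancellationToken`, an async loop would typically `tokio::select!` on `token.cancelled()` rather than polling a flag between ticks, so it wakes immediately instead of after the current sleep.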
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move WatchEventType and ResourceEvent to reddwarf-core so that the
scheduler and runtime can use them without depending on the apiserver
crate
- Fix the scheduler's bind_pod to create versioned commits and publish
MODIFIED events, so that the pod controller is notified of scheduled pods
- Replace the polling loop in the pod controller with an event bus
subscription, wire handle_delete to DELETED events, and keep
reconcile_all for startup sync and for recovery when the subscriber lags
- Add allocatable/capacity resources (cpu, memory, pods) to the node
agent's build_node so that the scheduler's resource filter accepts the node
- Bootstrap the "default" namespace on startup so that creating pods in
it does not fail
- Replace .abort()-based shutdown with graceful CancellationToken-based
shutdown across the scheduler, controller, and node agent
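The event-driven controller loop can be sketched as a pure dispatch function. The enum variants and field names are assumptions rather than the reddwarf-core definitions, and the hypothetical `Received` enum models the three outcomes of a tokio `broadcast::Receiver::recv()` call (a message, `RecvError::Lagged(n)`, or `RecvError::Closed`) so the sketch runs without tokio.

```rust
/// Event kinds as described above (shape assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WatchEventType {
    Added,
    Modified,
    Deleted,
}

#[derive(Debug, Clone)]
pub struct ResourceEvent {
    pub event_type: WatchEventType,
    pub name: String,
}

/// Outcome of one receive on the event bus.
pub enum Received {
    Event(ResourceEvent),
    Lagged(u64),
    Closed,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Reconcile(String),
    HandleDelete(String),
    ReconcileAll,
    Stop,
}

pub fn dispatch(msg: Received) -> Action {
    match msg {
        Received::Event(ev) => match ev.event_type {
            WatchEventType::Added | WatchEventType::Modified => Action::Reconcile(ev.name),
            WatchEventType::Deleted => Action::HandleDelete(ev.name),
        },
        // Events were dropped: fall back to a full resync.
        Received::Lagged(_) => Action::ReconcileAll,
        Received::Closed => Action::Stop,
    }
}

/// Controller loop: full sync at startup, then react to events until the
/// bus closes. Returns the actions taken, for inspection.
pub fn run<I: IntoIterator<Item = Received>>(events: I) -> Vec<Action> {
    let mut log = vec![Action::ReconcileAll];
    for msg in events {
        let action = dispatch(msg);
        let stop = action == Action::Stop;
        log.push(action);
        if stop {
            break;
        }
    }
    log
}
```

Keeping reconcile_all as the lag fallback means a slow subscriber never silently misses a pod: it trades one extra full pass for correctness.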
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the core reconciliation loop that connects Pod events to the
zone lifecycle. Status subresource endpoints allow pod and node status
to be updated without triggering spec-level changes. The main binary
now provides `serve` (API server only) and `agent` (full node: API +
scheduler + controller + heartbeat) subcommands via clap.
- Status subresource: generic update_status in common.rs, plus PUT
endpoints for /pods/{name}/status and /nodes/{name}/status
- Pod controller: polls pods assigned to this node, provisions zones via
ZoneRuntime, updates pod status to Running/Failed, and monitors zone
health
- Node agent: registers the host as a Node and sends periodic heartbeats
with a Ready condition
- API client: lightweight reqwest-based HTTP client that the controller
and node agent use to talk to the API server
- Main binary: clap CLI with serve/agent commands that wires all
components together, with graceful shutdown on Ctrl-C
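The status subresource semantics can be sketched as a generic function that takes only the status from the request. `StatusResource`, `Pod`, `PodSpec` and `PodStatus` are hypothetical simplified types, the resource version is assumed to be a plain counter bumped on every write, and the sketch skips the conflict check a real API server would do on a stale resource version.

```rust
/// Resources that expose a status subresource (hypothetical trait).
pub trait StatusResource: Clone {
    type Status: Clone;
    fn status(&self) -> &Self::Status;
    fn set_status(&mut self, status: Self::Status);
    fn resource_version(&self) -> u64;
    fn set_resource_version(&mut self, version: u64);
}

/// Handle PUT .../{name}/status: start from the stored object, copy only
/// the status from the request (any spec changes in it are ignored), and
/// bump the resource version.
pub fn update_status<R: StatusResource>(stored: &R, incoming: &R) -> R {
    let mut updated = stored.clone();
    updated.set_status(incoming.status().clone());
    updated.set_resource_version(stored.resource_version() + 1);
    updated
}

#[derive(Debug, Clone, PartialEq)]
pub struct PodSpec {
    pub node_name: Option<String>,
    pub image: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PodStatus {
    pub phase: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Pod {
    pub name: String,
    pub spec: PodSpec,
    pub status: PodStatus,
    pub resource_version: u64,
}

impl StatusResource for Pod {
    type Status = PodStatus;
    fn status(&self) -> &PodStatus {
        &self.status
    }
    fn set_status(&mut self, status: PodStatus) {
        self.status = status;
    }
    fn resource_version(&self) -> u64 {
        self.resource_version
    }
    fn set_resource_version(&mut self, version: u64) {
        self.resource_version = version;
    }
}
```

A Node type would implement the same trait, which is what lets one update_status serve both /pods/{name}/status and /nodes/{name}/status.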
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement an in-process broadcast event bus for resource mutations
(ADDED/MODIFIED/DELETED), with SSE watch endpoints on all list handlers
following the Kubernetes watch protocol. Add the reddwarf-runtime crate
with a trait-based zone runtime abstraction targeting illumos zones,
including support for the LX brand and a custom reddwarf brand,
etherstub/direct VNIC networking, ZFS dataset management, and a
MockRuntime for testing on non-illumos platforms.
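A trait-based runtime with a mock backend can be sketched as follows. The trait methods, `ZoneBrand`, `ZoneState`, `RuntimeError` and `ensure_running` are hypothetical, and the mock only records zones in a `HashMap`; a real illumos backend would configure, install and boot zones, create VNICs and manage ZFS datasets instead.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZoneBrand {
    Lx,
    Reddwarf,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ZoneState {
    Configured,
    Running,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RuntimeError {
    AlreadyExists(String),
    NotFound(String),
}

pub trait ZoneRuntime {
    fn provision(&mut self, name: &str, brand: ZoneBrand) -> Result<(), RuntimeError>;
    fn state(&self, name: &str) -> Option<ZoneState>;
    fn deprovision(&mut self, name: &str) -> Result<(), RuntimeError>;
}

/// In-memory backend usable on any platform.
#[derive(Default)]
pub struct MockRuntime {
    zones: HashMap<String, (ZoneBrand, ZoneState)>,
}

impl ZoneRuntime for MockRuntime {
    fn provision(&mut self, name: &str, brand: ZoneBrand) -> Result<(), RuntimeError> {
        if self.zones.contains_key(name) {
            return Err(RuntimeError::AlreadyExists(name.to_string()));
        }
        // The mock jumps straight to Running; no zone is actually created.
        self.zones.insert(name.to_string(), (brand, ZoneState::Running));
        Ok(())
    }

    fn state(&self, name: &str) -> Option<ZoneState> {
        self.zones.get(name).map(|(_, state)| *state)
    }

    fn deprovision(&mut self, name: &str) -> Result<(), RuntimeError> {
        self.zones
            .remove(name)
            .map(|_| ())
            .ok_or_else(|| RuntimeError::NotFound(name.to_string()))
    }
}

/// Controller-style logic written against the trait, not a backend.
pub fn ensure_running<R: ZoneRuntime>(
    rt: &mut R,
    name: &str,
    brand: ZoneBrand,
) -> Result<ZoneState, RuntimeError> {
    if rt.state(name).is_none() {
        rt.provision(name, brand)?;
    }
    rt.state(name).ok_or_else(|| RuntimeError::NotFound(name.to_string()))
}
```

Because `ensure_running` is generic over `ZoneRuntime`, logic like it can be unit-tested against `MockRuntime` on any platform, with the illumos backend swapped in only on real hosts.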
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>