Commit graph

52 commits

Author SHA256 Message Date
Till Wegmueller
471fa7f3e1
Enable pure-iso feature for vm-manager (no genisoimage needed)
2026-04-07 21:01:12 +02:00
Till Wegmueller
b5ccd4e2aa
Pass work dir to vm-manager for container volume compatibility
Configure vm-manager's QEMU backend to use /var/lib/solstice-ci as the
data directory (matching the compose.yml volume mount) instead of the
default ~/.local/share/vmctl/vms/ path.
2026-04-07 17:45:00 +02:00
Till Wegmueller
b5c7078adc
Switch vm-manager to git dep + multi-stage Containerfile
- Use HTTPS git dep for vm-manager (works in CI and container builds)
- Add .cargo/ to .gitignore (local dev patch override)
- Restore multi-stage Containerfile: Rust build stage fetches vm-manager
  from GitHub, Ubuntu 24.04 runtime with QEMU
- Host orchestrator stopped and disabled (container-only from now on)
2026-04-07 17:24:17 +02:00
Till Wegmueller
c9fc05a00e
Remove libvirt dependencies and clean up orchestrator
- Remove `virt` crate dependency and libvirt feature flag
- Remove `ssh2` crate dependency (vm-manager handles SSH)
- Remove `zstd` crate dependency (vm-manager handles decompression)
- Remove LibvirtHypervisor, ZonesHypervisor, RouterHypervisor from hypervisor.rs
- Remove libvirt error types from error.rs
- Remove libvirt_uri/libvirt_network CLI options, add network_bridge
- Replace RouterHypervisor::build() with VmManagerAdapter::build()
- Update deb package depends: libvirt → qemu-system-x86
- Keep Noop backend for development/testing
- Leave the now-unused legacy SSH/console functions in place for future cleanup
2026-04-07 15:56:10 +02:00
Till Wegmueller
2d971ef500
Replace image download with vm-manager ImageManager
Use vm-manager's ImageManager::download() for streaming image downloads
with automatic zstd decompression, replacing the hand-rolled reqwest +
zstd code. Supports http(s), file://, and OCI artifact URLs.
2026-04-07 15:52:02 +02:00
Till Wegmueller
190eb5532f
Replace scheduler SSH/console code with vm-manager APIs
- IP discovery: use hv.guest_ip() with timeout loop instead of
  discover_guest_ip_virsh() (500+ lines removed from hot path)
- SSH: use vm_manager::ssh::connect_with_retry() + upload() + exec()
  instead of hand-rolled TCP/ssh2/SFTP code
- Console: use vm_manager::console::ConsoleTailer over Unix socket
  instead of file-based tail_console_to_joblog()
- Add guest_ip() to orchestrator Hypervisor trait with default impl
- Remove #[cfg(linux, libvirt)] gates from is_illumos_label, expand_tilde
- Keep orchestrator-specific: DB persistence, log recording, MQ publish,
  runner binary selection, env var injection
2026-04-07 15:50:54 +02:00
Till Wegmueller
a60053f030
Add vm-manager adapter layer to orchestrator
- Add vm-manager as dependency of orchestrator
- Create vm_adapter.rs that bridges orchestrator's Hypervisor trait
  to vm-manager's RouterHypervisor (QEMU/Propolis/Noop backends)
- Add Qemu and Propolis variants to BackendTag
- Add console_socket, ssh_host_port, mac_addr fields to VmHandle
- Adapter uses user-mode networking by default for containerization
- Maps orchestrator VmSpec + JobContext → vm-manager VmSpec with
  CloudInitConfig and SshConfig
2026-04-07 15:46:20 +02:00
Till Wegmueller
633f658639
chore: format code with cargo fmt
Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-18 15:43:18 +01:00
Till Wegmueller
8f909c0105
Update default SSH user to 'sol' and enhance cloud-init config; bump version to 0.1.15
- Change the default SSH username from 'ubuntu' to 'sol' for consistency with the Solstice CI environment.
- Modify the cloud-init user configuration to match the new default, adding enhanced permissions and settings for the 'sol' user.
- Increment orchestrator version to 0.1.15.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-18 14:48:20 +01:00
Till Wegmueller
2c73c80619
Introduce workflow.jobs support and script path overrides; bump version to 0.1.14
- Add parsing and execution support for `.solstice/workflow.kdl` with job-specific configurations, including `runs_on`, `script path`, and `workflow_job_id`.
- Enable job grouping via `group_id` for cohesive workflow processing.
- Update orchestrator to pass workflow-specific parameters to `cloud-init` for finer control over execution.
- Refactor enqueue logic to handle multiple jobs per workflow with fallback to single job when no workflow is defined.
- Add the `base64`, `regex`, and `uuid` dependencies for workflow parsing.
- Increment orchestrator version to 0.1.14 for release.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-18 14:37:35 +01:00
Till Wegmueller
7fc4e8edb7
Introduce logs-service for structured job logs management; bump version to 0.1.13
- Add `logs-service` crate as a separate microservice to handle job log storage, retrieval, and categorization.
- Update orchestrator to redirect log endpoints to the new service with optional permanent redirects using `LOGS_BASE_URL`.
- Enhance log persistence by introducing structured fields such as category, level, and error flags.
- Implement migration to add new columns and indexes for job logs.
- Add ANSI escape sequence stripping and structured logging for cleaner log storage.
- Improve SSH log handling with interleaved stdout/stderr processing and pty request support.
- Revise the Dockerfiles and compose setup to include logs-service, with support for PostgreSQL and secure connections.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-18 11:48:09 +01:00
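A dependency-free sketch of the ANSI escape stripping mentioned in 7fc4e8edb7. It is not the logs-service implementation (which may use a regex or a dedicated crate instead); `strip_ansi` is a hypothetical name. It removes CSI sequences (colors, cursor movement), OSC sequences terminated by BEL or `ESC \`, and drops any other `ESC` together with the single character after it, so longer escapes such as `ESC ( B` are only partially handled.

```rust
/// Remove ANSI escape sequences from a log line.
/// Handles CSI ("\x1b[31m"), OSC ("\x1b]0;title\x07" or ST-terminated) and
/// drops any other ESC plus the one character that follows it.
pub fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\x1b' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // CSI: skip parameter/intermediate bytes up to a final byte '@'..='~'.
            Some('[') => {
                for c in chars.by_ref() {
                    if ('@'..='~').contains(&c) {
                        break;
                    }
                }
            }
            // OSC: skip until BEL or the string terminator ESC '\'.
            Some(']') => {
                while let Some(c) = chars.next() {
                    if c == '\x07' {
                        break;
                    }
                    if c == '\x1b' && chars.peek() == Some(&'\\') {
                        chars.next();
                        break;
                    }
                }
            }
            // Any other escape (or a trailing ESC): drop it.
            _ => {}
        }
    }
    out
}
```

Doing this before persisting keeps colorized runner output searchable in the database while the raw terminal codes are discarded.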
Till Wegmueller
20a0efd116
Atomically upload runner via SFTP to ensure safe file replacement; bump version to 0.1.11
- Refactor runner upload logic to use temporary files and atomic renaming for safer updates.
- Improve file permission handling during temporary file creation.
- Increment orchestrator version to 0.1.11.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 23:18:55 +01:00
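The temp-file-plus-rename pattern from 20a0efd116, sketched against the local filesystem with `std::fs` rather than over SFTP. `atomic_write`, the `.<name>.tmp-<pid>` naming scheme and the `mode` parameter (Unix permission bits, ignored on other platforms) are assumptions for illustration. The temporary file sits next to the destination so the final rename stays on one filesystem and readers see either the old or the new binary, never a partial one; over SFTP, whether a rename can atomically overwrite an existing file depends on the server.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `dest` with `contents` atomically: write a temporary file in the same
/// directory, flush it to disk, set its permissions, then rename it over `dest`.
pub fn atomic_write(dest: &Path, contents: &[u8], mode: u32) -> io::Result<()> {
    let dir = dest.parent().unwrap_or_else(|| Path::new("."));
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    // Hidden temp name next to the target so the rename stays on one filesystem.
    let tmp = dir.join(format!(".{}.tmp-{}", file_name.to_string_lossy(), std::process::id()));

    let result = (|| -> io::Result<()> {
        {
            let mut f = File::create(&tmp)?;
            f.write_all(contents)?;
            f.sync_all()?; // bytes are on disk before the rename
        } // file handle closed here
        set_mode(&tmp, mode)?;
        fs::rename(&tmp, dest)
    })();
    if result.is_err() {
        // Best-effort cleanup so a failed upload leaves no stray temp file.
        let _ = fs::remove_file(&tmp);
    }
    result
}

#[cfg(unix)]
fn set_mode(path: &Path, mode: u32) -> io::Result<()> {
    use std::os::unix::fs::PermissionsExt;
    fs::set_permissions(path, fs::Permissions::from_mode(mode))
}

#[cfg(not(unix))]
fn set_mode(_path: &Path, _mode: u32) -> io::Result<()> {
    Ok(())
}
```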
Till Wegmueller
b36e5c70a8
Validate runner paths at startup and improve diagnostics; bump version to 0.1.10
- Add validation for `RUNNER_LINUX_PATH` and `RUNNER_ILLUMOS_PATH` with detailed warnings and diagnostics for misconfigurations.
- Log fallback to default paths and warn if binaries are missing.
- Increment orchestrator version to 0.1.10.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 22:48:33 +01:00
Till Wegmueller
931e5ac81a
Add explicit libvirt configuration support; remove environment variable reliance; bump version to 0.1.9
- Introduce `libvirt_uri` and `libvirt_network` in configuration structs, replacing reliance on environment variables.
- Update all `virsh`-related logic to use explicit parameters for libvirt connection and network settings.
- Align codebase with new guidelines rejecting runtime environment variable mutations.
- Document breaking changes in `.junie/guidelines.md`.
- Increment orchestrator version to 0.1.9.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 22:40:50 +01:00
Till Wegmueller
f1d161655f
Refactor dnsmasq leases-based guest IP discovery and bump version to 0.1.8
- Update IP selection logic to prefer the latest lease based on epoch timestamp.
- Remove redundant IP discovery logic in `net-dhcp-leases`.
- Increment orchestrator version to 0.1.8 for release.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 22:00:46 +01:00
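A minimal sketch of the lease selection described in f1d161655f, assuming the standard dnsmasq leases-file layout (`<expiry-epoch> <mac> <ip> <hostname> <client-id>`, one lease per line). `Lease`, `parse_lease` and `latest_ip_for_mac` are hypothetical names, not the orchestrator's actual code.

```rust
/// One parsed line of a dnsmasq leases file (only the fields we need).
#[derive(Debug, PartialEq)]
pub struct Lease {
    pub expiry: u64,
    pub mac: String,
    pub ip: String,
}

/// Parse "<expiry-epoch> <mac> <ip> ..."; returns None for malformed lines.
pub fn parse_lease(line: &str) -> Option<Lease> {
    let mut fields = line.split_whitespace();
    let expiry = fields.next()?.parse::<u64>().ok()?;
    let mac = fields.next()?.to_ascii_lowercase();
    let ip = fields.next()?.to_string();
    Some(Lease { expiry, mac, ip })
}

/// Return the IP of the lease for `mac` with the highest epoch timestamp.
pub fn latest_ip_for_mac(leases: &str, mac: &str) -> Option<String> {
    let mac = mac.to_ascii_lowercase();
    leases
        .lines()
        .filter_map(parse_lease)
        .filter(|lease| lease.mac == mac)
        .max_by_key(|lease| lease.expiry)
        .map(|lease| lease.ip)
}
```

Comparing MAC addresses case-insensitively avoids missing a lease when the domain XML and the leases file disagree on letter case.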
Till Wegmueller
a6ed0f0c69
Add libvirt-related environment handling, directory preparation, and bump version to 0.1.7
- Add default `LIBVIRT_URI`, `HOME`, and `XDG_CACHE_HOME` environment variable handling for `virsh` commands.
- Ensure writable cache directories for the service user in packaging scripts.
- Update systemd service to include libvirt-related environment defaults.
- Bump orchestrator version to 0.1.7.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 21:50:17 +01:00
Till Wegmueller
bf94664a30
Refactor VM lifecycle handling and improve guest IP discovery, bump version to 0.1.6
- Adjust stopping, destroying, and persisting VM lifecycle events to ensure better sequencing and avoid races.
- Enhance `discover_guest_ip_virsh` with detailed logging, structured attempt tracking, and robust fallback mechanisms.
- Introduce `Attempt` struct to capture detailed command execution context for debugging.
- Update console log handling to snapshot logs early, minimizing race conditions.
- Bump orchestrator version to 0.1.6.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 21:34:19 +01:00
Till Wegmueller
d5faf319ab
Add boot wait configuration and improve VM startup logging, bump version to 0.1.5
- Introduce `boot_wait_secs` configuration to delay IP discovery/SSH after VM startup.
- Capture console logs when no SSH logs are available for better debugging during failures.
- Add a utility function to snapshot and persist console logs into job logs.
- Update CLI and environment variable support for the `boot_wait_secs` parameter.
- Bump orchestrator version to 0.1.5.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 21:12:54 +01:00
Till Wegmueller
5d8e79c8d4
Add support for results queue and routing key in MQ configuration, bump version to 0.1.4
- Introduce `results_queue` and `results_routing_key` to MQ configuration.
- Update message publishing and queue declaration logic to leverage new fields.
- Increment orchestrator version to 0.1.4.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 20:51:57 +01:00
Till Wegmueller
8e21c2ba47
Remove unused systemd unit file hardening options, bump version to 0.1.3
Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 20:05:21 +01:00
Till Wegmueller
0724a4c526
Enable libvirt feature for orchestrator and bump version to 0.1.2
- Add `--features libvirt` to orchestrator's Debian package build process.
- Update orchestrator version to 0.1.2 in `Cargo.toml`.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 20:01:06 +01:00
Till Wegmueller
fad8e60ec1
Add Debian packaging support and network configuration enhancements
- Introduce Debian package build script using `cargo-deb` for orchestrator releases.
- Add systemd unit file and post-installation script for automatic service setup.
- Update `compose.yml` with host-only port bindings for Postgres and RabbitMQ.
- Introduce NGINX-based log proxy for orchestrator logs with Traefik support.
- Bump orchestrator version to 0.1.1 and update related Cargo metadata for packaging.
- Add example environment file for orchestrator configuration.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-17 19:57:19 +01:00
Till Wegmueller
9dfa9c4b95
Enhance SSH handling with retries and robust error management, refactor guest IP discovery
- Implement SSH execution retries with exponential backoff and timeout handling.
- Replace `virsh domifaddr` with a multi-strategy IP discovery approach.
- Introduce `OrchestratorError` for consistent, structured error reporting.
- Improve runner deployment and SSH session utilities for readability and reliability.
- Add dependencies: `thiserror`, `anyhow` for streamlined error handling.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-15 21:46:54 +01:00
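A generic sketch of retrying with exponential backoff as described in 9dfa9c4b95. `retry_with_backoff` and its parameters are hypothetical; the orchestrator's actual retry and timeout logic is not shown in this log, and this sketch covers only the backoff part (no per-attempt timeout).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` until it succeeds or `max_attempts` is reached. After each failure
/// the delay doubles, capped at `max_delay`. `op` receives the 1-based attempt
/// number; the last error is returned once attempts are exhausted.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay.min(max_delay);
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

Capping the delay keeps the worst-case wait bounded while a freshly booted guest is still bringing up sshd.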
Till Wegmueller
038d1161a6
Refactor Orchestrator with enhanced SSH handling, error management, and IP discovery support
- Implement retries for SSH-based job execution with configurable timeouts.
- Introduce `OrchestratorError` for consistent error handling across modules.
- Replace `virsh domifaddr` based guest IP discovery with a robust, multi-strategy approach.
- Refactor runner deployment and SSH-related utility functions for clarity.
- Add `thiserror` and `anyhow` dependencies for error management.
- Update persistence layer with improved error handling for database operations.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-15 21:46:19 +01:00
Till Wegmueller
c2fefb5167
Add per-job SSH key support, refactor scheduler for SSH-based job execution, and remove unused runner endpoint
- Introduce fields in `JobContext` for per-job SSH configuration, including user, key paths, and PEM contents.
- Update the scheduler to support SSH-based execution of jobs, including VM lifecycle management and SSH session handling.
- Add utility functions for SSH execution, guest IP discovery, and runner deployment.
- Remove the unused `/runners/{name}` HTTP endpoint and its associated logic.
- Simplify router creation by refactoring out disabled runner directory handling.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-15 18:37:30 +01:00
Till Wegmueller
930efe547f
Add public runner URL configuration and enhance log streaming support
- Introduce options for specifying public runner base URLs (`SOLSTICE_RUNNER_BASE_URL`) and orchestrator contact addresses (`ORCH_CONTACT_ADDR`).
- Update `.env.sample` and `compose.yml` with new configuration fields for external log streaming and runner binary serving.
- Refactor runner URL handling and generation logic for improved flexibility.
- Enhance `cloud-init` templates with updated runner URL environment variables (`RUNNER_SINGLE` and `RUNNER_URLS`).
- Add unit tests for runner URL generation to verify various input cases.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-11 20:24:20 +01:00
Till Wegmueller
f904cb88b2
Relax filesystem permissions for VM directories, overlays, and logs to support host libvirt/qemu access. Introduce dead-letter queue support with enriched error messages for failed jobs.
Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-09 17:59:04 +01:00
Till Wegmueller
888aa26388
Add libvirt/KVM integration and Forgejo webhook support to Podman stack
- Extend `.env.sample` with libvirt configuration, Forgejo secrets, and image mapping defaults.
- Update `compose.yml` to enable libvirt integration, including required mounts, devices, and environment variables.
- Add Forgejo webhook configuration and commit status reporting with optional HMAC validation.
- Enhance the orchestrator container with libvirt dependencies and optional features for VM management.
- Document host preparation for libvirt/KVM and image directories in the README.
- Set default fallback values for Traefik ACME CA server.

Signed-off-by: Till Wegmueller <toasterson@gmail.com>
2025-11-09 17:58:36 +01:00
Till Wegmueller
11ce9cc881
Introduce centralized configuration handling via KDL and environment variables
This commit adds:
- A unified configuration system (`AppConfig`) that aggregates KDL files and environment variables with precedence handling.
- Example KDL configuration files for the orchestrator and forge-integration modules.
- Updates to orchestrator and forge-integration to load and apply configurations from `AppConfig`.
- Improved AMQP and database configuration with overlays from CLI, environment, or KDL.
- Deprecation of `TODO.txt`, whose contents are now represented in the configuration examples.
2025-11-06 23:48:03 +01:00
Till Wegmueller
0dabdf2bb2
Auto-detect orchestrator contact address and enhance platform-specific configurations
This commit introduces:
- Automatic detection of the orchestrator contact address when not explicitly provided.
- Platform-specific logic for determining reachable IPs, including libvirt network parsing (Linux) and external IP detection.
- Updates to GRPC address processing to handle both specific and unspecified hosts.
- Additional utility functions for parsing and detecting IPs in libvirt configurations.
2025-11-06 21:56:57 +01:00
Till Wegmueller
97599eb48d
Move runner logs to debug level and enable runner binary serving via orchestrator
This commit includes:
- Adjusted runner logs from `info` to `debug` for reduced deployment log verbosity while retaining visibility in CI.
- Added functionality to serve runner binaries directly from the orchestrator via HTTP.
- Introduced new `RUNNER_DIR` configuration to specify the binary directory, with default paths and URL composition.
- Updated HTTP routing to include runner file serving with validation and logging.
- Improved AMQP body logging with a utility for better error debugging.
- Updated task scripts for runner cross-building and serving, consolidating configurations and removing redundant files.
2025-11-06 21:44:06 +01:00
Till Wegmueller
06ae079b14
Add repository owner/name parsing and integrate with commit status updates
This commit introduces:
- A utility function to parse repository owner and name from URLs, supporting HTTPS, SSH, and Git formats.
- Enhancements to job messages and results with optional `repo_owner` and `repo_name` fields for downstream integrations.
- Updated orchestrator and forge-integration workflows to leverage parsed repository details for status updates and accurate routing.
2025-11-03 23:36:25 +01:00
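A sketch of the owner/name extraction described in 06ae079b14. It covers HTTPS, `ssh://`, `git://` and scp-style (`git@host:owner/name.git`) URLs by taking the last two path segments and dropping a trailing `.git`. `parse_owner_repo` is a hypothetical name, and URLs with nested groups (e.g. `group/subgroup/name`) yield only the last two segments.

```rust
/// Extract `(owner, name)` from a repository URL by taking the last two path
/// segments. Handles https://, ssh://, git:// and scp-like `user@host:path`.
pub fn parse_owner_repo(url: &str) -> Option<(String, String)> {
    let url = url.trim().trim_end_matches('/');
    let path = if let Some(idx) = url.find("://") {
        // Skip the scheme, then everything up to the first '/' (user, host, port).
        let rest = &url[idx + 3..];
        &rest[rest.find('/')? + 1..]
    } else if let Some(idx) = url.find(':') {
        // scp-like syntax: everything after the first ':' is the path.
        &url[idx + 1..]
    } else {
        return None;
    };
    // rsplitn yields segments from the right: name first, then owner.
    let mut segments = path.rsplitn(3, '/');
    let raw_name = segments.next()?;
    let name = raw_name.strip_suffix(".git").unwrap_or(raw_name);
    let owner = segments.next()?;
    if owner.is_empty() || name.is_empty() {
        return None;
    }
    Some((owner.to_string(), name.to_string()))
}
```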
Till Wegmueller
c00ce54112
Add heuristic failure detection and improve runner URL configuration
This commit introduces:
- A heuristic to mark jobs as failed if VMs stop quickly without generating logs.
- Improved configuration for runner URLs, including auto-detection of host IPs and default multi-OS runner URLs.
- Updates to the orchestrator's HTTP routing for consistency.
- New task scripts for Forge integration and updates to environment defaults for local development.
2025-11-03 22:36:31 +01:00
Till Wegmueller
81a93ef1a7
Enable job log persistence, HTTP server, and extend CI/packaging support
This commit introduces:
- Log persistence feature with a new `job_logs` table and related APIs for recording and retrieving job logs.
- An HTTP server for serving log endpoints and job results.
- Updates to the CI pipeline to enable persistence by default and ensure PostgreSQL readiness.
- Docker Compose updates with a Postgres service and MinIO integration for object storage.
- Packaging scripts for Arch Linux, including systemd service units for deployment.
2025-11-02 23:37:11 +01:00
Till Wegmueller
b84e97e513
Enhance runner with log streaming details, fallback repository fetch, and improved error handling
This commit improves the runner's functionality by adding:
- Detailed log streaming with request ID, stdout, and stderr line counts.
- Fallback mechanisms for repository fetch using HTTP archive when git commands fail.
- Enhanced error reporting for missing job scripts and reading errors.
- Updates to ensure compatibility with SunOS environments and non-interactive shells.
2025-11-02 20:36:13 +01:00
Till Wegmueller
5cfde45e4c
Update default illumos image to omnios-bloody and enhance image configuration
This commit replaces `openindiana-hipster` with `omnios-bloody` as the default illumos image in the orchestrator. It adds detailed configuration for the new image, including source URL, local path, and resource defaults, while retaining `openindiana-hipster` as a reference. Corresponding test cases and YAML updates are included.
2025-11-02 18:58:46 +01:00
Till Wegmueller
c1380b1095
Add local file:// source support for orchestrator image preparation
This commit enhances image handling in the orchestrator by adding support for `file://` sources. It introduces logic to handle both local file copying and decompression options, complementing the existing `http(s)://` download functionality.
2025-11-02 18:38:56 +01:00
Till Wegmueller
9597bbf64d
Add VM suspend handling, persistence updates, and orchestrator enhancements
This commit introduces:
- VM suspend support for timeout scenarios, allowing investigation of frozen states.
- Enhanced orchestrator persistence initialization with a skip option for faster startup.
- Improvements to orchestrator logging, job state tracking, and VM runtime monitoring.
- Updates to CI tasks for capturing job request IDs and tracking completion statuses.
- Extended hypervisor capabilities, including libvirt console logging configuration.
2025-11-01 18:38:17 +01:00
Till Wegmueller
952262ede4
Upgrade dependencies for Axum, Tonic, Prost, and related build tools across crates
This commit updates multiple dependencies, including:
- `axum` upgraded to 0.8 for HTTP and webhook functionality.
- `tonic` upgraded to 0.14 for gRPC support.
- `prost` upgraded to 0.14 for protobuf processing.
- Addition of `tonic-prost` and `tonic-prost-build` for updated gRPC build configurations.

Relevant Cargo.toml entries and `build.rs` are adjusted to reflect these updates.
2025-11-01 15:24:09 +01:00
Till Wegmueller
033f9b5ab0
Format
2025-11-01 14:56:46 +01:00
Till Wegmueller
374dff5c04
Simplify variable initialization and remove unused imports across multiple crates
2025-11-01 14:44:42 +01:00
Till Wegmueller
1b7b2dd91b
Update parsing logic and upgrade dependencies across crates
This commit updates parsing logic by simplifying `.and_then(|e| e.value().as_string())` calls to `.and_then(|v| v.as_string())`. Additionally, it upgrades several crate dependencies, including `thiserror`, `sea-orm`, `lapin`, `virt`, and `kdl`, to their latest compatible versions for improved functionality and stability.
2025-11-01 14:44:16 +01:00
Till Wegmueller
0b54881558
Add support for multi-OS VM builds with cross-built runners and improved local development tooling
This commit introduces:
- Flexible runner URL configuration via `SOLSTICE_RUNNER_URL(S)` for cloud-init.
- Automated detection of OS-specific runner binaries during VM boot.
- Tasks for cross-building, serving, and orchestrating Solstice runners.
- End-to-end VM build flows for Linux and illumos environments.
- Enhanced orchestration with multi-runner HTTP serving and log streaming.
2025-11-01 14:31:48 +01:00
Till Wegmueller
855aecbb10
Add gRPC support for VM runner log streaming and orchestrator integration
This commit introduces gRPC-based log streaming between the VM runner (`solstice-runner`) and orchestrator. Key updates include:
- Implemented gRPC server in the orchestrator for receiving and processing runner logs.
- Added log streaming and job result reporting in the `solstice-runner` client.
- Defined `runner.proto` with messages (`LogItem`, `JobEnd`) and the `Runner` service.
- Updated orchestrator to accept gRPC settings and start the server.
- Modified cloud-init user data to include gRPC endpoint and request ID for runners.
- Enhanced message queue logic to handle job results via `publish_job_result`.
- Configured `Cross.toml` for cross-compilation of the runner.
2025-11-01 12:14:50 +01:00
Till Wegmueller
e73b6ff49f
Refactor Solstice bootstrapping logic into standalone script
This commit replaces inline workflow preparation logic with a dedicated `solstice-bootstrap.sh` script, simplifying workspace setup, job execution, and shutdown processes. The change ensures cleaner orchestration and improves maintainability by centralizing the bootstrapping logic.
2025-10-26 22:09:37 +01:00
Till Wegmueller
4ca78144f2
Add VM state monitoring and graceful shutdown enhancements
This commit enhances the `Scheduler` to monitor VM states for completion, enabling more accurate termination detection. It introduces periodic polling combined with shutdown signals to halt operations gracefully. Additionally, VM lifecycle management in the hypervisor is updated with `state` retrieval for precise status assessments. The VM domain configuration now includes serial console support.
2025-10-26 21:59:55 +01:00
Till Wegmueller
bddd36b16f
Add cooperative shutdown support for Scheduler and AMQP consumer
This commit updates the `Scheduler` to support cooperative shutdown using `Notify`, allowing graceful termination of tasks and cleanup of placeholder VMs. Additionally, the AMQP consumer is enhanced with an explicit shutdown mechanism, ensuring proper resource cleanup, including closing channels and connections.
2025-10-26 21:13:56 +01:00
Till Wegmueller
6ff88529e6
Add configurable placeholder VM runtime and graceful shutdown logic
This commit introduces the ability to configure the placeholder VM runtime via an environment variable (`VM_PLACEHOLDER_RUN_SECS`) and updates the `Scheduler` to accept this duration. Additionally, it implements a graceful shutdown mechanism for the orchestrator, allowing cooperative shutdown of consumers and cleanup of resources.
2025-10-26 19:06:32 +01:00
Till Wegmueller
7918db3468
Enhance hypervisor image handling with dynamic format detection and raw conversion
This commit improves the hypervisor by:
- Adding support for detecting base image formats using `qemu-img info`.
- Dynamically setting the base image format for overlay creation.
- Automatically converting non-raw images to raw format for bhyve compatibility.
- Updating `Cargo.toml` to include `serde_json` for JSON parsing.
- Modifying default working directory logic for `ZonesHypervisor`.
2025-10-26 18:17:02 +01:00
Till Wegmueller
d05121b378
Switch orchestrator from libvirt crate to virt crate for Linux hypervisor backend
This commit replaces the `libvirt` crate with the `virt` crate for managing the libvirt backend on Linux. Key changes include:

- Updated `Cargo.toml` dependencies and feature configuration.
- Refactored hypervisor implementation to align with `virt` crate API.
- Improved error handling and lifecycle management for VMs and networks.
2025-10-26 16:08:36 +01:00