wayray/docs/ai/specs/2026-04-07-phase1-remote-display-design.md
2026-04-07 15:09:03 +02:00

Phase 1: Remote Display Pipeline — Design Spec

Goal

Build a working remote display pipeline: wrsrvd runs headless on a Linux server, composites Wayland clients with PixmanRenderer, encodes frame diffs with zstd, and sends them over QUIC to wrclient running on macOS, which displays them in a native Cocoa window. Input flows back from client to server.

This combines the roadmap's Phase 1 (protocol + transport + encoding) and Phase 2 (client viewer) into a single deliverable, producing an end-to-end working system.

Context

Phase 0 built a working Smithay compositor with a Winit backend, but testing it visually requires a display server, and the display server is broken under UTM's virtualized GPU. The solution: make the server headless (no display server needed) and move the visual output to the client over the network. This is the actual WayRay production architecture.

Architecture

wrsrvd (Linux, headless)              wrclient (macOS)
─────────────────────                 ──────────────────
Wayland clients render
  → PixmanRenderer composites
  → OutputDamageTracker finds dirty rects
  → XOR diff against previous frame
  → zstd compress per region
  → QUIC Stream 1 (display)    →→→    Receive FrameUpdate
                                      → zstd decompress
                                      → XOR apply to local framebuffer
                                      → Upload to GPU texture (wgpu)
                                      → Display in winit/Cocoa window

  ← QUIC Stream 2 (input)     ←←←    Capture keyboard/mouse (winit)
  → Inject into Smithay seat          → Serialize as protocol messages

Deliverables

1. Wire Protocol (wayray-protocol crate)

Message types are serialized with postcard (compact binary, serde-based, no_std compatible).

Control messages (QUIC Stream 0 — bidirectional):

| Message | Fields | Direction |
|---|---|---|
| ClientHello | version: u32, capabilities: Vec<String> | client→server |
| ServerHello | version: u32, session_id: u64, output_width: u32, output_height: u32 | server→client |
| Ping | timestamp: u64 | either |
| Pong | timestamp: u64 | either |

Display messages (QUIC Stream 1 — server→client, unidirectional):

| Message | Fields |
|---|---|
| FrameUpdate | sequence: u64, regions: Vec<DamageRegion> |
| DamageRegion | x: u32, y: u32, width: u32, height: u32, data: Vec<u8> (zstd-compressed XOR diff) |

Input messages (QUIC Stream 2 — client→server, unidirectional):

| Message | Fields |
|---|---|
| KeyboardEvent | keycode: u32, state: KeyState, time: u32 |
| PointerMotion | x: f64, y: f64, time: u32 |
| PointerButton | button: u32, state: ButtonState, time: u32 |
| PointerAxis | axis: Axis, value: f64, time: u32 |
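
The message tables above could map onto Rust types roughly as follows. This is a sketch, not the crate's actual definitions: the variants of KeyState, ButtonState and Axis are assumptions (the spec names the types but not their values), FrameAck is taken from the flow-control note, and the real crate would also derive serde's Serialize and Deserialize so postcard can encode the types.

```rust
// Hypothetical shapes for the Phase 1 messages. Only std derives are shown;
// the real crate would add serde derives for postcard.

/// Control channel (bidirectional).
#[derive(Debug, Clone, PartialEq)]
pub enum ControlMessage {
    ClientHello { version: u32, capabilities: Vec<String> },
    ServerHello { version: u32, session_id: u64, output_width: u32, output_height: u32 },
    Ping { timestamp: u64 },
    Pong { timestamp: u64 },
    /// Sent by the client after it has processed a frame (flow control).
    FrameAck { sequence: u64 },
}

/// One damaged rectangle; `data` holds the zstd-compressed XOR diff.
#[derive(Debug, Clone, PartialEq)]
pub struct DamageRegion {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
    pub data: Vec<u8>,
}

/// Display channel (server→client).
#[derive(Debug, Clone, PartialEq)]
pub struct FrameUpdate {
    pub sequence: u64,
    pub regions: Vec<DamageRegion>,
}

// The variants below are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyState { Pressed, Released }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ButtonState { Pressed, Released }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Axis { Vertical, Horizontal }

/// Input channel (client→server).
#[derive(Debug, Clone, PartialEq)]
pub enum InputMessage {
    KeyboardEvent { keycode: u32, state: KeyState, time: u32 },
    PointerMotion { x: f64, y: f64, time: u32 },
    PointerButton { button: u32, state: ButtonState, time: u32 },
    PointerAxis { axis: Axis, value: f64, time: u32 },
}
```

Using one enum per channel means each stream decodes exactly one top-level type, so a message cannot be mistaken for one belonging to another channel.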

Flow control: The client sends FrameAck { sequence: u64 } on Stream 0 after processing each frame. The server caps the number of unacknowledged frames in flight, so a slow client or link cannot build up a backlog of stale frames (bufferbloat).
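
A minimal sketch of the server-side bookkeeping this implies. The FrameWindow type, its method names and the max_in_flight limit are all hypothetical; the spec does not fix a window size or say what the server does while the window is full.

```rust
use std::collections::BTreeSet;

/// Tracks frames that were sent but not yet acknowledged.
/// `max_in_flight` is an assumed tuning knob, not a value from the spec.
pub struct FrameWindow {
    max_in_flight: usize,
    next_sequence: u64,
    unacked: BTreeSet<u64>,
}

impl FrameWindow {
    pub fn new(max_in_flight: usize) -> Self {
        Self { max_in_flight, next_sequence: 0, unacked: BTreeSet::new() }
    }

    /// Reserves the sequence number for the next FrameUpdate, or returns
    /// None when the window is full and the frame should be held back.
    pub fn try_send(&mut self) -> Option<u64> {
        if self.unacked.len() >= self.max_in_flight {
            return None;
        }
        let sequence = self.next_sequence;
        self.next_sequence += 1;
        self.unacked.insert(sequence);
        Some(sequence)
    }

    /// Handles a FrameAck. Returns false for unknown or duplicate acks.
    pub fn on_ack(&mut self, sequence: u64) -> bool {
        self.unacked.remove(&sequence)
    }

    /// Number of frames currently awaiting acknowledgement.
    pub fn in_flight(&self) -> usize {
        self.unacked.len()
    }
}
```

When try_send returns None, one option is for the server to skip encoding and let damage accumulate until an ack arrives, so the next frame it sends reflects the latest state.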

Serialization: Each message is length-prefixed (4-byte little-endian length, then postcard-encoded payload) for framing on QUIC streams.
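
The framing rule can be sketched with the standard library alone. The function names here are hypothetical, and the payload is treated as opaque bytes that postcard has already produced.

```rust
/// Prepends the 4-byte little-endian length to an already-encoded payload.
/// Returns None if the payload does not fit in a u32 length field.
pub fn encode_frame(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u32::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Tries to split one complete frame off the front of `buf`.
/// Returns the payload and the number of bytes consumed, or None when
/// more bytes must be read from the stream first.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    let payload = buf.get(4..end)?;
    Some((payload, end))
}
```

A receiver would append incoming bytes to a buffer, call decode_frame in a loop, and drop the consumed bytes after each successful decode. A real implementation would probably also enforce a maximum frame length so a corrupt length field cannot force a huge allocation.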

2. QUIC Transport

Server (wrsrvd): quinn-based QUIC listener. Accepts one client at a time (multi-client comes with session management in Phase 3).

Client (wrclient): quinn-based QUIC connection to the server.

TLS: Self-signed certificates generated at first run, cached to ~/.config/wayray/. Client uses danger_accept_any_cert for now — trust-on-first-use and proper PKI deferred to Phase 3.

Stream model (logical channels, not literal QUIC stream IDs):

  • Control channel: Bidirectional stream — hello, ping/pong, frame ack
  • Display channel: Unidirectional server→client stream — frame updates
  • Input channel: Unidirectional client→server stream — input events

Each channel opens its own QUIC stream at connection time. QUIC assigns the actual stream IDs based on directionality and open order.

3. Headless Backend (wrsrvd)

Replace the Winit backend as the default with a headless backend using PixmanRenderer.

PixmanRenderer path:

  • Create PixmanRenderer (no GPU, no EGL, no display server)
  • Allocate an in-memory framebuffer as the render target
  • Render with OutputDamageTracker for damage tracking
  • Read pixels directly from RAM (no ExportMem needed)
  • Drive the render loop on a calloop timer (render on damage, capped at 60 fps)
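
The "render on damage, capped at 60 fps" policy from the last bullet can be expressed as a small decision helper. FramePacer is a hypothetical name, and the sketch leaves out the calloop timer wiring. It covers only the decision the timer callback would make.

```rust
use std::time::{Duration, Instant};

/// Decides whether a frame should be rendered at a given instant.
pub struct FramePacer {
    min_interval: Duration,
    last_render: Option<Instant>,
}

impl FramePacer {
    /// `max_fps` is clamped to at least 1 to avoid dividing by zero.
    pub fn new(max_fps: u32) -> Self {
        Self {
            min_interval: Duration::from_secs(1) / max_fps.max(1),
            last_render: None,
        }
    }

    /// Returns true when there is damage and enough time has passed since
    /// the previous render; records `now` as the last render time if so.
    pub fn should_render(&mut self, now: Instant, has_damage: bool) -> bool {
        if !has_damage {
            return false;
        }
        if let Some(last) = self.last_render {
            if now.saturating_duration_since(last) < self.min_interval {
                return false;
            }
        }
        self.last_render = Some(now);
        true
    }
}
```

Passing the current time in as an argument keeps the policy testable without sleeping.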

Backend selection via CLI:

  • wrsrvd — headless (default)
  • wrsrvd --backend winit — Winit window (dev/debug, requires display)

Winit backend stays as-is in a separate module, feature-gated or behind the CLI flag.
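
Backend selection could be parsed along these lines. This sketch uses only the standard library. The Backend enum and parse_backend are hypothetical names, and accepting an explicit "headless" value is an assumption beyond the two invocations listed above.

```rust
/// Which rendering backend wrsrvd should start with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Headless,
    Winit,
}

/// Scans command-line arguments (without the program name) for
/// `--backend <name>`; defaults to headless when the flag is absent.
pub fn parse_backend<I, S>(args: I) -> Result<Backend, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut backend = Backend::Headless;
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg.as_ref() == "--backend" {
            let value = iter
                .next()
                .ok_or_else(|| "--backend requires a value".to_string())?;
            backend = match value.as_ref() {
                "headless" => Backend::Headless, // assumed explicit spelling
                "winit" => Backend::Winit,
                other => return Err(format!("unknown backend: {}", other)),
            };
        }
    }
    Ok(backend)
}
```

In main.rs this might be called as parse_backend(std::env::args().skip(1)), with the result used to pick between the headless and winit modules.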

4. Frame Encoding (Tier 1)

Simplest viable encoding — optimize later.

Encoding (server):

  1. After render, get the new framebuffer (ARGB8888 pixels in RAM)
  2. XOR against the previous frame to produce a diff
  3. For each damage rectangle from OutputDamageTracker:
    • Extract the XOR diff region
    • Compress with zstd (level 1 — fast)
    • Package as a DamageRegion in the FrameUpdate message
  4. Store current frame as the new "previous frame"
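
The XOR and region-extraction part of the encoding steps can be sketched as below. The Rect type and function name are hypothetical, the framebuffer is assumed to be tightly packed ARGB8888 with a known byte stride, and the zstd step is left out so the example needs only the standard library. In the real encoder the returned bytes would then be compressed at level 1.

```rust
/// Bytes per ARGB8888 pixel.
pub const BYTES_PER_PIXEL: usize = 4;

/// A damage rectangle in pixel coordinates (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub x: usize,
    pub y: usize,
    pub width: usize,
    pub height: usize,
}

/// XORs `current` against `previous` inside `rect` and returns the diff
/// rows packed back to back. `stride` is the number of bytes per
/// framebuffer row. Returns None if the rectangle does not fit.
pub fn xor_diff_region(
    previous: &[u8],
    current: &[u8],
    stride: usize,
    rect: Rect,
) -> Option<Vec<u8>> {
    if previous.len() != current.len() {
        return None;
    }
    let row_bytes = rect.width.checked_mul(BYTES_PER_PIXEL)?;
    let x_off = rect.x.checked_mul(BYTES_PER_PIXEL)?;
    if x_off.checked_add(row_bytes)? > stride {
        return None;
    }
    let y_end = rect.y.checked_add(rect.height)?;
    let mut diff = Vec::new();
    for row in rect.y..y_end {
        let start = row.checked_mul(stride)?.checked_add(x_off)?;
        let end = start.checked_add(row_bytes)?;
        let prev_row = previous.get(start..end)?;
        let cur_row = current.get(start..end)?;
        // Unchanged bytes XOR to zero, which is what makes the diff compressible.
        diff.extend(prev_row.iter().zip(cur_row).map(|(p, c)| p ^ c));
    }
    Some(diff)
}
```

This variant XORs only the bytes inside each damage rectangle instead of diffing the whole frame first. The output for each region is the same, and untouched areas are never read.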

Decoding (client):

  1. Receive FrameUpdate
  2. For each DamageRegion:
    • Decompress with zstd
    • XOR-apply onto the local framebuffer copy at the given position
  3. Upload the updated framebuffer to a GPU texture
  4. Display
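
On the client, the XOR-apply step might look like this. The function name and signature are hypothetical, `diff` is assumed to be the already-decompressed region data, and the local framebuffer is assumed to use the same ARGB8888 layout and stride as the server's.

```rust
/// Bytes per ARGB8888 pixel.
pub const BYTES_PER_PIXEL: usize = 4;

/// XORs a decompressed diff onto the local framebuffer at (x, y).
/// Every bound is validated before any byte is modified, so a malformed
/// region leaves the framebuffer untouched. Returns None on rejection.
pub fn xor_apply_region(
    framebuffer: &mut [u8],
    stride: usize,
    x: usize,
    y: usize,
    width: usize,
    height: usize,
    diff: &[u8],
) -> Option<()> {
    let row_bytes = width.checked_mul(BYTES_PER_PIXEL)?;
    let x_off = x.checked_mul(BYTES_PER_PIXEL)?;
    if x_off.checked_add(row_bytes)? > stride {
        return None;
    }
    if diff.len() != row_bytes.checked_mul(height)? {
        return None;
    }
    if row_bytes == 0 || height == 0 {
        return Some(()); // empty region, nothing to apply
    }
    // Check that the last row ends inside the framebuffer.
    let last_row = y.checked_add(height)? - 1;
    let last_end = last_row
        .checked_mul(stride)?
        .checked_add(x_off)?
        .checked_add(row_bytes)?;
    if last_end > framebuffer.len() {
        return None;
    }
    for (i, diff_row) in diff.chunks_exact(row_bytes).enumerate() {
        let start = (y + i) * stride + x_off;
        for (dst, d) in framebuffer[start..start + row_bytes].iter_mut().zip(diff_row) {
            *dst ^= *d;
        }
    }
    Some(())
}
```

Because XOR is its own inverse, applying the same diff twice restores the original pixels. The client's framebuffer must therefore stay in exact lockstep with the server's previous frame: a dropped or repeated FrameUpdate would corrupt every later frame until a full refresh.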

Why XOR + zstd: XOR diff makes unchanged pixels zero. zstd compresses runs of zeros extremely well. For typical desktop content (mostly static with small changes), this gives 10-100x compression with minimal CPU.

5. Client Viewer (wrclient)

Native application using winit + wgpu.

Window: winit creates a platform-native window (Cocoa on macOS). Sized to match the server's output dimensions from ServerHello.

Rendering: wgpu renders a fullscreen textured quad. Each frame, the updated pixel buffer is uploaded to a GPU texture and drawn. Double-buffered with vsync.

Input capture: winit captures keyboard and mouse events from the native window. Events are serialized as protocol messages and sent over QUIC Stream 2.

CLI:

wrclient <host>:<port>

Dependencies

New workspace dependencies (Cargo.toml):

| Crate | Purpose |
|---|---|
| quinn | QUIC transport |
| rustls | TLS for QUIC |
| rcgen | Self-signed certificate generation |
| postcard + serde | Wire protocol serialization |
| zstd | Frame compression |
| winit | Window creation (wrclient) |
| wgpu | GPU rendering (wrclient) |

Smithay feature changes (wrsrvd):

  • Add: renderer_pixman (headless rendering)
  • Keep: wayland_frontend, desktop, renderer_gl, backend_winit (behind feature flag for dev mode)

Module Structure

crates/
├── wayray-protocol/src/
│   ├── lib.rs          # Re-exports
│   ├── messages.rs     # All message types + serde derives
│   ├── codec.rs        # Length-prefixed framing (encode/decode)
│   └── version.rs      # Protocol version constant
│
├── wrsrvd/src/
│   ├── main.rs         # CLI arg parsing, backend selection
│   ├── state.rs        # WayRay compositor state (unchanged)
│   ├── handlers/       # Wayland protocol handlers (unchanged)
│   ├── backend/
│   │   ├── mod.rs      # Backend trait/enum
│   │   ├── headless.rs # PixmanRenderer + in-memory framebuffer
│   │   └── winit.rs    # Existing Winit backend (moved from main.rs)
│   ├── render.rs       # Render logic (adapted for both backends)
│   ├── encoder.rs      # XOR diff + zstd compression
│   └── network/
│       ├── mod.rs      # Re-exports
│       └── server.rs   # QUIC server, frame sending, input receiving
│
├── wrclient/src/
│   ├── main.rs         # CLI args, connect, main loop
│   ├── decoder.rs      # zstd decompress + XOR apply
│   ├── display.rs      # winit window + wgpu rendering
│   ├── input.rs        # Keyboard/mouse capture + serialization
│   └── network/
│       ├── mod.rs      # Re-exports
│       └── client.rs   # QUIC client, frame receiving, input sending
│
└── wradm/src/
    └── main.rs         # Unchanged (placeholder)

Success Criteria

  1. wrsrvd starts headless on the Linux host, no display server needed
  2. wrclient connects from macOS over the local network
  3. A Wayland client (foot, weston-terminal) can be launched against wrsrvd
  4. The client window appears on the Mac in the wrclient window
  5. Typing and clicking in the wrclient window reach the Wayland client
  6. Frame updates are visible in real time

Design Notes

Cursor: Rendered server-side into the framebuffer for Phase 1 (simplest). No separate cursor channel or client-side cursor.

Input injection: Network input events are injected into the Smithay seat by calling the seat's keyboard/pointer methods directly (same pattern as the existing Winit input handler in state.rs). No custom InputBackend needed.

Protocol evolution: The message types defined here are Phase 1 subsets. They will evolve toward the full wire protocol spec (docs/protocols/wayray-wire-protocol.md) in later phases.

PixmanRenderer pixel access: Verify early that PixmanRenderer's framebuffer can be read directly from RAM. If Smithay requires ExportMem even for Pixman, adapt accordingly — Pixman's ExportMem is a simple memcpy (no GPU readback).

Out of Scope

  • Audio forwarding (Phase 4)
  • USB forwarding (Phase 4)
  • Session management / hot-desking (Phase 3)
  • Content-adaptive encoding (Tier 2/3 — future optimization)
  • Hardware video encoding (H.264/AV1 — future optimization)
  • Multi-client support (Phase 3)
  • Proper TLS certificate validation (Phase 3)
  • Window management protocol (Phase 2.5)