# Phase 1: Remote Display Pipeline (Design Spec)

## Goal

Build a working remote display pipeline: `wrsrvd` runs headless on a
Linux server, composites Wayland clients with PixmanRenderer, encodes
frame diffs with zstd, and sends them over QUIC to `wrclient` running
on macOS, which displays them in a native Cocoa window. Input flows
back from client to server.

This combines the roadmap's Phase 1 (protocol + transport + encoding)
and Phase 2 (client viewer) into a single deliverable, producing an
end-to-end working system.

## Context

Phase 0 built a working Smithay compositor with a Winit backend, but
visual testing with that backend requires a display server, which is
broken under UTM's virtualized GPU. The solution is to make the server
headless (no display server needed) and move the visual output to the
client over the network. This is the actual WayRay production
architecture.

## Architecture

```
wrsrvd (Linux, headless)                      wrclient (macOS)
────────────────────────                      ────────────────
Wayland clients render
  → PixmanRenderer composites
  → OutputDamageTracker finds dirty rects
  → XOR diff against previous frame
  → zstd compress per region
  → QUIC Stream 1 (display)   →→→             Receive FrameUpdate
                                                → zstd decompress
                                                → XOR apply to local framebuffer
                                                → Upload to GPU texture (wgpu)
                                                → Display in winit/Cocoa window

  ← QUIC Stream 2 (input)     ←←←             Capture keyboard/mouse (winit)
  → Inject into Smithay seat                    → Serialize as protocol messages
```

## Deliverables

### 1. Wire Protocol (`wayray-protocol` crate)

Message types are serialized with `postcard` (compact binary,
serde-based, `no_std`-compatible).

**Control messages (QUIC Stream 0, bidirectional):**

| Message | Fields | Direction |
|---------|--------|-----------|
| `ClientHello` | `version: u32`, `capabilities: Vec<String>` | client→server |
| `ServerHello` | `version: u32`, `session_id: u64`, `output_width: u32`, `output_height: u32` | server→client |
| `Ping` | `timestamp: u64` | either |
| `Pong` | `timestamp: u64` | either |
| `FrameAck` | `sequence: u64` | client→server |

**Display messages (QUIC Stream 1, server→client, unidirectional):**

| Message | Fields |
|---------|--------|
| `FrameUpdate` | `sequence: u64`, `regions: Vec<DamageRegion>` |
| `DamageRegion` (embedded in `FrameUpdate`) | `x: u32`, `y: u32`, `width: u32`, `height: u32`, `data: Vec<u8>` (zstd-compressed XOR diff) |

**Input messages (QUIC Stream 2, client→server, unidirectional):**

| Message | Fields |
|---------|--------|
| `KeyboardEvent` | `keycode: u32`, `state: KeyState`, `time: u32` |
| `PointerMotion` | `x: f64`, `y: f64`, `time: u32` |
| `PointerButton` | `button: u32`, `state: ButtonState`, `time: u32` |
| `PointerAxis` | `axis: Axis`, `value: f64`, `time: u32` |
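
As a rough illustration, the three tables map onto one Rust type per channel. This std-only sketch omits the `serde` derives that `messages.rs` would need for postcard, folds in the `FrameAck` message from the flow-control rule, and invents the variants of `KeyState`, `ButtonState`, and `Axis`, which the spec names but does not define.

```rust
// Sketch of the Phase 1 message types. In wayray-protocol these would also
// derive serde::{Serialize, Deserialize} for postcard; omitted here so the
// example has no dependencies.

/// Hypothetical variants: the spec names KeyState but does not define it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyState {
    Pressed,
    Released,
}

/// Hypothetical variants, mirroring KeyState.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ButtonState {
    Pressed,
    Released,
}

/// Hypothetical variants: vertical and horizontal scroll axes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Axis {
    Vertical,
    Horizontal,
}

/// Control channel (Stream 0).
#[derive(Debug, Clone, PartialEq)]
pub enum ControlMessage {
    ClientHello { version: u32, capabilities: Vec<String> },
    ServerHello { version: u32, session_id: u64, output_width: u32, output_height: u32 },
    Ping { timestamp: u64 },
    Pong { timestamp: u64 },
    FrameAck { sequence: u64 },
}

/// One damaged rectangle; `data` holds the zstd-compressed XOR diff.
#[derive(Debug, Clone, PartialEq)]
pub struct DamageRegion {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
    pub data: Vec<u8>,
}

/// Display channel (Stream 1), server to client.
#[derive(Debug, Clone, PartialEq)]
pub struct FrameUpdate {
    pub sequence: u64,
    pub regions: Vec<DamageRegion>,
}

/// Input channel (Stream 2), client to server.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InputMessage {
    KeyboardEvent { keycode: u32, state: KeyState, time: u32 },
    PointerMotion { x: f64, y: f64, time: u32 },
    PointerButton { button: u32, state: ButtonState, time: u32 },
    PointerAxis { axis: Axis, value: f64, time: u32 },
}
```

Keeping one type per channel means each stream's reader only has to decode the messages that can legitimately arrive on it.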

**Flow control:** The client sends `FrameAck { sequence: u64 }` on
Stream 0 after processing each frame. The server limits the number of
in-flight (sent but not yet acknowledged) frames to prevent
bufferbloat.
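
A minimal sketch of the in-flight limit, assuming acks are cumulative (an ack for sequence N releases every frame up to N) and that the window size is a tunable; the spec fixes neither. `FrameWindow` is a hypothetical name.

```rust
/// Hypothetical helper: tracks sent-but-unacknowledged frames on the server.
pub struct FrameWindow {
    max_in_flight: u64,
    next_sequence: u64,      // sequence number the next FrameUpdate will carry
    last_acked: Option<u64>, // highest sequence the client has acknowledged
}

impl FrameWindow {
    pub fn new(max_in_flight: u64) -> Self {
        assert!(max_in_flight > 0, "window must allow at least one frame");
        Self { max_in_flight, next_sequence: 0, last_acked: None }
    }

    /// Number of frames sent but not yet acknowledged.
    pub fn in_flight(&self) -> u64 {
        match self.last_acked {
            Some(acked) => self.next_sequence - (acked + 1),
            None => self.next_sequence,
        }
    }

    /// Reserve a sequence number for a new frame, or return None if the
    /// window is full.
    pub fn try_send(&mut self) -> Option<u64> {
        if self.in_flight() >= self.max_in_flight {
            return None;
        }
        let seq = self.next_sequence;
        self.next_sequence += 1;
        Some(seq)
    }

    /// Handle FrameAck { sequence }. Acks are treated as cumulative; stale or
    /// duplicate acks, and acks for frames never sent, are ignored.
    pub fn on_ack(&mut self, sequence: u64) {
        if sequence >= self.next_sequence {
            return;
        }
        if self.last_acked.map_or(true, |acked| sequence > acked) {
            self.last_acked = Some(sequence);
        }
    }
}
```

When `try_send` returns `None`, the server can keep accumulating damage and send one combined update once an ack frees the window, rather than queueing frames that would be stale by the time they arrive.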

**Serialization:** Each message is length-prefixed (a 4-byte
little-endian length, then the postcard-encoded payload) for framing on
QUIC streams.
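
The framing rule can be sketched with the standard library alone. Here the payload is treated as opaque bytes (in `codec.rs` it would be the postcard-encoded message), and `MAX_FRAME_LEN` is a hypothetical safeguard not in the spec, included so a corrupt length prefix cannot trigger a huge allocation.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on one message, to reject corrupt lengths.
pub const MAX_FRAME_LEN: usize = 64 * 1024 * 1024;

/// Write one message: 4-byte little-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .ok()
        .filter(|&n| n as usize <= MAX_FRAME_LEN)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)
}

/// Read one message. Returns Ok(None) on a clean end of stream before any
/// length bytes, and an error if the stream ends mid-frame.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    let mut filled = 0;
    while filled < len_buf.len() {
        match r.read(&mut len_buf[filled..]) {
            Ok(0) if filled == 0 => return Ok(None), // clean end of stream
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {} // retry
            Err(e) => return Err(e),
        }
    }
    let len = u32::from_le_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

The explicit prefix is needed because a QUIC stream is an ordered byte stream with no message boundaries of its own.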

### 2. QUIC Transport

**Server (wrsrvd):** A quinn-based QUIC listener. It accepts one client
at a time (multi-client support comes with session management in
Phase 3).

**Client (wrclient):** A quinn-based QUIC connection to the server.

**TLS:** Self-signed certificates are generated on first run and cached
in `~/.config/wayray/`. The client uses `danger_accept_any_cert` for
now; trust-on-first-use and proper PKI are deferred to Phase 3.
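
The generate-once, reuse-afterwards behavior can be sketched independently of rcgen by passing certificate generation in as a closure. The file names `cert.der` and `key.der` and the helper names are hypothetical, and the directory is a parameter so the sketch does not hard-code how `~/.config/wayray/` is resolved.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// DER-encoded certificate and private key.
pub struct CertPair {
    pub cert_der: Vec<u8>,
    pub key_der: Vec<u8>,
}

/// Load a cached certificate from `dir`, or call `generate` (for example a
/// wrapper around rcgen) on first run and cache its output.
pub fn load_or_create_cert<F>(dir: &Path, generate: F) -> io::Result<CertPair>
where
    F: FnOnce() -> io::Result<CertPair>,
{
    let cert_path = dir.join("cert.der"); // hypothetical file name
    let key_path = dir.join("key.der"); // hypothetical file name
    if cert_path.exists() && key_path.exists() {
        return Ok(CertPair { cert_der: fs::read(&cert_path)?, key_der: fs::read(&key_path)? });
    }
    let pair = generate()?;
    fs::create_dir_all(dir)?;
    fs::write(&cert_path, &pair.cert_der)?;
    // A real implementation should also restrict the key file's permissions.
    fs::write(&key_path, &pair.key_der)?;
    Ok(pair)
}
```

Caching the server certificate keeps its identity stable across restarts, which is what the deferred trust-on-first-use scheme will rely on.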

**Stream model (logical channels, not literal QUIC stream IDs):**

- Control channel: bidirectional stream (hello, ping/pong, frame ack)
- Display channel: unidirectional server→client stream (frame updates)
- Input channel: unidirectional client→server stream (input events)

Each channel opens its own QUIC stream at connection time, and QUIC
assigns the actual stream IDs based on directionality and open order.
The labels "Stream 0", "Stream 1", and "Stream 2" used elsewhere in this
spec refer to the control, display, and input channels respectively.
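
To make the ID assignment concrete: RFC 9000 (Section 2.1) encodes the initiator in bit 0 of a stream ID (0 = client, 1 = server) and the directionality in bit 1 (0 = bidirectional, 1 = unidirectional), with the remaining bits counting streams of that type in open order. Assuming the client opens the control stream and each side opens its one unidirectional stream first, the sketch below shows which IDs the three channels would actually get.

```rust
/// Who opened a QUIC stream.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Initiator {
    Client,
    Server,
}

/// Stream directionality.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Bidirectional,
    Unidirectional,
}

/// ID of the `index`-th stream (0-based) of a given type, per RFC 9000
/// Section 2.1: bit 0 = initiator, bit 1 = directionality.
pub fn stream_id(initiator: Initiator, dir: Direction, index: u64) -> u64 {
    let initiator_bit = match initiator {
        Initiator::Client => 0b00,
        Initiator::Server => 0b01,
    };
    let dir_bit = match dir {
        Direction::Bidirectional => 0b00,
        Direction::Unidirectional => 0b10,
    };
    (index << 2) | dir_bit | initiator_bit
}
```

Under those assumptions the control, display, and input channels land on IDs 0, 3, and 2, which is why the "Stream 1" label for the display channel should not be read as a literal stream ID.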

### 3. Headless Backend (wrsrvd)

Replace the Winit backend as the default with a headless backend built
on PixmanRenderer.

**PixmanRenderer path:**

- Create a `PixmanRenderer` (no GPU, no EGL, no display server)
- Allocate an in-memory framebuffer as the render target
- Render with `OutputDamageTracker` for damage tracking
- Read pixels directly from RAM (no `ExportMem` needed; see Design Notes)
- Drive the render loop on a calloop timer (render on damage, capped at 60 fps)
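
The render-on-damage policy with a 60 fps cap reduces to a small decision made on each timer tick. This sketch is std-only and independent of calloop's API; `FramePacer` and its methods are hypothetical names, and the minimum interval is simply derived from the frame-rate cap.

```rust
use std::time::{Duration, Instant};

/// Hypothetical frame pacer: render only when something is damaged, and no
/// more often than once per `min_interval`.
pub struct FramePacer {
    min_interval: Duration,
    last_render: Option<Instant>,
    damaged: bool,
}

impl FramePacer {
    pub fn new(max_fps: u32) -> Self {
        assert!(max_fps > 0, "frame-rate cap must be positive");
        Self {
            min_interval: Duration::from_secs(1) / max_fps,
            last_render: None,
            damaged: false,
        }
    }

    /// Call when a client commits a surface or otherwise damages the output.
    pub fn mark_damaged(&mut self) {
        self.damaged = true;
    }

    /// Call on each timer tick. Returns true if a frame should be rendered
    /// now, and records the render time if so.
    pub fn should_render(&mut self, now: Instant) -> bool {
        if !self.damaged {
            return false;
        }
        if let Some(last) = self.last_render {
            if now.duration_since(last) < self.min_interval {
                return false; // damaged, but too soon: keep the damage for a later tick
            }
        }
        self.damaged = false;
        self.last_render = Some(now);
        true
    }
}
```

Because damage is only cleared when a frame is actually rendered, bursts of commits inside one interval collapse into a single frame.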

**Backend selection via CLI:**

- `wrsrvd`: headless (default)
- `wrsrvd --backend winit`: Winit window (dev/debug; requires a display server)

**The Winit backend stays** as-is in a separate module, feature-gated or
behind the CLI flag.
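
The backend selection can be sketched with the standard library alone; a real `main.rs` might use an argument-parsing crate instead, which the spec does not specify. The `Backend` enum mirrors the two modes listed above, while accepting an explicit `headless` value and rejecting unknown arguments are assumptions.

```rust
/// The two backends from the spec: headless (default) and Winit (dev/debug).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Headless,
    Winit,
}

/// Parse `wrsrvd [--backend <headless|winit>]` from the arguments after the
/// program name, e.g. `parse_backend(std::env::args().skip(1))`.
pub fn parse_backend<I>(args: I) -> Result<Backend, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let mut backend = Backend::Headless; // default when no flag is given
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--backend" => {
                let value = args.next().ok_or("--backend requires a value")?;
                backend = match value.as_str() {
                    "headless" => Backend::Headless,
                    "winit" => Backend::Winit,
                    other => return Err(format!("unknown backend: {other}")),
                };
            }
            other => return Err(format!("unexpected argument: {other}")),
        }
    }
    Ok(backend)
}
```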

### 4. Frame Encoding (Tier 1)

The simplest viable encoding; optimize later.

**Encoding (server):**

1. After rendering, get the new framebuffer (ARGB8888 pixels in RAM).
2. XOR it against the previous frame to produce a diff.
3. For each damage rectangle from `OutputDamageTracker`:
   - Extract the XOR diff region
   - Compress it with zstd (level 1, for speed)
   - Package it as a `DamageRegion` in the `FrameUpdate` message
4. Store the current frame as the new "previous frame".
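
Steps 2 to 4 can be sketched as a per-rectangle XOR over a tightly packed, 4-byte-per-pixel buffer. The sketch assumes no row padding (stride = width × 4) and computes the XOR only inside each damage rectangle, which gives the same region data as XORing the whole frame because undamaged pixels XOR to zero. zstd is left out to stay dependency-free; each returned buffer would be compressed at level 1 before being packaged. `Rect`, `xor_region`, and `commit_frame` are hypothetical names.

```rust
/// A damage rectangle in output pixel coordinates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

const BPP: usize = 4; // ARGB8888: 4 bytes per pixel

/// XOR the pixels of `rect` in `current` against `previous`, returning the
/// diff rows packed contiguously (width * 4 bytes per row). Both frames are
/// assumed tightly packed with `frame_width` pixels per row.
pub fn xor_region(current: &[u8], previous: &[u8], frame_width: u32, rect: Rect) -> Vec<u8> {
    assert_eq!(current.len(), previous.len(), "frames must be the same size");
    assert!(rect.x + rect.width <= frame_width, "rect exceeds frame width");
    let stride = frame_width as usize * BPP;
    let row_bytes = rect.width as usize * BPP;
    let mut diff = Vec::with_capacity(row_bytes * rect.height as usize);
    for row in rect.y as usize..(rect.y + rect.height) as usize {
        let start = row * stride + rect.x as usize * BPP;
        let cur = &current[start..start + row_bytes];
        let prev = &previous[start..start + row_bytes];
        diff.extend(cur.iter().zip(prev).map(|(c, p)| c ^ p));
    }
    diff
}

/// Step 4: the current frame becomes the reference for the next diff.
pub fn commit_frame(previous: &mut Vec<u8>, current: &[u8]) {
    previous.clear();
    previous.extend_from_slice(current);
}
```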

**Decoding (client):**

1. Receive `FrameUpdate`
2. For each `DamageRegion`:
   - Decompress with zstd
   - XOR-apply onto the local framebuffer copy at the given position
3. Upload the updated framebuffer to a GPU texture
4. Display
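
The client side is the mirror image: after zstd decompression (omitted here), each region's diff is XORed back into the local framebuffer at the region's position. Because XOR is its own inverse, applying the diff to the client's copy of the previous frame reproduces the server's current frame, provided both sides start from the same reference. The function name `apply_region` and the tightly packed layout are assumptions.

```rust
const BPP: usize = 4; // ARGB8888: 4 bytes per pixel

/// Hypothetical decoder step: XOR a decompressed region diff (rows packed as
/// width * 4 bytes) into a tightly packed framebuffer with `frame_width`
/// pixels per row. Returns an error instead of panicking on malformed input.
pub fn apply_region(
    framebuffer: &mut [u8],
    frame_width: u32,
    x: u32,
    y: u32,
    width: u32,
    height: u32,
    diff: &[u8],
) -> Result<(), String> {
    let stride = frame_width as usize * BPP;
    let row_bytes = width as usize * BPP;
    if x as u64 + width as u64 > frame_width as u64 {
        return Err("region exceeds frame width".into());
    }
    if diff.len() != row_bytes * height as usize {
        return Err("diff length does not match region size".into());
    }
    if (y as usize + height as usize) * stride > framebuffer.len() {
        return Err("region exceeds frame height".into());
    }
    // max(1) avoids chunks_exact(0), which panics; a zero-width diff is empty.
    for (i, diff_row) in diff.chunks_exact(row_bytes.max(1)).enumerate() {
        let start = (y as usize + i) * stride + x as usize * BPP;
        for (dst, d) in framebuffer[start..start + row_bytes].iter_mut().zip(diff_row) {
            *dst ^= d;
        }
    }
    Ok(())
}
```

This relies on both sides agreeing on the initial reference (for example, an all-zero buffer, in which case the first update carries the full frame); the spec does not yet state which.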

**Why XOR + zstd:** The XOR diff turns unchanged pixels into zero bytes,
and zstd compresses runs of zeros extremely well. For typical desktop
content (mostly static, with small changes), this is expected to give
10-100x compression at minimal CPU cost.

### 5. Client Viewer (wrclient)

A native application built with winit + wgpu.

**Window:** winit creates a platform-native window (Cocoa on macOS),
sized to match the server's output dimensions from `ServerHello`.

**Rendering:** wgpu renders a fullscreen textured quad. Each frame, the
updated pixel buffer is uploaded to a GPU texture and drawn.
Double-buffered with vsync.

**Input capture:** winit captures keyboard and mouse events from the
native window. Events are serialized as protocol messages and sent over
QUIC Stream 2.

**CLI:**

```
wrclient <host>:<port>
```
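
Resolving the `<host>:<port>` argument can be done with the standard library's `ToSocketAddrs`, which accepts literal IPv4/IPv6 addresses as well as DNS names and yields the `SocketAddr` a QUIC endpoint connects to. Taking the first resolved address is an assumption of this sketch, and `parse_server_addr` is a hypothetical name.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Parse the `wrclient <host>:<port>` argument into a socket address.
/// Takes the first address the resolver returns; a real client might
/// instead try each address in turn.
pub fn parse_server_addr(arg: &str) -> Result<SocketAddr, String> {
    arg.to_socket_addrs()
        .map_err(|e| format!("invalid server address {arg:?}: {e}"))?
        .next()
        .ok_or_else(|| format!("no addresses found for {arg:?}"))
}
```

For example, `wrclient 192.168.1.20:4433` and `wrclient [::1]:4433` both parse without a DNS lookup, while a bare host with no port is rejected.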

## Dependencies

**New workspace dependencies (`Cargo.toml`):**

| Crate | Purpose |
|-------|---------|
| `quinn` | QUIC transport |
| `rustls` | TLS for QUIC |
| `rcgen` | Self-signed certificate generation |
| `postcard` + `serde` | Wire protocol serialization |
| `zstd` | Frame compression |
| `winit` | Window creation (wrclient) |
| `wgpu` | GPU rendering (wrclient) |

**Smithay feature changes (wrsrvd):**

- Add: `renderer_pixman` (headless rendering)
- Keep: `wayland_frontend`, `desktop`, `renderer_gl`, `backend_winit`
  (behind a feature flag for dev mode)

## Module Structure

```
crates/
├── wayray-protocol/src/
│   ├── lib.rs           # Re-exports
│   ├── messages.rs      # All message types + serde derives
│   ├── codec.rs         # Length-prefixed framing (encode/decode)
│   └── version.rs       # Protocol version constant
│
├── wrsrvd/src/
│   ├── main.rs          # CLI arg parsing, backend selection
│   ├── state.rs         # WayRay compositor state (unchanged)
│   ├── handlers/        # Wayland protocol handlers (unchanged)
│   ├── backend/
│   │   ├── mod.rs       # Backend trait/enum
│   │   ├── headless.rs  # PixmanRenderer + in-memory framebuffer
│   │   └── winit.rs     # Existing Winit backend (moved from main.rs)
│   ├── render.rs        # Render logic (adapted for both backends)
│   ├── encoder.rs       # XOR diff + zstd compression
│   └── network/
│       ├── mod.rs       # Re-exports
│       └── server.rs    # QUIC server, frame sending, input receiving
│
├── wrclient/src/
│   ├── main.rs          # CLI args, connect, main loop
│   ├── decoder.rs       # zstd decompress + XOR apply
│   ├── display.rs       # winit window + wgpu rendering
│   ├── input.rs         # Keyboard/mouse capture + serialization
│   └── network/
│       ├── mod.rs       # Re-exports
│       └── client.rs    # QUIC client, frame receiving, input sending
│
└── wradm/src/
    └── main.rs          # Unchanged (placeholder)
```

## Success Criteria

1. `wrsrvd` starts headless on the Linux host; no display server is needed
2. `wrclient` connects from macOS over the local network
3. A Wayland client (foot, weston-terminal) can be launched into `wrsrvd`
4. The client window appears on the Mac in the `wrclient` window
5. Typing and clicking in the `wrclient` window reaches the Wayland client
6. Frame updates are visible in real time

## Design Notes

**Cursor:** Rendered server-side into the framebuffer for Phase 1
(simplest). No separate cursor channel or client-side cursor.

**Input injection:** Network input events are injected into the Smithay
seat by calling the seat's keyboard/pointer methods directly (the same
pattern as the existing Winit input handler in `state.rs`). No custom
`InputBackend` is needed.

**Protocol evolution:** The message types defined here are Phase 1
subsets. They will evolve toward the full wire protocol spec
(`docs/protocols/wayray-wire-protocol.md`) in later phases.

**PixmanRenderer pixel access:** Verify early that PixmanRenderer's
framebuffer can be read directly from RAM. If Smithay requires
`ExportMem` even for Pixman, adapt accordingly; Pixman's `ExportMem` is
a simple memcpy (no GPU readback).

## Out of Scope

- Audio forwarding (Phase 4)
- USB forwarding (Phase 4)
- Session management / hot-desking (Phase 3)
- Content-adaptive encoding (Tier 2/3, future optimization)
- Hardware video encoding (H.264/AV1, future optimization)
- Multi-client support (Phase 3)
- Proper TLS certificate validation (Phase 3)
- Window management protocol (Phase 2.5)