wayray/RESEARCH.md
Till Wegmueller 167c6c17c6
Add project documentation, architecture decisions, and usage book
Comprehensive documentation for WayRay, a SunRay-like thin client
Wayland compositor targeting illumos and Linux:

- CLAUDE.md: project context and conventions
- docs/ai/plans: 6-phase implementation roadmap
- docs/ai/adr: 9 architecture decision records (Smithay, QUIC,
  frame encoding, session management, rendering, audio, project
  structure, illumos support, pluggable window management)
- docs/architecture: system architecture overview with diagrams
- docs/protocols: WayRay wire protocol specification
- book/: mdbook user guide (introduction, concepts, server/client
  guides, admin, development)
- RESEARCH.md: deep research on remote display protocols
2026-03-28 20:47:16 +01:00


Remote Display & Thin Client Technologies Research

Comprehensive research for building a SunRay-like thin client system. Covers protocols, capture mechanisms, encoding, networking, and audio/USB forwarding.


Table of Contents

  1. SPICE Protocol
  2. RDP (Remote Desktop Protocol)
  3. VNC / RFB Protocol
  4. Waypipe
  5. PipeWire Screen Capture
  6. Video Codecs for Remote Display
  7. Network Protocols
  8. Framebuffer Capture Techniques
  9. Audio Forwarding
  10. USB/IP
  11. Modern Thin Client Projects
  12. Architecture Recommendations

1. SPICE Protocol

SPICE (Simple Protocol for Independent Computing Environments) is a remote display protocol originally developed by Qumranet (acquired by Red Hat). It is the most architecturally relevant existing protocol for a SunRay-like system.

Architecture

SPICE has a four-component architecture:

  • Protocol: Wire format specification for all messages
  • Server (libspice-server): Runs inside the hypervisor/host, directly accesses the virtual GPU framebuffer
  • Client (spice-gtk, remote-viewer): Renders display, captures input, handles USB/audio
  • Guest Agent (spice-vdagent): Runs inside the guest VM for clipboard, resolution changes, file transfer

Channel Architecture

Each SPICE session consists of multiple independent TCP/TLS connections, one per channel type:

| Channel | ID | Purpose |
|---|---|---|
| Main | 1 | Session management, migration, agent communication |
| Display | 2 | Rendering commands, images, video streams |
| Inputs | 3 | Keyboard and mouse events |
| Cursor | 4 | Pointer shape and position |
| Playback | 5 | Audio output (server -> client) |
| Record | 6 | Audio input (client -> server) |
| Smartcard | 8 | Smartcard passthrough |
| USB Redir | 9 | USB device forwarding via usbredir |
| Port | 10 | Generic data port |
| Webdav | 11 | File sharing via WebDAV |

Display Channel & Image Compression

The display channel is the most complex. SPICE does not just send raw framebuffer pixels. Instead it sends rendering commands (draw operations, images, etc.) and tries to offload rendering to the client GPU.

Image compression algorithms (selectable at runtime):

  • Quic: Proprietary algorithm based on SFALIC. Optimized for photographic/natural images
  • LZ: Standard Lempel-Ziv. Good for text/UI content
  • GLZ (Global LZ): LZ with a history-based global dictionary that exploits repeating patterns across images. Critical for WAN performance
  • Auto mode: Heuristically selects Quic vs. LZ/GLZ per-image based on content type

Video streaming: The server heuristically detects video regions (rapidly changing rectangular areas) and encodes them as M-JPEG streams, dramatically reducing bandwidth for video playback.

Caching: Images, palettes, and cursor data are cached on the client side to avoid retransmission.

Key Design Insights for WayRay

  • The multi-channel approach allows independent QoS per data type
  • Sending rendering commands rather than raw pixels is more bandwidth-efficient
  • The automatic image compression selection based on content type is clever
  • GLZ's global dictionary approach is excellent for WAN scenarios
  • Video region detection and switching to video codec is a critical optimization


2. RDP (Remote Desktop Protocol)

RDP is Microsoft's proprietary remote desktop protocol, based on the ITU T.120 family of protocols. Default port: TCP/UDP 3389.

Architecture

RDP uses a client-server model with a layered architecture:

  1. Transport Layer: TCP/IP (traditional) or UDP (for lossy/real-time data)
  2. Security Layer: TLS/NLA (Network Level Authentication)
  3. Core Protocol: PDU (Protocol Data Unit) processing, state machine
  4. Virtual Channel System: Extensible channel framework for features

Server-side components:

  • Wdtshare.sys: RDP driver handling UI transfer, compression, encryption, framing
  • Tdtcp.sys: Transport driver packaging the protocol onto TCP/IP

Virtual Channel System

RDP's extensibility comes from its virtual channel architecture:

Static Virtual Channels (SVC):

  • Negotiated during connection setup
  • Fixed for session lifetime
  • Name limited to 8 bytes
  • Examples: RDPSND (audio), CLIPRDR (clipboard), RDPDR (device redirection)

Dynamic Virtual Channels (DVC):

  • Built on top of the DRDYNVC static channel
  • Can be opened/closed during a session
  • Used for modern features: graphics pipeline, USB redirection, diagnostics
  • Microsoft's recommended approach for new development

Graphics Pipeline

RDP has evolved through several graphics approaches:

  1. GDI Remoting (original): Send Windows GDI drawing commands
  2. RemoteFX Codec: Wavelet-based (DWT + RLGR encoding), supports lossless and lossy modes
  3. RemoteFX Progressive Codec: Progressive rendering for WAN - sends low quality first, refines incrementally
  4. GFX Pipeline (MS-RDPEGFX): Modern graphics extension supporting:
    • AVC/H.264 encoding for video content
    • RemoteFX for non-video content
    • Adaptive selection based on content type and bandwidth

Note: RemoteFX vGPU was deprecated in 2020 due to security vulnerabilities; the codec itself lives on in the GFX pipeline.

FreeRDP

FreeRDP is the dominant open-source RDP implementation (Apache 2.0 license):

  • Written primarily in C (87.8%)
  • Clean separation: libfreerdp (protocol) vs. client frontends vs. server implementations
  • Powers Remmina, GNOME Connections, KRDC, and most Linux RDP clients
  • Implements the full virtual channel system including GFX pipeline

Key Design Insights for WayRay

  • The SVC/DVC split is instructive: start with fixed channels, add dynamic ones later
  • Progressive rendering is excellent for variable-bandwidth scenarios
  • Content-adaptive encoding (H.264 for video, wavelet for desktop) is the modern approach
  • FreeRDP's architecture (protocol library separate from client/server) is a good model


3. VNC / RFB Protocol

VNC (Virtual Network Computing) uses the RFB (Remote Framebuffer) protocol, standardized in RFC 6143.

Architecture

RFB is a simple, stateless framebuffer protocol. The fundamental design:

  • The display side is based on a single primitive: "put a rectangle of pixel data at position (x, y)"
  • A sequence of rectangles makes a framebuffer update
  • The protocol is client-pull: the client requests updates, the server sends them
  • Pixel format is negotiated: 24-bit true color, 16-bit, or 8-bit color-mapped

Encoding Types

The encoding system is the key to VNC performance. Different encodings trade off bandwidth, client CPU, and server CPU:

| Encoding | Description | Best For |
|---|---|---|
| Raw | Uncompressed pixel data, scanline order | Fast LAN, low CPU |
| CopyRect | Reference to existing framebuffer region | Window moves, scrolling |
| RRE | Rise-and-Run-length Encoding, rectangles of solid color | Simple UIs |
| Hextile | 16x16 tile subdivision with RRE within tiles | Fast LAN (low CPU overhead) |
| Zlib | Raw data compressed with zlib | Moderate bandwidth savings |
| Tight | Intelligent per-rectangle compression selection (zlib, JPEG, indexed color, solid) | Low bandwidth / WAN |
| ZRLE | Zlib Run-Length Encoding, combines zlib with palette/RLE | Good all-around |
| TurboVNC/Tight+JPEG | Tight with aggressive JPEG for photographic regions | Video content, high FPS |

Pseudo-encodings allow clients to advertise extension support (cursor shape, desktop resize, etc.) without changing the core protocol.

Performance Characteristics

  • Fast LAN: Hextile or Raw (minimize CPU overhead)
  • WAN/Low bandwidth: Tight (best compression ratios, especially for mixed content)
  • Photo/Video content: Tight with JPEG (TurboVNC achieves 4x better performance than ZRLE for images)
  • Scrolling/Window moves: CopyRect (near-zero bandwidth)

Key Design Insights for WayRay

  • CopyRect-style "reference previous frame data" is extremely efficient for common desktop operations
  • Per-rectangle encoding selection (as in Tight) is superior to one-size-fits-all
  • RFB's simplicity is both its strength (easy to implement) and weakness (no audio, USB, etc.)
  • The client-pull model introduces latency; a push model with damage tracking is better


4. Waypipe

Waypipe is a proxy for Wayland clients, analogous to ssh -X for X11. It is the most directly relevant existing project for Wayland remote display.

Architecture

Waypipe operates as a paired proxy system:

[Remote App] <--Wayland--> [waypipe server] <--socket/SSH--> [waypipe client] <--Wayland--> [Local Compositor]
  • Server mode: Acts as a Wayland compositor stub on the remote side. Wayland apps connect to it as if it were a real compositor.
  • Client mode: Connects to the local real compositor and forwards surface updates from the remote side.
  • SSH integration: waypipe ssh user@host app sets up the tunnel automatically.

Buffer Synchronization

This is the key technical innovation:

  1. Waypipe keeps a mirror copy of each shared memory buffer
  2. When a buffer is committed, waypipe diffs the current buffer against the mirror
  3. Only changed regions are transmitted
  4. The remote side applies the diff to reconstruct the buffer

Compression Options

| Method | Use Case | Default |
|---|---|---|
| none | High-bandwidth LAN | No |
| lz4 | General purpose, fast | Yes |
| zstd | Low-bandwidth / WAN | No |

Compression ratios range from roughly 30x for text-heavy content down to 1.5x for noisy images.

Video Encoding (DMA-BUF)

For DMA-BUF buffers (GPU-rendered content), waypipe supports lossy video encoding:

  • --video=sw,bpf=120000,h264 (default when --video is used)
  • Software encoding (libx264) or hardware encoding (VAAPI)
  • With VAAPI on Intel Gen8 iGPU: 80 FPS at 4 MB/s bandwidth
  • Configurable bits-per-frame for quality/bandwidth tradeoff

Protocol Handling

Waypipe parses the Wayland wire protocol, which is partially self-describing. It:

  • Intercepts buffer-related messages (wl_shm, wl_buffer, linux-dmabuf)
  • Passes through other messages transparently
  • Is partially forward-compatible with new Wayland protocols

Limitations

  • Per-application, not whole-desktop
  • No built-in audio forwarding
  • No USB forwarding
  • Performance depends heavily on application rendering patterns
  • Latency can be noticeable for interactive use

Key Design Insights for WayRay

  • The diff-based buffer synchronization is very efficient for incremental updates
  • VAAPI video encoding for DMA-BUF is the right approach for GPU-rendered content
  • Per-application forwarding is limiting; a whole-compositor approach is better for a thin client
  • The Wayland protocol's design (buffer passing, damage tracking) is well-suited for remote display


5. PipeWire Screen Capture

PipeWire is the modern Linux multimedia framework that unifies audio, video, and screen capture.

Portal-Based Screen Capture Architecture

On Wayland, screen capture follows a security-first architecture:

[Application] --> [xdg-desktop-portal (D-Bus)] --> [Portal Backend (compositor-specific)]
                                                         |
                                                    [PipeWire Stream]
                                                         |
                                                  [Application receives frames]

Flow:

  1. Application calls org.freedesktop.portal.ScreenCast.CreateSession() via D-Bus
  2. Portal presents a permission dialog to the user
  3. On approval, SelectSources() lets user choose output/window
  4. Start() creates a PipeWire stream and returns a pipewire_fd
  5. Application connects to PipeWire using this fd and receives frames

Buffer Sharing Mechanisms

PipeWire supports two buffer types for screen capture:

DMA-BUF (preferred):

  • Zero-copy transfer from compositor GPU memory to consumer
  • Buffer stays in GPU VRAM throughout the pipeline
  • Ideal for hardware video encoding (capture -> encode without CPU copy)
  • Format/modifier negotiation ensures compatibility

memfd (fallback):

  • Shared memory file descriptor
  • Requires CPU copy from GPU to system memory
  • Universal compatibility but higher overhead

Wayland Capture Protocols

Three generations of capture protocols exist:

  1. wlr-export-dmabuf-unstable-v1 (legacy): Exports entire output as DMA-BUF frames. Simple but no damage tracking.

  2. wlr-screencopy-unstable-v1 (deprecated): More flexible, supports shared memory and DMA-BUF. Has damage tracking via copy_with_damage. Being replaced.

  3. ext-image-copy-capture-v1 (current, merged 2024): The new standard protocol:

    • Client specifies which buffer regions need updating
    • Compositor only fills changed regions
    • Supports both output capture and window capture
    • Initial implementations: wlroots, WayVNC, grim

GNOME's Approach

GNOME/Mutter uses different D-Bus APIs:

  • org.gnome.Mutter.ScreenCast: Provides PipeWire stream of screen content
  • org.gnome.Mutter.RemoteDesktop: Provides input injection
  • These power gnome-remote-desktop which speaks RDP (and VNC)

Key Design Insights for WayRay

  • ext-image-copy-capture-v1 + PipeWire is the correct modern capture stack
  • DMA-BUF capture -> hardware encode is the zero-copy golden path
  • The portal system provides proper security/permission handling
  • For a thin client server running its own compositor, you can skip the portal and use the capture protocols directly
  • Damage tracking in ext-image-copy-capture-v1 is essential for efficient updates


6. Video Codecs for Remote Display

Codec Comparison for Low-Latency Use

| Property | H.264/AVC | H.265/HEVC | AV1 |
|---|---|---|---|
| Compression efficiency | Baseline | ~35% better than H.264 | ~50% better than H.264 |
| Encoding latency | Lowest | Low | Moderate (improving) |
| Hardware encode support | Universal | Widespread | Newer GPUs only |
| Patent/license | Licensed (but ubiquitous) | Licensed (complex) | Royalty-free |
| Screen content coding | Limited | Better | Best (dedicated tools) |
| Decode support | Universal | Nearly universal | Growing rapidly |
| Best for | Maximum compatibility | Good quality/bandwidth | Best quality, royalty-free |

Low-Latency Encoding Considerations

For remote desktop, encoding latency is critical. Key settings:

Frame structure:

  • No B-frames: B-frames require future frames, adding latency
  • No lookahead: Lookahead improves quality but adds latency
  • No frame reordering: Frames must be encoded/decoded in order
  • Single slice / low-delay profile: Minimizes buffering

Rate control:

  • CBR (Constant Bit Rate): Keeps network queues short and predictable
  • VBR with max bitrate cap: Better quality but can cause bandwidth spikes
  • CBR is generally preferred for remote desktop due to predictable latency

Intra refresh:

  • Periodic I-frames are large and cause bandwidth spikes
  • Gradual Intra Refresh (GIR): Spreads intra-coded blocks across frames, avoiding spikes
  • Essential for smooth, low-latency streaming

AV1 Specific Advantages

AV1 has features specifically useful for remote desktop:

  • Screen Content Coding (SCC): Dedicated tools for text, UI elements, and screen captures that dramatically reduce bitrate
  • Temporal Scalability (SVC): L1T2 mode (1 spatial layer, 2 temporal layers) allows dropping frames gracefully under bandwidth pressure
  • Film Grain Synthesis: Can transmit film grain parameters instead of actual grain, saving bandwidth

Chrome's libaom AV1 encoder (speed 10): 12% better quality than VP9 at same bandwidth, 25% faster encoding.

Hardware Encoding

NVIDIA NVENC

  • Available on GeForce GTX 600+ and all Quadro/Tesla with Kepler+
  • Video Codec SDK v13.0 (2025): AV1 ultra-high quality mode, comparable to software AV1 encoding
  • Latency modes:
    • Normal Latency: Default, uses B-frames and lookahead
    • Low Latency: No B-frames, no reordering
    • Ultra Low Latency: Strict in-order pipeline, minimal frame queuing
  • Dedicated hardware encoder block (does not consume CUDA cores)
  • Can encode 4K@120fps with sub-frame latency

Intel VAAPI (Video Acceleration API)

  • Open-source API (libva) supported on Intel Gen8+ (Broadwell+)
  • Supports H.264, H.265, AV1 (Intel Arc/Gen12+), VP9
  • FFmpeg integration: h264_vaapi, hevc_vaapi, av1_vaapi
  • Low-power encoding mode available on some platforms
  • GStreamer integration via gstreamer-vaapi
  • Well-suited for always-on server scenarios (low power consumption)

AMD AMF/VCN

  • Video Core Next (VCN) hardware encoder
  • Supports H.264, H.265, AV1 (RDNA 3+)
  • AMF (Advanced Media Framework) SDK
  • VAAPI support via Mesa radeonsi driver
  • VCN 4.0+ competitive with NVENC in quality

Key Design Insights for WayRay

  • Start with H.264 for maximum compatibility, add H.265/AV1 as options
  • Use VAAPI as the primary encoding API (works across Intel/AMD, open-source)
  • Add NVENC support via FFmpeg/GStreamer for NVIDIA GPUs
  • CBR + no B-frames + gradual intra refresh for lowest latency
  • AV1's screen content coding mode is a significant advantage for desktop content
  • The DMA-BUF -> VAAPI encode path is zero-copy and should be the primary pipeline


7. Network Protocols

TCP vs. UDP vs. QUIC for Remote Display

| Property | TCP | UDP | QUIC |
|---|---|---|---|
| Reliability | Full (retransmit) | None | Selectable per-stream |
| Head-of-line blocking | Yes (single stream) | No | No (multiplexed streams) |
| Connection setup | 1-3 RTT (TCP + TLS) | 0 RTT | 0-1 RTT |
| Congestion control | Kernel-space, slow to update | Application-managed | User-space, pluggable |
| NAT/firewall traversal | Good | Moderate | Moderate (UDP-based) |
| Encryption | Optional (TLS) | Optional (DTLS) | Mandatory (TLS 1.3) |

QUIC Advantages for Remote Display

QUIC is increasingly compelling for remote display:

  1. Stream multiplexing without HOL blocking: Display, input, audio can be separate QUIC streams. A lost display packet doesn't stall input delivery.
  2. 0-RTT connection setup: Critical for session resumption / hot-desking scenarios
  3. Pluggable congestion control: Can use algorithms optimized for low-latency interactive traffic (e.g., BBR, COPA)
  4. Connection migration: Session survives network changes (WiFi -> Ethernet)

QUIC Challenges

  • Firewall blocking: Some corporate networks block UDP, forcing TCP fallback. The fallback penalty is severe (full session teardown + TCP reconnect).
  • Library maturity: QUIC implementations are still maturing. Key libraries:
    • quinn (Rust): Well-maintained, async, good for our use case
    • quiche (Cloudflare, Rust/C): Production-tested
    • s2n-quic (AWS, Rust): High performance
  • CPU overhead: QUIC's encryption and user-space processing can be higher than kernel TCP

Media over QUIC (MoQ)

MoQ is an emerging IETF standard (RFC expected 2026) that combines:

  • Low-latency interactivity of WebRTC
  • Scalability of HLS/DASH
  • Built on QUIC/WebTransport

Architecture: Publish-subscribe model with tracks, groups, and objects. Sub-250ms latency target.

Relevance: MoQ's concepts (prioritized streams, partial reliability, adaptive quality) are directly applicable to remote display, though the protocol itself is focused on media distribution rather than interactive desktop.

Implementations: Cloudflare has deployed MoQ relays on their global network. The OpenMOQ consortium (Akamai, Cisco, YouTube, etc.) is developing open-source implementations.

Adaptive Bitrate for Remote Display

Key strategies:

  • Bandwidth estimation: Measure RTT and throughput continuously
  • Quality adjustment: Change encoder bitrate, resolution, or frame rate
  • Frame dropping: Under extreme congestion, drop non-reference frames
  • Temporal scalability (SVC): Encode with multiple temporal layers, drop higher layers under congestion
  • Resolution scaling: Encode at lower resolution and upscale on client (works well with modern upscaling algorithms)

Latency Budget

For interactive remote desktop, the target end-to-end latency budget:

| Stage | Target |
|---|---|
| Capture | <1ms (DMA-BUF) |
| Encode | 1-5ms (hardware) |
| Network (LAN) | <1ms |
| Network (WAN) | 10-100ms |
| Decode | 1-3ms (hardware) |
| Render | <1ms |
| Total (LAN) | <10ms |
| Total (WAN) | 15-110ms |

Key Design Insights for WayRay

  • Use QUIC as primary transport with TCP fallback
  • Rust has excellent QUIC libraries (quinn)
  • Separate QUIC streams for display, input, audio, USB
  • Input should be highest priority (lowest latency)
  • Implement adaptive bitrate from the start
  • Consider SVC temporal layers in the encoder for graceful degradation


8. Framebuffer Capture Techniques

DMA-BUF Export (Zero-Copy)

DMA-BUF is the Linux kernel subsystem for sharing buffers between devices (GPU, display, video encoder).

How it works:

  1. GPU renders frame into a DMA-BUF object (fd-backed GPU memory)
  2. The fd is passed to the consumer (encoder, another GPU, etc.)
  3. No CPU copy occurs; the buffer stays in GPU memory

For a Wayland compositor acting as a thin client server:

[Wayland clients] --> [Compositor renders to GPU buffer]
                           |
                    [DMA-BUF export (fd)]
                           |
                    [VAAPI encoder imports fd]
                           |
                    [Encoded bitstream -> network]

Key protocols:

  • linux-dmabuf-v1: Clients use this to submit GPU-rendered buffers to the compositor
  • ext-image-copy-capture-v1: Captures compositor output as DMA-BUF
  • DMA-BUF feedback (v4): Tells clients which GPU/format the compositor prefers

GPU Readback (Fallback)

When DMA-BUF export is not possible:

  1. Compositor renders to GPU texture
  2. glReadPixels() or equivalent copies pixels to CPU memory
  3. CPU memory is then compressed/encoded

This is significantly slower due to the GPU -> CPU copy and pipeline stall, but universally supported.

Damage Tracking

Damage tracking identifies which regions of the screen changed between frames, avoiding retransmission of unchanged areas.

Wayland's built-in damage tracking:

  • Each wl_surface.commit() includes damage rectangles via wl_surface.damage() or wl_surface.damage_buffer()
  • The compositor knows exactly which surface regions changed

Compositor-level damage:

  • The compositor tracks which regions of the output changed (due to surface damage, window moves, overlapping windows, etc.)
  • ext-image-copy-capture-v1 supports damage reporting: the compositor tells the capturer which regions changed since the last frame

For encoding efficiency:

  • With H.264/H.265/AV1: damage regions inform the encoder which macroblocks to mark as changed
  • With lossless compression: only changed regions need to be compressed and sent
  • With hybrid approach: unchanged regions get zero bits, changed regions get full encoding

wl-screenrec: Reference Implementation

wl-screenrec is a Rust project demonstrating high-performance Wayland screen recording:

  • Uses wlr-screencopy with DMA-BUF
  • Hardware encoding via VAAPI
  • Zero-copy pipeline (DMA-BUF -> VAAPI -> file)
  • Written in Rust, good reference for our implementation

Key Design Insights for WayRay

  • Own the compositor: By building/extending a Wayland compositor, we have direct access to all rendering state, damage information, and DMA-BUF handles
  • DMA-BUF -> VAAPI is the critical path: This zero-copy pipeline should be the primary encoding path
  • Damage tracking reduces encoding work: Use Wayland's built-in damage tracking to minimize what gets encoded
  • Fallback to GPU readback for unsupported hardware
  • wl-screenrec is a good Rust reference for the capture -> encode pipeline


9. Audio Forwarding

PipeWire Network Audio

PipeWire provides several mechanisms for network audio:

module-rtp-sink: Creates a PipeWire sink that sends audio as RTP packets

  • Supports raw PCM, Opus encoding
  • Configurable latency via sess.latency.msec (default: 100ms for network)
  • Uses SAP/mDNS for discovery

module-rtp-source: Creates a PipeWire source that receives RTP packets

  • DLL-based clock recovery to handle network jitter
  • Configurable ring buffer fill level

module-rtp-session: Combined send/receive with automatic discovery

  • Uses Apple MIDI protocol for low-latency bidirectional MIDI
  • Announced via Avahi/mDNS/Bonjour

Pulse Tunnel Module

module-pulse-tunnel: Tunnels audio to/from a remote PulseAudio/PipeWire-Pulse server

  • Simpler setup, works over TCP
  • Higher latency than RTP approach
  • Good for compatibility with existing PulseAudio setups

Low-Latency Audio Considerations

For remote desktop audio, the targets are:

| Parameter | Target |
|---|---|
| Codec | Opus (designed for low latency) |
| Frame size | 2.5ms - 10ms (Opus supports down to 2.5ms) |
| Buffer/Quantum | As low as 128 samples @ 48kHz (~2.67ms) |
| Network jitter buffer | 10-30ms |
| Total one-way latency | 15-50ms |

Opus codec advantages:

  • Designed for both speech and music
  • 2.5ms to 60ms frame sizes
  • 6 kbps to 510 kbps bitrate range
  • Built-in forward error correction (FEC)
  • Packet loss concealment (PLC)

Custom Audio Pipeline for Thin Client

For a purpose-built thin client, the audio pipeline should be:

[Server PipeWire] -> [Opus encode] -> [RTP/QUIC] -> [Opus decode] -> [Client audio output]
[Client microphone] -> [Opus encode] -> [RTP/QUIC] -> [Opus decode] -> [Server PipeWire]

Key considerations:

  • Clock synchronization: Client and server audio clocks will drift. Need adaptive resampling or buffer management.
  • Jitter compensation: Network jitter requires a playout buffer. Adaptive jitter buffer adjusts to network conditions.
  • Echo cancellation: If microphone and speakers are on the same client device, need AEC.

Key Design Insights for WayRay

  • Opus over QUIC is the right approach for a custom thin client
  • PipeWire's RTP module is a good starting point but we may want tighter integration
  • Clock drift compensation is critical for long-running sessions
  • Audio and video synchronization (lip sync) must be maintained
  • Forward error correction helps with packet loss without retransmission latency


10. USB/IP

Architecture

USB/IP is a Linux kernel subsystem that shares USB devices over TCP/IP networks.

Components:

| Component | Side | Role |
|---|---|---|
| usbip-core | Both | Shared protocol and utility code |
| vhci-hcd | Client | Virtual Host Controller Interface; presents virtual USB ports to the local USB stack |
| usbip-host (stub) | Server | Binds to physical USB devices, encapsulates URBs for network transmission |
| usbip-vudc | Server | Virtual USB Device Controller, for USB Gadget-based virtual devices |

Protocol

Discovery: Client sends OP_REQ_DEVLIST over TCP, server responds with OP_REP_DEVLIST listing exportable devices.

Attachment: Client sends OP_REQ_IMPORT, server responds with OP_REP_IMPORT and begins forwarding URBs.

Data transfer: USB Request Blocks (URBs) are encapsulated in TCP packets and forwarded between stub driver and VHCI. The device driver runs entirely on the client side.

Port: TCP 3240 (default)

Protocol Flow

[USB Device] <-> [Stub Driver (server kernel)]
                      |
                 [TCP/IP Network]
                      |
                [VHCI Driver (client kernel)]
                      |
             [USB Device Driver (client)]
                      |
             [Application (client)]

Kernel Integration

  • Merged into mainline Linux since kernel 3.17
  • Source: drivers/usb/usbip/ and tools/usb/usbip/
  • Supports USB 2.0 and USB 3.0 devices
  • Windows support via usbip-win

Limitations

  • Latency: TCP round-trip for every URB can add significant latency for isochronous devices (audio, video)
  • Bandwidth: USB 3.0 bulk transfers work well, but sustained high-bandwidth is limited by network
  • Isochronous transfers: Not well supported (real-time USB audio/video devices may not work)
  • Security: No built-in encryption (must tunnel through SSH/VPN)

Alternatives: SPICE usbredir

SPICE's USB redirection (usbredir) is an alternative approach:

  • Library: libusbredir
  • Works at the USB protocol level (like USB/IP)
  • Better integration with SPICE's authentication/encryption
  • Can be used independently of SPICE

Key Design Insights for WayRay

  • USB/IP is mature and kernel-integrated - good baseline
  • For a thin client, wrapping USB/IP over QUIC (instead of raw TCP) would add encryption and better congestion handling
  • usbredir is worth considering as it's designed for remote desktop use cases
  • Isochronous USB devices (webcams, audio interfaces) are challenging over network and may need special handling
  • Consider selective USB forwarding - only forward devices the user explicitly shares


11. Modern Thin Client Projects

Sun Ray (Historical Reference)

The original Sun Ray (1999-2014) is the gold standard for thin client architecture:

  • Protocol: Appliance Link Protocol (ALP) over UDP/IP
  • Architecture: Completely stateless DTUs (Desktop Terminal Units) with zero local storage/OS
  • Session model: Sessions are independent of physical hardware. Pull your smartcard, insert at another Sun Ray, session follows instantly ("hot desking")
  • Server: Sun Ray Server Software (SRSS) managed sessions, ran on Solaris/Linux
  • Network: Standard switched Ethernet, DHCP-based configuration
  • Security: SSL/TLS encryption with 128-bit ARCFOUR
  • Display: Rendered entirely on server, compressed framebuffer sent to DTU

Key Sun Ray concepts to replicate:

  • Instant session mobility (smartcard/badge driven)
  • Zero client-side state
  • Centralized session management
  • Simple, robust network boot

Wafer (Wayland-Based Thin Client)

Wafer is the most directly comparable modern project:

  • Goal: Thin client for Linux server + Linux clients over high-speed LAN
  • Protocol: Wayland protocol over network
  • Server ("Mainframe"): Multi-core machine with GBM-capable GPU
  • Design: Full 3D acceleration on server, minimal CPU on client (Raspberry Pi target)
  • Status: Proof of concept / early development

Sunshine + Moonlight

Sunshine (server) + Moonlight (client) is the most mature open-source streaming solution:

  • Protocol: Based on NVIDIA GameStream protocol
  • Encoding: H.264, H.265, AV1 with NVENC, VAAPI, AMF hardware encoding
  • Performance: Sub-10ms latency on LAN, up to 120 FPS
  • Clients: Android, iOS, PC, Mac, Raspberry Pi, Steam Deck, Nintendo Switch, LG webOS
  • Audio: Full audio streaming with multi-channel support
  • Input: Mouse, keyboard, gamepad, touchscreen
  • Limitations: Designed for single-user gaming, not multi-user thin client

WayVNC

WayVNC is a VNC server for wlroots-based Wayland compositors:

  • Implements RFB protocol over wlr-screencopy / ext-image-copy-capture
  • Supports headless mode (no physical display)
  • Authentication: PAM, TLS (VeNCrypt), RSA-AES
  • Input: Virtual pointer and keyboard via Wayland protocols
  • JSON-IPC for runtime control
  • Good reference for Wayland compositor integration

GNOME Remote Desktop

GNOME's built-in remote desktop solution:

  • Speaks RDP (primary) and VNC
  • Uses PipeWire for screen capture via Mutter's ScreenCast D-Bus API
  • Supports headless multi-user sessions (GNOME 46+)
  • Input forwarding via Mutter's RemoteDesktop D-Bus API
  • Integrated with GDM for remote login
  • Active development, improving rapidly

ThinStation

ThinStation is a framework for building thin client Linux images:

  • Supports Citrix ICA, SPICE, NX, RDP, VMware Horizon
  • Boots from network (PXE), USB, or compact flash
  • Not a protocol itself, but a client OS/distribution

openthinclient

openthinclient is a commercial open-source thin client management platform:

  • Based on Debian (latest: Debian 13 "Trixie")
  • Manages thin client fleet, user sessions, applications
  • Supports multiple VDI protocols
  • Version 2603 (2025) includes updated VDI components

Key Design Insights for WayRay

  • Sunshine/Moonlight proves that low-latency game streaming is a solved problem; its techniques can be adapted to desktop streaming
  • WayVNC shows how to integrate with wlroots compositors
  • GNOME Remote Desktop shows the PipeWire + portal approach
  • Wafer validates the concept but is early-stage
  • Sun Ray's session mobility is the killer feature to replicate
  • No existing project combines: Wayland-native + multi-user + session mobility + hardware encoding + QUIC transport


12. Architecture Recommendations for a SunRay-Like System

Based on all the research above, here is a synthesized architectural recommendation:

Core Architecture

    ┌─────────────────────────────────────────────┐
    │                WayRay Server                │
    │                                             │
    │  ┌─────────────────────────────────┐        │
    │  │   Wayland Compositor (wlroots)  │        │
    │  │   - Per-user session            │        │
    │  │   - DMA-BUF output              │        │
    │  │   - Damage tracking             │        │
    │  └──────────┬──────────────────────┘        │
    │             │ DMA-BUF (zero-copy)           │
    │  ┌──────────▼──────────────────────┐        │
    │  │   Encoder Pipeline              │        │
    │  │   - VAAPI H.264/H.265/AV1       │        │
    │  │   - Damage-aware encoding       │        │
    │  │   - Adaptive bitrate            │        │
    │  └──────────┬──────────────────────┘        │
    │             │ Encoded frames                │
    │  ┌──────────▼──────────────────────┐        │
    │  │   Session Manager               │        │
    │  │   - Multi-user sessions         │        │
    │  │   - Session migration           │        │
    │  │   - Authentication              │        │
    │  └──────────┬──────────────────────┘        │
    │             │                               │
    │  ┌──────────▼──────────────────────┐        │
    │  │   QUIC Transport                │        │
    │  │   - Display stream (video)      │        │
    │  │   - Input stream (low-latency)  │        │
    │  │   - Audio stream (Opus/RTP)     │        │
    │  │   - USB stream (usbredir)       │        │
    │  │   - Control stream              │        │
    │  └──────────┬──────────────────────┘        │
    └─────────────┼───────────────────────────────┘
                  │ QUIC / Network
    ┌─────────────┼───────────────────────────────┐
    │             │                               │
    │  ┌──────────▼──────────────────────┐        │
    │  │   QUIC Transport                │        │
    │  └──────────┬──────────────────────┘        │
    │             │                               │
    │  ┌──────────▼──────────────────────┐        │
    │  │   Decoder (VAAPI/SW)            │        │
    │  │   + Audio (Opus decode)         │        │
    │  │   + Input capture               │        │
    │  │   + USB forwarding              │        │
    │  └──────────┬──────────────────────┘        │
    │             │                               │
    │  ┌──────────▼──────────────────────┐        │
    │  │   Minimal Wayland Compositor    │        │
    │  │   (or direct DRM/KMS output)    │        │
    │  └─────────────────────────────────┘        │
    │                WayRay Client                │
    └─────────────────────────────────────────────┘

Technology Stack Recommendations

| Component | Recommended Technology | Rationale |
|---|---|---|
| Server compositor | wlroots-based custom compositor | Direct access to DMA-BUF, damage tracking, input injection |
| Capture | Direct compositor integration (no protocol needed) | Lowest latency, full damage info |
| Encoding | VAAPI (primary), NVENC (optional) via FFmpeg/GStreamer | Cross-vendor, zero-copy from DMA-BUF |
| Video codec | H.264 (default), AV1 (preferred when supported) | H.264 for compatibility, AV1 for quality/bandwidth |
| Transport | QUIC (quinn crate) with TCP fallback | Low latency, multiplexing, 0-RTT |
| Audio | Opus over QUIC stream | Low latency, built-in FEC |
| USB | usbredir over QUIC stream | Designed for remote desktop |
| Session management | Custom (inspired by Sun Ray SRSS) | Session mobility, multi-user |
| Client display | DRM/KMS direct or minimal Wayland compositor | Minimal overhead |
| Language | Rust | Safety, performance, excellent ecosystem (smithay, quinn, etc.) |

QUIC Stream Layout

| Stream ID | Type | Priority | Reliability | Content |
|---|---|---|---|---|
| 0 | Bidirectional | Highest | Reliable | Control/session management |
| 1 | Server -> Client | High | Unreliable (QUIC datagrams) | Video frames |
| 2 | Client -> Server | Highest | Reliable | Input events |
| 3 | Server -> Client | Medium | Reliable | Audio playback |
| 4 | Client -> Server | Medium | Reliable | Audio capture |
| 5 | Bidirectional | Low | Reliable | USB/IP data |
| 6 | Bidirectional | Medium | Reliable | Clipboard |
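A minimal sketch of how messages on these channels could be framed on the wire. The field layout here (1-byte channel id, 1-byte flags, 4-byte big-endian payload length) and all names are illustrative assumptions, not a finalized WayRay wire format:

```rust
// Hypothetical channel ids matching the stream layout table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Channel {
    Control = 0,
    Video = 1,
    Input = 2,
    AudioPlayback = 3,
    AudioCapture = 4,
    Usb = 5,
    Clipboard = 6,
}

#[derive(Debug, PartialEq, Eq)]
struct Header {
    channel: Channel,
    keyframe: bool, // example flag: marks a video keyframe boundary
    len: u32,       // payload length in bytes
}

// Serialize a header into a fixed 6-byte prefix.
fn encode(h: &Header) -> [u8; 6] {
    let mut buf = [0u8; 6];
    buf[0] = h.channel as u8;
    buf[1] = h.keyframe as u8;
    buf[2..6].copy_from_slice(&h.len.to_be_bytes());
    buf
}

// Parse a header back; unknown channel ids are rejected.
fn decode(buf: &[u8; 6]) -> Option<Header> {
    let channel = match buf[0] {
        0 => Channel::Control,
        1 => Channel::Video,
        2 => Channel::Input,
        3 => Channel::AudioPlayback,
        4 => Channel::AudioCapture,
        5 => Channel::Usb,
        6 => Channel::Clipboard,
        _ => return None,
    };
    Some(Header {
        channel,
        keyframe: buf[1] != 0,
        len: u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]),
    })
}

fn main() {
    let h = Header { channel: Channel::Video, keyframe: true, len: 4096 };
    let bytes = encode(&h);
    assert_eq!(decode(&bytes), Some(h));
    println!("roundtrip ok");
}
```

A fixed-size header like this keeps parsing trivial on a thin client; the real protocol would likely add a version field and per-channel sequence numbers.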

Encoding Strategy

  1. Damage detection: Compositor reports damaged regions per frame
  2. Content classification: Heuristically detect video regions vs. desktop content (like SPICE does)
  3. Encoding decision:
    • Small damage, text/UI: Lossless (zstd-compressed) tile updates
    • Large damage, desktop: H.264/AV1 with high quality, low bitrate
    • Video regions: H.264/AV1 with lower quality, higher frame rate
    • Full screen video: Full-frame H.264/AV1 encoding
  4. Adaptive quality: Adjust based on measured bandwidth and latency
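The decision tree above can be sketched as a per-frame classifier. The damage-fraction thresholds (5% and 90%) and the assumption that video regions are already detected upstream are illustrative placeholders, not measured values:

```rust
// Sketch of the per-frame encoding decision described above.
#[derive(Debug, PartialEq)]
enum Encoding {
    LosslessTiles,    // small UI/text damage: zstd-compressed tile updates
    VideoHighQuality, // large desktop damage: H.264/AV1, quality-biased
    VideoHighRate,    // detected video region: H.264/AV1, rate-biased
    FullFrameVideo,   // near-full-screen motion: full-frame encoding
}

fn choose_encoding(damage_px: u64, screen_px: u64, in_video_region: bool) -> Encoding {
    let frac = damage_px as f64 / screen_px as f64;
    if frac > 0.9 {
        Encoding::FullFrameVideo
    } else if in_video_region {
        Encoding::VideoHighRate
    } else if frac < 0.05 {
        Encoding::LosslessTiles
    } else {
        Encoding::VideoHighQuality
    }
}

fn main() {
    let screen = 1920 * 1080u64;
    // A blinking cursor damages a tiny area: send lossless tiles.
    assert_eq!(choose_encoding(screen / 100, screen, false), Encoding::LosslessTiles);
    // Fullscreen playback: hand the whole frame to the video encoder.
    assert_eq!(choose_encoding(screen, screen, false), Encoding::FullFrameVideo);
    println!("ok");
}
```

In practice the thresholds would be tuned per codec and adjusted by the adaptive-quality loop in step 4.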

Sun Ray Features to Implement

  1. Session mobility: Associate sessions with authentication tokens, not hardware. Inserting a token at any client makes the session follow the user.
  2. Stateless clients: Client boots from network, has no persistent state.
  3. Centralized management: Server manages all sessions, client configurations, authentication.
  4. Hot desking: Disconnect from one client, connect at another, session is exactly where you left it.
  5. Multi-monitor: Support multiple displays per session.
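The session-mobility model in point 1 reduces to keying sessions by token rather than by client. A minimal sketch, with all names and the `display_state` placeholder purely illustrative:

```rust
use std::collections::HashMap;

// Sketch: sessions keyed by authentication token, never by client hardware.
struct Session {
    user: String,
    display_state: Vec<u8>, // placeholder for the persistent compositor state
}

#[derive(Default)]
struct SessionManager {
    by_token: HashMap<String, Session>,
}

impl SessionManager {
    /// Called when a token is inserted at any client: returns the existing
    /// session if one exists, otherwise creates a fresh one.
    fn attach(&mut self, token: &str, user: &str) -> &mut Session {
        self.by_token.entry(token.to_string()).or_insert_with(|| Session {
            user: user.to_string(),
            display_state: Vec::new(),
        })
    }

    /// Disconnecting detaches the client but keeps the session alive.
    fn detach(&mut self, _token: &str) {
        // No state is destroyed; the session waits for the next attach.
    }
}

fn main() {
    let mut mgr = SessionManager::default();
    mgr.attach("token-123", "alice").display_state.push(42);
    mgr.detach("token-123");
    // Same token inserted at a different client resumes the same session.
    let s = mgr.attach("token-123", "alice");
    assert_eq!(s.display_state, vec![42]);
    println!("session resumed for {}", s.user);
}
```

The real manager would additionally persist sessions across server restarts and authenticate the token before attaching, but the hot-desking behavior (points 1 and 4) falls out of this keying scheme directly.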