Mirror of https://github.com/CloudNebulaProject/wayray.git, synced 2026-04-10 21:20:40 +00:00.
# Remote Display & Thin Client Technologies Research
Comprehensive research for building a SunRay-like thin client system. Covers protocols, capture mechanisms, encoding, networking, and audio/USB forwarding.

---

## Table of Contents

1. [SPICE Protocol](#1-spice-protocol)
2. [RDP (Remote Desktop Protocol)](#2-rdp-remote-desktop-protocol)
3. [VNC / RFB Protocol](#3-vnc--rfb-protocol)
4. [Waypipe](#4-waypipe)
5. [PipeWire Screen Capture](#5-pipewire-screen-capture)
6. [Video Codecs for Remote Display](#6-video-codecs-for-remote-display)
7. [Network Protocols](#7-network-protocols)
8. [Framebuffer Capture Techniques](#8-framebuffer-capture-techniques)
9. [Audio Forwarding](#9-audio-forwarding)
10. [USB/IP](#10-usbip)
11. [Modern Thin Client Projects](#11-modern-thin-client-projects)
12. [Architecture Recommendations](#12-architecture-recommendations-for-a-sunray-like-system)

---

## 1. SPICE Protocol

**SPICE** (Simple Protocol for Independent Computing Environments) is a remote display protocol originally developed by Qumranet (acquired by Red Hat). It is the most architecturally relevant existing protocol for a SunRay-like system.

### Architecture

SPICE has a four-component architecture:

- **Protocol**: Wire format specification for all messages
- **Server** (`libspice-server`): Runs inside the hypervisor/host, directly accesses the virtual GPU framebuffer
- **Client** (`spice-gtk`, `remote-viewer`): Renders display, captures input, handles USB/audio
- **Guest Agent** (`spice-vdagent`): Runs inside the guest VM for clipboard, resolution changes, file transfer

### Channel Architecture

Each SPICE session consists of **multiple independent TCP/TLS connections**, one per channel type:

| Channel | ID | Purpose |
|---|---|---|
| **Main** | 1 | Session management, migration, agent communication |
| **Display** | 2 | Rendering commands, images, video streams |
| **Inputs** | 3 | Keyboard and mouse events |
| **Cursor** | 4 | Pointer shape and position |
| **Playback** | 5 | Audio output (server -> client) |
| **Record** | 6 | Audio input (client -> server) |
| **Smartcard** | 8 | Smartcard passthrough |
| **USB Redir** | 9 | USB device forwarding via usbredir |
| **Port** | 10 | Generic data port |
| **Webdav** | 11 | File sharing via WebDAV |
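The channel IDs above translate naturally into a typed channel registry. A minimal sketch (the enum name and helper are ours; the numeric values mirror the table, not any real SPICE library binding):

```rust
// Sketch: SPICE channel types keyed by the wire IDs in the table
// above. Names and numeric values mirror the table; this is
// illustrative, not a binding to a real SPICE implementation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum ChannelType {
    Main = 1,
    Display = 2,
    Inputs = 3,
    Cursor = 4,
    Playback = 5,
    Record = 6,
    Smartcard = 8,
    UsbRedir = 9,
    Port = 10,
    Webdav = 11,
}

impl ChannelType {
    // Map a wire channel ID to a known channel type, if any.
    fn from_id(id: u8) -> Option<Self> {
        use ChannelType::*;
        Some(match id {
            1 => Main,
            2 => Display,
            3 => Inputs,
            4 => Cursor,
            5 => Playback,
            6 => Record,
            8 => Smartcard,
            9 => UsbRedir,
            10 => Port,
            11 => Webdav,
            _ => return None,
        })
    }
}

fn main() {
    assert_eq!(ChannelType::from_id(5), Some(ChannelType::Playback));
    assert_eq!(ChannelType::from_id(7), None); // ID 7 is absent from the table
    println!("channel table ok");
}
```

Keeping the mapping total-but-fallible (`Option`) lets a server reject unknown channel IDs at connection setup rather than panic mid-session.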

### Display Channel & Image Compression

The display channel is the most complex. SPICE does **not** just send raw framebuffer pixels. Instead it sends **rendering commands** (draw operations, images, etc.) and tries to **offload rendering to the client GPU**.

**Image compression algorithms** (selectable at runtime):

- **Quic**: Proprietary algorithm based on SFALIC. Optimized for photographic/natural images
- **LZ**: Standard Lempel-Ziv. Good for text/UI content
- **GLZ** (Global LZ): LZ with a **history-based global dictionary** that exploits repeating patterns across images. Critical for WAN performance
- **Auto mode**: Heuristically selects Quic vs. LZ/GLZ per-image based on content type
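A per-image content heuristic in the spirit of auto mode can be sketched as follows. The classifier, the distinct-color criterion, and the threshold are all invented for illustration; SPICE's actual heuristic is not specified here:

```rust
use std::collections::HashSet;

// Hypothetical content classifier in the spirit of SPICE's auto mode:
// images with few distinct colors look like text/UI (favor LZ-style
// dictionary compression); color-rich images look photographic (favor
// a Quic-style image codec). The threshold is invented.
#[derive(Debug, PartialEq)]
enum Codec {
    Lz,
    Quic,
}

fn pick_codec(pixels: &[u32]) -> Codec {
    let distinct: HashSet<u32> = pixels.iter().copied().collect();
    // UI/text surfaces typically use a small palette.
    if distinct.len() <= 64 { Codec::Lz } else { Codec::Quic }
}

fn main() {
    // Flat two-color "UI" image.
    let ui: Vec<u32> = (0..10_000)
        .map(|i| if i % 7 == 0 { 0xFF_FFFF } else { 0x00_0000 })
        .collect();
    // Noisy "photo" image (odd multiplier keeps every pixel distinct).
    let photo: Vec<u32> = (0..10_000u32).map(|i| i.wrapping_mul(2654435761)).collect();
    assert_eq!(pick_codec(&ui), Codec::Lz);
    assert_eq!(pick_codec(&photo), Codec::Quic);
    println!("ui -> {:?}, photo -> {:?}", pick_codec(&ui), pick_codec(&photo));
}
```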

**Video streaming**: The server **heuristically detects video regions** (rapidly changing rectangular areas) and encodes them as **M-JPEG streams**, dramatically reducing bandwidth for video playback.

**Caching**: Images, palettes, and cursor data are cached on the client side to avoid retransmission.

### Key Design Insights for WayRay

- The multi-channel approach allows independent QoS per data type
- Sending rendering commands rather than raw pixels is more bandwidth-efficient
- The automatic image compression selection based on content type is clever
- GLZ's global dictionary approach is excellent for WAN scenarios
- Video region detection and switching to video codec is a critical optimization

### Sources

- [SPICE Protocol Specification](https://www.spice-space.org/spice-protocol.html)
- [SPICE for Newbies](https://www.spice-space.org/spice-for-newbies.html)
- [SPICE Features](https://www.spice-space.org/features.html)
- [SPICE User Manual](https://www.spice-space.org/spice-user-manual.html)
- [SPICE Protocol PDF](https://www.spice-space.org/static/docs/spice_protocol.pdf)
- [SPICE Wikipedia](https://en.wikipedia.org/wiki/Simple_Protocol_for_Independent_Computing_Environments)

---

## 2. RDP (Remote Desktop Protocol)

**RDP** is Microsoft's proprietary remote desktop protocol, based on the ITU T.120 family of protocols. Default port: TCP/UDP 3389.

### Architecture

RDP uses a **client-server model** with a layered architecture:

1. **Transport Layer**: TCP/IP (traditional) or UDP (for lossy/real-time data)
2. **Security Layer**: TLS/NLA (Network Level Authentication)
3. **Core Protocol**: PDU (Protocol Data Unit) processing, state machine
4. **Virtual Channel System**: Extensible channel framework for features

**Server-side components**:

- `Wdtshare.sys`: RDP driver handling UI transfer, compression, encryption, framing
- `Tdtcp.sys`: Transport driver packaging the protocol onto TCP/IP

### Virtual Channel System

RDP's extensibility comes from its virtual channel architecture:

**Static Virtual Channels (SVC)**:

- Negotiated during connection setup
- Fixed for session lifetime
- Name limited to 8 bytes
- Examples: `RDPSND` (audio), `CLIPRDR` (clipboard), `RDPDR` (device redirection)

**Dynamic Virtual Channels (DVC)**:

- Built on top of the `DRDYNVC` static channel
- Can be opened/closed during a session
- Used for modern features: graphics pipeline, USB redirection, diagnostics
- Microsoft's recommended approach for new development

### Graphics Pipeline

RDP has evolved through several graphics approaches:

1. **GDI Remoting** (original): Send Windows GDI drawing commands
2. **RemoteFX Codec**: Wavelet-based (DWT + RLGR encoding), supports lossless and lossy modes
3. **RemoteFX Progressive Codec**: Progressive rendering for WAN: sends low quality first, refines incrementally
4. **GFX Pipeline** (`MS-RDPEGFX`): Modern graphics extension supporting:
   - AVC/H.264 encoding for video content
   - RemoteFX for non-video content
   - Adaptive selection based on content type and bandwidth

**Note**: RemoteFX vGPU was deprecated in 2020 due to security vulnerabilities; the codec itself lives on in the GFX pipeline.

### FreeRDP

[FreeRDP](https://github.com/FreeRDP/FreeRDP) is the dominant open-source RDP implementation (Apache 2.0 license):

- Written primarily in C (87.8%)
- Clean separation: `libfreerdp` (protocol) vs. client frontends vs. server implementations
- Powers Remmina, GNOME Connections, KRDC, and most Linux RDP clients
- Implements the full virtual channel system including GFX pipeline

### Key Design Insights for WayRay

- The SVC/DVC split is instructive: start with fixed channels, add dynamic ones later
- Progressive rendering is excellent for variable-bandwidth scenarios
- Content-adaptive encoding (H.264 for video, wavelet for desktop) is the modern approach
- FreeRDP's architecture (protocol library separate from client/server) is a good model

### Sources

- [Understanding RDP - Microsoft Learn](https://learn.microsoft.com/en-us/troubleshoot/windows-server/remote/understanding-remote-desktop-protocol)
- [RDP Wikipedia](https://en.wikipedia.org/wiki/Remote_Desktop_Protocol)
- [FreeRDP GitHub](https://github.com/FreeRDP/FreeRDP)
- [MS-RDPEGFX Specification](https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-rdpegfx/da5c75f9-cd99-450c-98c4-014a496942b0)
- [Graphics Encoding over RDP - Azure](https://learn.microsoft.com/en-us/azure/virtual-desktop/graphics-encoding)
- [RDP Virtual Channels - Microsoft Learn](https://learn.microsoft.com/en-us/windows/win32/termserv/terminal-services-virtual-channels)

---

## 3. VNC / RFB Protocol

**VNC** (Virtual Network Computing) uses the **RFB** (Remote Framebuffer) protocol, standardized in [RFC 6143](https://www.rfc-editor.org/rfc/rfc6143.html).

### Architecture

RFB is a **simple, stateless framebuffer protocol**. The fundamental design:

- The display side is based on a single primitive: **"put a rectangle of pixel data at position (x, y)"**
- A sequence of rectangles makes a **framebuffer update**
- The protocol is **client-pull**: the client requests updates, the server sends them
- **Pixel format** is negotiated: 24-bit true color, 16-bit, or 8-bit color-mapped
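The client-pull cycle above is driven by FramebufferUpdateRequest messages, whose wire layout RFC 6143 (§7.5.3) fixes as: message type 3, an `incremental` flag, and the big-endian rectangle of interest. A minimal encoder:

```rust
// Encode an RFB FramebufferUpdateRequest (RFC 6143 §7.5.3):
// byte 0: message type 3; byte 1: incremental flag;
// bytes 2..10: x, y, width, height as big-endian u16s.
fn framebuffer_update_request(incremental: bool, x: u16, y: u16, w: u16, h: u16) -> [u8; 10] {
    let mut msg = [0u8; 10];
    msg[0] = 3; // message type: FramebufferUpdateRequest
    msg[1] = incremental as u8;
    msg[2..4].copy_from_slice(&x.to_be_bytes());
    msg[4..6].copy_from_slice(&y.to_be_bytes());
    msg[6..8].copy_from_slice(&w.to_be_bytes());
    msg[8..10].copy_from_slice(&h.to_be_bytes());
    msg
}

fn main() {
    // Incremental request covering a full 1920x1080 framebuffer.
    let msg = framebuffer_update_request(true, 0, 0, 1920, 1080);
    assert_eq!(msg[0], 3);
    assert_eq!(u16::from_be_bytes([msg[6], msg[7]]), 1920);
    println!("{:?}", msg);
}
```

With `incremental` set, the server may reply with only the regions that changed since the last update; a non-incremental request forces a full-frame resend.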

### Encoding Types

The encoding system is the key to VNC performance. Different encodings trade off bandwidth, client CPU, and server CPU:

| Encoding | Description | Best For |
|---|---|---|
| **Raw** | Uncompressed pixel data, scanline order | Fast LAN, low CPU |
| **CopyRect** | Reference to existing framebuffer region | Window moves, scrolling |
| **RRE** | Rise-and-Run-length Encoding, rectangles of solid color | Simple UIs |
| **Hextile** | 16x16 tile subdivision with RRE within tiles | Fast LAN (low CPU overhead) |
| **Zlib** | Raw data compressed with zlib | Moderate bandwidth savings |
| **Tight** | Intelligent per-rectangle compression selection (zlib, JPEG, indexed color, solid) | Low bandwidth / WAN |
| **ZRLE** | Zlib Run-Length Encoding, combines zlib with palette/RLE | Good all-around |
| **TurboVNC/Tight+JPEG** | Tight with aggressive JPEG for photographic regions | Video content, high FPS |
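The per-rectangle selection idea behind Tight can be sketched with just two branches. This is a deliberately reduced model (enum and sizes are ours, and real Tight adds palette, zlib, and JPEG paths), but it shows why selecting per rectangle beats one-size-fits-all:

```rust
// Simplified per-rectangle selection in the spirit of Tight: a solid
// rectangle is sent as a single color, everything else as raw pixels.
#[derive(Debug, PartialEq)]
enum RectEncoding {
    Solid(u32), // one 32-bit pixel value covers the whole rect
    Raw,        // width * height * 4 bytes
}

fn choose_encoding(pixels: &[u32]) -> RectEncoding {
    match pixels.first() {
        Some(&first) if pixels.iter().all(|&p| p == first) => RectEncoding::Solid(first),
        _ => RectEncoding::Raw,
    }
}

fn encoded_size(pixels: &[u32]) -> usize {
    match choose_encoding(pixels) {
        RectEncoding::Solid(_) => 4, // just the color
        RectEncoding::Raw => pixels.len() * 4,
    }
}

fn main() {
    let solid = vec![0x00FF_FFFFu32; 64 * 64]; // e.g. a blank window background
    let mixed: Vec<u32> = (0..64 * 64).map(|i| i as u32).collect();
    assert_eq!(choose_encoding(&solid), RectEncoding::Solid(0x00FF_FFFF));
    assert_eq!(encoded_size(&solid), 4);
    assert_eq!(encoded_size(&mixed), 64 * 64 * 4);
    println!("solid: {} bytes, mixed: {} bytes", encoded_size(&solid), encoded_size(&mixed));
}
```

Desktop content is dominated by large flat regions, so even this two-way split collapses many rectangles from kilobytes to four bytes.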

**Pseudo-encodings** allow clients to advertise extension support (cursor shape, desktop resize, etc.) without changing the core protocol.

### Performance Characteristics

- **Fast LAN**: Hextile or Raw (minimize CPU overhead)
- **WAN/Low bandwidth**: Tight (best compression ratios, especially for mixed content)
- **Photo/Video content**: Tight with JPEG (TurboVNC achieves 4x better performance than ZRLE for images)
- **Scrolling/Window moves**: CopyRect (near-zero bandwidth)

### Key Design Insights for WayRay

- CopyRect-style "reference previous frame data" is extremely efficient for common desktop operations
- Per-rectangle encoding selection (as in Tight) is superior to one-size-fits-all
- RFB's simplicity is both its strength (easy to implement) and weakness (no audio, USB, etc.)
- The client-pull model introduces latency; a push model with damage tracking is better

### Sources

- [RFC 6143 - The Remote Framebuffer Protocol](https://www.rfc-editor.org/rfc/rfc6143.html)
- [RFB Protocol Documentation](https://vncdotool.readthedocs.io/en/0.8.0/rfbproto.html)
- [RFB Protocol Wikipedia](https://en.wikipedia.org/wiki/RFB_protocol)
- [VNC Tight Encoder Comparison](https://www.tightvnc.com/archive/compare.html)
- [TigerVNC RFB Protocol](https://github.com/svn2github/tigervnc/blob/master/rfbproto/rfbproto.rst)

---

## 4. Waypipe

**Waypipe** is a proxy for Wayland clients, analogous to `ssh -X` for X11. It is the most directly relevant existing project for Wayland remote display.

### Architecture

Waypipe operates as a **paired proxy** system:

```
[Remote App] <--Wayland--> [waypipe server] <--socket/SSH--> [waypipe client] <--Wayland--> [Local Compositor]
```

- **Server mode**: Acts as a Wayland compositor stub on the remote side. Wayland apps connect to it as if it were a real compositor.
- **Client mode**: Connects to the local real compositor and forwards surface updates from the remote side.
- **SSH integration**: `waypipe ssh user@host app` sets up the tunnel automatically.

### Buffer Synchronization

This is the key technical innovation:

1. Waypipe keeps a **mirror copy** of each shared memory buffer
2. When a buffer is committed, waypipe **diffs** the current buffer against the mirror
3. Only **changed regions** are transmitted
4. The remote side applies the diff to reconstruct the buffer
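The four steps above can be sketched as a byte-level diff/apply pair. The span format here is invented for illustration (real waypipe works on aligned blocks and adds compression), and both buffers are assumed to be the same size:

```rust
// Sketch of mirror-based diffing: scan the committed buffer against
// the mirror copy, emit (offset, bytes) spans for changed regions,
// and apply them on the far side. Hypothetical wire format.
fn diff(mirror: &[u8], current: &[u8]) -> Vec<(usize, Vec<u8>)> {
    let mut spans = Vec::new();
    let mut i = 0;
    while i < current.len() {
        if current[i] != mirror[i] {
            let start = i;
            while i < current.len() && current[i] != mirror[i] {
                i += 1;
            }
            spans.push((start, current[start..i].to_vec()));
        } else {
            i += 1;
        }
    }
    spans
}

fn apply(mirror: &mut [u8], spans: &[(usize, Vec<u8>)]) {
    for (off, bytes) in spans {
        mirror[*off..*off + bytes.len()].copy_from_slice(bytes);
    }
}

fn main() {
    let mirror = vec![0u8; 16];
    let mut current = mirror.clone();
    current[4..8].copy_from_slice(&[1, 2, 3, 4]); // app redraws a region
    let spans = diff(&mirror, &current);
    assert_eq!(spans, vec![(4, vec![1, 2, 3, 4])]);

    let mut remote = vec![0u8; 16]; // remote-side mirror
    apply(&mut remote, &spans);
    assert_eq!(remote, current);
    let sent: usize = spans.iter().map(|(_, b)| b.len()).sum();
    println!("{} span(s), {} byte(s) sent instead of 16", spans.len(), sent);
}
```

For a mostly static desktop, almost every commit produces a handful of small spans instead of a full-buffer retransmission.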

### Compression Options

| Method | Use Case | Default |
|---|---|---|
| **none** | High-bandwidth LAN | No |
| **lz4** | General purpose, fast | Yes (default) |
| **zstd** | Low-bandwidth / WAN | No |

Compression ratios: 30x for text-heavy content, down to 1.5x for noisy images.

### Video Encoding (DMA-BUF)

For DMA-BUF buffers (GPU-rendered content), waypipe supports **lossy video encoding**:

- `--video=sw,bpf=120000,h264` (default when `--video` is used)
- **Software encoding** (libx264) or **hardware encoding** (VAAPI)
- With VAAPI on Intel Gen8 iGPU: **80 FPS at 4 MB/s bandwidth**
- Configurable bits-per-frame for quality/bandwidth tradeoff

### Protocol Handling

Waypipe parses the Wayland wire protocol, which is **partially self-describing**. It:

- Intercepts buffer-related messages (wl_shm, wl_buffer, linux-dmabuf)
- Passes through other messages transparently
- Is partially forward-compatible with new Wayland protocols

### Limitations

- Per-application, not whole-desktop
- No built-in audio forwarding
- No USB forwarding
- Performance depends heavily on application rendering patterns
- Latency can be noticeable for interactive use

### Key Design Insights for WayRay

- The diff-based buffer synchronization is very efficient for incremental updates
- VAAPI video encoding for DMA-BUF is the right approach for GPU-rendered content
- Per-application forwarding is limiting; a whole-compositor approach is better for a thin client
- The Wayland protocol's design (buffer passing, damage tracking) is well-suited for remote display

### Sources

- [Waypipe GitHub](https://github.com/neonkore/waypipe)
- [Waypipe Man Page](https://man.archlinux.org/man/extra/waypipe/waypipe.1.en)
- [GSOC 2019 - Waypipe Development Blog](https://mstoeckl.com/notes/gsoc/blog.html)
- [Waypipe DeepWiki](https://deepwiki.com/neonkore/waypipe/2-getting-started)

---

## 5. PipeWire Screen Capture

PipeWire is the modern Linux multimedia framework that unifies audio, video, and screen capture.

### Portal-Based Screen Capture Architecture

On Wayland, screen capture follows a **security-first architecture**:

```
[Application] --> [xdg-desktop-portal (D-Bus)] --> [Portal Backend (compositor-specific)]
                                 |
                                 v
                        [PipeWire Stream]
                                 |
                                 v
                 [Application receives frames]
```

**Flow**:

1. Application calls `org.freedesktop.portal.ScreenCast.CreateSession()` via D-Bus
2. Portal presents a permission dialog to the user
3. On approval, `SelectSources()` lets the user choose an output/window
4. `Start()` creates a PipeWire stream and returns a `pipewire_fd`
5. Application connects to PipeWire using this fd and receives frames

### Buffer Sharing Mechanisms

PipeWire supports two buffer types for screen capture:

**DMA-BUF (preferred)**:

- Zero-copy transfer from compositor GPU memory to consumer
- Buffer stays in GPU VRAM throughout the pipeline
- Ideal for hardware video encoding (capture -> encode without CPU copy)
- Format/modifier negotiation ensures compatibility

**memfd (fallback)**:

- Shared memory file descriptor
- Requires CPU copy from GPU to system memory
- Universal compatibility but higher overhead

### Wayland Capture Protocols

Three generations of capture protocols exist:

1. **wlr-export-dmabuf-unstable-v1** (legacy): Exports entire output as DMA-BUF frames. Simple but no damage tracking.
2. **wlr-screencopy-unstable-v1** (deprecated): More flexible, supports shared memory and DMA-BUF. Has damage tracking via `copy_with_damage`. Being replaced.
3. **ext-image-copy-capture-v1** (current, merged 2024): The new standard protocol:
   - Client specifies which buffer regions need updating
   - Compositor only fills changed regions
   - Supports both output capture and window capture
   - Initial implementations: wlroots, WayVNC, grim

### GNOME's Approach

GNOME/Mutter uses different D-Bus APIs:

- `org.gnome.Mutter.ScreenCast`: Provides PipeWire stream of screen content
- `org.gnome.Mutter.RemoteDesktop`: Provides input injection
- These power `gnome-remote-desktop`, which speaks RDP (and VNC)

### Key Design Insights for WayRay

- **ext-image-copy-capture-v1 + PipeWire** is the correct modern capture stack
- DMA-BUF capture -> hardware encode is the zero-copy golden path
- The portal system provides proper security/permission handling
- For a thin client server running its own compositor, you can skip the portal and use the capture protocols directly
- Damage tracking in ext-image-copy-capture-v1 is essential for efficient updates

### Sources

- [XDG Desktop Portal ScreenCast API](https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.ScreenCast.html)
- [ext-image-copy-capture-v1 Protocol](https://wayland.app/protocols/ext-image-copy-capture-v1)
- [wlr-screencopy-unstable-v1](https://wayland.app/protocols/wlr-screencopy-unstable-v1)
- [wlr-export-dmabuf-unstable-v1](https://wayland.app/protocols/wlr-export-dmabuf-unstable-v1)
- [Wayland Merges New Screen Capture Protocols - Phoronix](https://www.phoronix.com/news/Wayland-Merges-Screen-Capture)
- [PipeWire ArchWiki](https://wiki.archlinux.org/title/PipeWire)
- [Niri Screencasting Implementation](https://deepwiki.com/niri-wm/niri/5.4-screencasting-and-screen-capture)

---

## 6. Video Codecs for Remote Display

### Codec Comparison for Low-Latency Use

| Property | H.264/AVC | H.265/HEVC | AV1 |
|---|---|---|---|
| **Compression efficiency** | Baseline | ~35% better than H.264 | ~50% better than H.264 |
| **Encoding latency** | Lowest | Low | Moderate (improving) |
| **Hardware encode support** | Universal | Widespread | Newer GPUs only |
| **Patent/license** | Licensed (but ubiquitous) | Licensed (complex) | Royalty-free |
| **Screen content coding** | Limited | Better | Best (dedicated tools) |
| **Decode support** | Universal | Nearly universal | Growing rapidly |
| **Best for** | Maximum compatibility | Good quality/bandwidth | Best quality, royalty-free |

### Low-Latency Encoding Considerations

For remote desktop, encoding latency is critical. Key settings:

**Frame structure**:

- **No B-frames**: B-frames require future frames, adding latency
- **No lookahead**: Lookahead improves quality but adds latency
- **No frame reordering**: Frames must be encoded/decoded in order
- **Single slice / low-delay profile**: Minimizes buffering

**Rate control**:

- **CBR (Constant Bit Rate)**: Keeps network queues short and predictable
- **VBR with max bitrate cap**: Better quality but can cause bandwidth spikes
- CBR is generally preferred for remote desktop due to predictable latency

**Intra refresh**:

- Periodic I-frames are large and cause bandwidth spikes
- **Gradual Intra Refresh (GIR)**: Spreads intra-coded blocks across frames, avoiding spikes
- Essential for smooth, low-latency streaming
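The settings above map onto concrete ffmpeg/libx264 flags. A sketch that only assembles the argument list (the bitrate is a placeholder, and the exact flag set is one reasonable combination, not a tuned recommendation):

```rust
// Illustrative ffmpeg/libx264 flag set combining the settings above:
// CBR-style rate control, no B-frames, no lookahead, and gradual
// intra refresh instead of periodic I-frames.
fn low_latency_x264_args(bitrate_kbps: u32) -> Vec<String> {
    vec![
        "-c:v".into(), "libx264".into(),
        "-tune".into(), "zerolatency".into(),          // drops lookahead and frame queues
        "-bf".into(), "0".into(),                      // no B-frames
        "-b:v".into(), format!("{bitrate_kbps}k"),
        "-maxrate".into(), format!("{bitrate_kbps}k"), // cap == target: CBR-like
        "-bufsize".into(), format!("{}k", bitrate_kbps / 2), // short VBV buffer
        // gradual intra refresh, no rate-control lookahead:
        "-x264-params".into(), "intra-refresh=1:rc-lookahead=0".into(),
    ]
}

fn main() {
    let args = low_latency_x264_args(4000);
    assert!(args.contains(&"zerolatency".to_string()));
    assert!(args.windows(2).any(|w| w[0] == "-bf" && w[1] == "0"));
    println!("ffmpeg {} ...", args.join(" "));
}
```

The short `-bufsize` is what actually bounds queueing delay: a large VBV buffer would let the encoder burst well above the target rate and fill network queues.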

### AV1 Specific Advantages

AV1 has features specifically useful for remote desktop:

- **Screen Content Coding (SCC)**: Dedicated tools for text, UI elements, and screen captures that dramatically reduce bitrate
- **Temporal Scalability (SVC)**: L1T2 mode (1 spatial layer, 2 temporal layers) allows dropping frames gracefully under bandwidth pressure
- **Film Grain Synthesis**: Can transmit film grain parameters instead of actual grain, saving bandwidth

Chrome's libaom AV1 encoder (speed 10): 12% better quality than VP9 at the same bandwidth, 25% faster encoding.

### Hardware Encoding

#### NVIDIA NVENC

- Available on GeForce GTX 600+ and all Quadro/Tesla with Kepler+
- **Video Codec SDK v13.0** (2025): AV1 ultra-high quality mode, comparable to software AV1 encoding
- Latency modes:
  - **Normal Latency**: Default, uses B-frames and lookahead
  - **Low Latency**: No B-frames, no reordering
  - **Ultra Low Latency**: Strict in-order pipeline, minimal frame queuing
- Dedicated hardware encoder block (does not consume CUDA cores)
- Can encode 4K@120fps with sub-frame latency

#### Intel VAAPI (Video Acceleration API)

- Open-source API (`libva`) supported on Intel Gen8+ (Broadwell+)
- Supports H.264, H.265, AV1 (Intel Arc/Gen12+), VP9
- FFmpeg integration: `h264_vaapi`, `hevc_vaapi`, `av1_vaapi`
- Low-power encoding mode available on some platforms
- GStreamer integration via `gstreamer-vaapi`
- Well-suited for always-on server scenarios (low power consumption)

#### AMD AMF/VCN

- Video Core Next (VCN) hardware encoder
- Supports H.264, H.265, AV1 (RDNA 3+)
- AMF (Advanced Media Framework) SDK
- VAAPI support via Mesa `radeonsi` driver
- VCN 4.0+ competitive with NVENC in quality

### Key Design Insights for WayRay

- **Start with H.264** for maximum compatibility, add H.265/AV1 as options
- Use **VAAPI** as the primary encoding API (works across Intel/AMD, open-source)
- Add NVENC support via FFmpeg/GStreamer for NVIDIA GPUs
- **CBR + no B-frames + gradual intra refresh** for lowest latency
- AV1's screen content coding mode is a significant advantage for desktop content
- The **DMA-BUF -> VAAPI encode** path is zero-copy and should be the primary pipeline

### Sources

- [NVIDIA Video Codec SDK](https://developer.nvidia.com/video-codec-sdk)
- [NVENC Application Note](https://docs.nvidia.com/video-technologies/video-codec-sdk/13.0/nvenc-application-note/index.html)
- [NVIDIA AV1 Blog Post](https://developer.nvidia.com/blog/improving-video-quality-and-performance-with-av1-and-nvidia-ada-lovelace-architecture/)
- [GPU Video Encoder Evaluation](https://arxiv.org/html/2511.18688v2)
- [VA-API Intel Documentation](https://intel.github.io/libva/)
- [Hardware Video Acceleration ArchWiki](https://wiki.archlinux.org/title/Hardware_video_acceleration)
- [Chrome AV1 Improvements](https://developer.chrome.com/blog/av1)
- [CBR vs VBR for Game Streaming](https://pulsegeek.com/articles/cbr-vs-vbr-for-low-latency-game-streaming/)
- [AV1 SVC in WebRTC](https://w3c.github.io/webrtc-svc/)

---

## 7. Network Protocols

### TCP vs. UDP vs. QUIC for Remote Display

| Property | TCP | UDP | QUIC |
|---|---|---|---|
| **Reliability** | Full (retransmit) | None | Selectable per-stream |
| **Head-of-line blocking** | Yes (single stream) | No | No (multiplexed streams) |
| **Connection setup** | 1-3 RTT (TCP + TLS) | 0 RTT | 0-1 RTT |
| **Congestion control** | Kernel-space, slow to update | Application-managed | User-space, pluggable |
| **NAT/firewall traversal** | Good | Moderate | Moderate (UDP-based) |
| **Encryption** | Optional (TLS) | Optional (DTLS) | Mandatory (TLS 1.3) |

### QUIC Advantages for Remote Display

QUIC is increasingly compelling for remote display:

1. **Stream multiplexing without HOL blocking**: Display, input, and audio can be separate QUIC streams. A lost display packet doesn't stall input delivery.
2. **0-RTT connection setup**: Critical for session resumption / hot-desking scenarios
3. **Pluggable congestion control**: Can use algorithms optimized for low-latency interactive traffic (e.g., BBR, COPA)
4. **Connection migration**: Session survives network changes (WiFi -> Ethernet)

### QUIC Challenges

- **Firewall blocking**: Some corporate networks block UDP, forcing TCP fallback. The fallback penalty is severe (full session teardown + TCP reconnect).
- **Library maturity**: QUIC implementations are still maturing. Key libraries:
  - **quinn** (Rust): Well-maintained, async, good for our use case
  - **quiche** (Cloudflare, Rust/C): Production-tested
  - **s2n-quic** (AWS, Rust): High performance
- **CPU overhead**: QUIC's encryption and user-space processing can be higher than kernel TCP

### Media over QUIC (MoQ)

MoQ is an emerging IETF standard (RFC expected 2026), built on QUIC/WebTransport, that combines:

- The low-latency interactivity of WebRTC
- The scalability of HLS/DASH

**Architecture**: Publish-subscribe model with tracks, groups, and objects. Sub-250ms latency target.

**Relevance**: MoQ's concepts (prioritized streams, partial reliability, adaptive quality) are directly applicable to remote display, though the protocol itself is focused on media distribution rather than interactive desktop.

**Implementations**: Cloudflare has deployed MoQ relays on their global network. The OpenMOQ consortium (Akamai, Cisco, YouTube, etc.) is developing open-source implementations.

### Adaptive Bitrate for Remote Display

Key strategies:

- **Bandwidth estimation**: Measure RTT and throughput continuously
- **Quality adjustment**: Change encoder bitrate, resolution, or frame rate
- **Frame dropping**: Under extreme congestion, drop non-reference frames
- **Temporal scalability (SVC)**: Encode with multiple temporal layers, drop higher layers under congestion
- **Resolution scaling**: Encode at lower resolution and upscale on the client (works well with modern upscaling algorithms)
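A minimal bitrate-adjustment loop tying estimation to quality adjustment can be sketched as AIMD control (the struct, constants, and congestion signal are illustrative, not a tuned algorithm):

```rust
// Minimal AIMD-style bitrate controller sketch: back off
// multiplicatively when the network reports congestion (loss or an
// RTT spike), otherwise probe upward additively, within clamps.
struct BitrateController {
    kbps: u32,
    min_kbps: u32,
    max_kbps: u32,
}

impl BitrateController {
    fn update(&mut self, congested: bool) -> u32 {
        if congested {
            self.kbps = (self.kbps / 2).max(self.min_kbps); // multiplicative decrease
        } else {
            self.kbps = (self.kbps + 250).min(self.max_kbps); // additive increase
        }
        self.kbps // feed this to the encoder's rate control
    }
}

fn main() {
    let mut ctl = BitrateController { kbps: 8000, min_kbps: 500, max_kbps: 20_000 };
    assert_eq!(ctl.update(true), 4000);  // halve under congestion
    assert_eq!(ctl.update(false), 4250); // probe back up
    println!("current bitrate: {} kbps", ctl.kbps);
}
```

In practice the congestion signal would come from RTT/loss statistics exposed by the transport, and the output would be applied by reconfiguring the hardware encoder's target bitrate between frames.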

### Latency Budget

For interactive remote desktop, the target end-to-end latency budget:

| Stage | Target |
|---|---|
| Capture | <1ms (DMA-BUF) |
| Encode | 1-5ms (hardware) |
| Network (LAN) | <1ms |
| Network (WAN) | 10-100ms |
| Decode | 1-3ms (hardware) |
| Render | <1ms |
| **Total (LAN)** | **<10ms** |
| **Total (WAN)** | **15-110ms** |

### Key Design Insights for WayRay

- **Use QUIC as primary transport** with TCP fallback
- Rust has excellent QUIC libraries (quinn)
- Separate QUIC streams for display, input, audio, USB
- Input should be highest priority (lowest latency)
- Implement adaptive bitrate from the start
- Consider SVC temporal layers in the encoder for graceful degradation
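The per-channel prioritization above can be sketched as a simple ordering (all names are ours; the numeric mapping is invented and matches real QUIC stream-priority APIs only in spirit):

```rust
// Sketch of per-channel send priority: with one QUIC stream per
// channel, input is scheduled first so a backlog of display data
// never delays keystrokes. Hypothetical mapping.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Channel {
    Input,
    Audio,
    Display,
    Usb,
}

// Lower value = scheduled first.
fn priority(c: Channel) -> u8 {
    match c {
        Channel::Input => 0,   // lowest latency requirement
        Channel::Audio => 1,   // time-sensitive but bufferable
        Channel::Display => 2, // bulk of the bandwidth
        Channel::Usb => 3,     // background bulk transfers
    }
}

fn main() {
    let mut queued = vec![Channel::Usb, Channel::Display, Channel::Input, Channel::Audio];
    queued.sort_by_key(|&c| priority(c));
    assert_eq!(queued[0], Channel::Input);
    println!("{:?}", queued);
}
```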

### Sources

- [Media Over QUIC IETF Working Group](https://datatracker.ietf.org/group/moq/about/)
- [Cloudflare MoQ Blog](https://blog.cloudflare.com/moq/)
- [Streaming Remote Rendering: QUIC vs WebRTC](https://arxiv.org/html/2505.22132v1)
- [MOQ Protocol Explained - WebRTC.ventures](https://webrtc.ventures/2025/10/moq-protocol-explained-unifying-real-time-and-scalable-streaming/)
- [MoQ - nanocosmos](https://www.nanocosmos.net/blog/media-over-quic-moq/)
- [QUIC Fix for Video Streaming](https://arxiv.org/pdf/1809.10270)

---

## 8. Framebuffer Capture Techniques

### DMA-BUF Export (Zero-Copy)

**DMA-BUF** is the Linux kernel subsystem for sharing buffers between devices (GPU, display, video encoder).

**How it works**:

1. GPU renders frame into a DMA-BUF object (fd-backed GPU memory)
2. The fd is passed to the consumer (encoder, another GPU, etc.)
3. No CPU copy occurs; the buffer stays in GPU memory

**For a Wayland compositor acting as a thin client server**:

```
[Wayland clients] --> [Compositor renders to GPU buffer]
                                 |
                                 v
                      [DMA-BUF export (fd)]
                                 |
                                 v
                   [VAAPI encoder imports fd]
                                 |
                                 v
                [Encoded bitstream -> network]
```

**Key protocols**:

- `linux-dmabuf-v1`: Clients use this to submit GPU-rendered buffers to the compositor
- `ext-image-copy-capture-v1`: Captures compositor output as DMA-BUF
- DMA-BUF feedback (v4): Tells clients which GPU/format the compositor prefers

### GPU Readback (Fallback)

When DMA-BUF export is not possible:

1. Compositor renders to GPU texture
2. `glReadPixels()` or equivalent copies pixels to CPU memory
3. CPU memory is then compressed/encoded

This is **significantly slower** due to the GPU -> CPU copy and pipeline stall, but universally supported.

### Damage Tracking

**Damage tracking** identifies which regions of the screen changed between frames, avoiding retransmission of unchanged areas.

**Wayland's built-in damage tracking**:

- Each `wl_surface.commit()` includes damage rectangles via `wl_surface.damage()` or `wl_surface.damage_buffer()`
- The compositor knows exactly which surface regions changed

**Compositor-level damage**:

- The compositor tracks which regions of the output changed (due to surface damage, window moves, overlapping windows, etc.)
- `ext-image-copy-capture-v1` supports damage reporting: the compositor tells the capturer which regions changed since the last frame

**For encoding efficiency**:

- With H.264/H.265/AV1: damage regions inform the encoder which macroblocks to mark as changed
- With lossless compression: only changed regions need to be compressed and sent
- With hybrid approach: unchanged regions get zero bits, changed regions get full encoding
|
||
|
|
|
||
|
|
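One way to act on damage reports is to measure how much of the output is dirty and switch between tile updates and full-frame encoding. A sketch (the types and the 40% threshold are ours, not from any protocol):

```rust
/// A damage rectangle in output coordinates.
#[derive(Clone, Copy)]
struct Rect { x: u32, y: u32, w: u32, h: u32 }

/// Fraction of the output covered by damage, approximated by summing
/// rectangle areas (overlaps are counted twice, so this is an upper bound).
fn damage_fraction(damage: &[Rect], out_w: u32, out_h: u32) -> f64 {
    let damaged: u64 = damage.iter().map(|r| r.w as u64 * r.h as u64).sum();
    (damaged as f64 / (out_w as u64 * out_h as u64) as f64).min(1.0)
}

/// Heuristic: small damage -> encode only the dirty tiles,
/// large damage -> hand the whole frame to the video encoder.
fn full_frame_encode(damage: &[Rect], out_w: u32, out_h: u32) -> bool {
    damage_fraction(damage, out_w, out_h) > 0.4
}

fn main() {
    let blink = [Rect { x: 100, y: 100, w: 8, h: 16 }]; // a cursor blink
    assert!(!full_frame_encode(&blink, 1920, 1080));
    let video = [Rect { x: 0, y: 0, w: 1920, h: 800 }]; // large video region
    assert!(full_frame_encode(&video, 1920, 1080));
    println!("damage heuristic ok");
}
```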
### wl-screenrec: Reference Implementation

[wl-screenrec](https://github.com/russelltg/wl-screenrec) is a Rust project demonstrating high-performance Wayland screen recording:

- Uses wlr-screencopy with DMA-BUF
- Hardware encoding via VAAPI
- Zero-copy pipeline (DMA-BUF -> VAAPI -> file)
- Written in Rust, good reference for our implementation

### Key Design Insights for WayRay

- **Own the compositor**: By building/extending a Wayland compositor, we have direct access to all rendering state, damage information, and DMA-BUF handles
- **DMA-BUF -> VAAPI is the critical path**: This zero-copy pipeline should be the primary encoding path
- **Damage tracking reduces encoding work**: Use Wayland's built-in damage tracking to minimize what gets encoded
- **Fallback to GPU readback** for unsupported hardware
- **wl-screenrec** is a good Rust reference for the capture -> encode pipeline
### Sources

- [Linux DMA-BUF Kernel Documentation](https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html)
- [Linux DMA-BUF Wayland Protocol](https://wayland-book.com/surfaces/dmabuf.html)
- [ext-image-copy-capture-v1](https://wayland.app/protocols/ext-image-copy-capture-v1)
- [wlr-export-dmabuf-unstable-v1](https://wayland.app/protocols/wlr-export-dmabuf-unstable-v1)
- [wl-screenrec GitHub](https://github.com/russelltg/wl-screenrec)
- [OBS Zero-Copy Capture](https://obsproject.com/forum/threads/experimental-zero-copy-screen-capture-on-linux.101262/)
- [GStreamer DMA-BUF Design](https://gstreamer.freedesktop.org/documentation/additional/design/dmabuf.html)

---
## 9. Audio Forwarding

### PipeWire Network Audio

PipeWire provides several mechanisms for network audio:

#### RTP Modules (Recommended)

**`module-rtp-sink`**: Creates a PipeWire sink that sends audio as RTP packets

- Supports raw PCM, Opus encoding
- Configurable latency via `sess.latency.msec` (default: 100ms for network)
- Uses SAP/mDNS for discovery

**`module-rtp-source`**: Creates a PipeWire source that receives RTP packets

- DLL-based clock recovery to handle network jitter
- Configurable ring buffer fill level

**`module-rtp-session`**: Combined send/receive with automatic discovery

- Uses Apple MIDI protocol for low-latency bidirectional MIDI
- Announced via Avahi/mDNS/Bonjour
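As a starting point, a server-side RTP sink can be loaded from `pipewire.conf`. The fragment below is modeled on the examples in the PipeWire module documentation; the destination address, port, node name, and latency value are placeholders, and exact argument names should be checked against the docs for your PipeWire version:

```
context.modules = [
    {   name = libpipewire-module-rtp-sink
        args = {
            destination.ip    = "192.168.1.50"  # thin client address (placeholder)
            destination.port  = 46000
            sess.latency.msec = 20              # lower than the 100 ms default
            audio.rate        = 48000
            audio.channels    = 2
            stream.props = {
                node.name = "wayray-rtp-sink"
            }
        }
    }
]
```

Applications on the server then see an ordinary sink; anything routed to it is packetized as RTP toward the client.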
#### Pulse Tunnel Module

**`module-pulse-tunnel`**: Tunnels audio to/from a remote PulseAudio/PipeWire-Pulse server

- Simpler setup, works over TCP
- Higher latency than the RTP approach
- Good for compatibility with existing PulseAudio setups
### Low-Latency Audio Considerations

For remote desktop audio, the targets are:

| Parameter | Target |
|---|---|
| **Codec** | Opus (designed for low latency) |
| **Frame size** | 2.5ms - 10ms (Opus supports down to 2.5ms) |
| **Buffer/Quantum** | As low as 128 samples @ 48kHz (~2.67ms) |
| **Network jitter buffer** | 10-30ms |
| **Total one-way latency** | 15-50ms |

**Opus codec advantages**:

- Designed for both speech and music
- 2.5ms to 60ms frame sizes
- 6 kbps to 510 kbps bitrate range
- Built-in forward error correction (FEC)
- Packet loss concealment (PLC)
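The table's latency budget can be sanity-checked with simple arithmetic. The component values below are illustrative, not measurements:

```rust
/// One-way audio latency budget in milliseconds.
struct LatencyBudget {
    capture_quantum_ms: f64, // server-side buffer, e.g. 128 samples @ 48 kHz
    opus_frame_ms: f64,      // Opus frame size (2.5 - 60 ms)
    network_ms: f64,         // one-way network delay
    jitter_buffer_ms: f64,   // client playout buffer
}

impl LatencyBudget {
    fn total_ms(&self) -> f64 {
        self.capture_quantum_ms + self.opus_frame_ms + self.network_ms + self.jitter_buffer_ms
    }
}

/// Duration of `samples` at `rate` Hz, in milliseconds.
fn samples_to_ms(samples: u32, rate: u32) -> f64 {
    samples as f64 * 1000.0 / rate as f64
}

fn main() {
    let budget = LatencyBudget {
        capture_quantum_ms: samples_to_ms(128, 48_000), // ~2.67 ms
        opus_frame_ms: 5.0,
        network_ms: 2.0,      // switched LAN
        jitter_buffer_ms: 15.0,
    };
    // Lands comfortably inside the 15-50 ms one-way target.
    println!("one-way audio latency ~= {:.1} ms", budget.total_ms());
}
```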
### Custom Audio Pipeline for Thin Client

For a purpose-built thin client, the audio pipeline should be:

```
[Server PipeWire] -> [Opus encode] -> [RTP/QUIC] -> [Opus decode] -> [Client audio output]
[Client microphone] -> [Opus encode] -> [RTP/QUIC] -> [Opus decode] -> [Server PipeWire]
```

Key considerations:

- **Clock synchronization**: Client and server audio clocks will drift. Need adaptive resampling or buffer management.
- **Jitter compensation**: Network jitter requires a playout buffer. An adaptive jitter buffer adjusts to network conditions.
- **Echo cancellation**: If microphone and speakers are on the same client device, need AEC.
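Clock drift compensation is often done with a small feedback controller: watch the playout buffer fill level and nudge the resampling ratio toward keeping it at a target. A minimal proportional sketch (the gain, clamp, and target value are our choices, not from any standard):

```rust
/// Adaptive resampling ratio for clock-drift compensation.
/// If the playout buffer runs above target, play slightly faster (>1.0);
/// below target, slightly slower (<1.0). A tiny gain keeps the pitch
/// shift inaudible while still absorbing drift over time.
fn resample_ratio(buffer_fill_ms: f64, target_fill_ms: f64) -> f64 {
    const GAIN: f64 = 0.001; // proportional gain
    let error = (buffer_fill_ms - target_fill_ms) / target_fill_ms;
    // Clamp so the correction never exceeds +/-0.5% of nominal rate.
    1.0 + (GAIN * error).clamp(-0.005, 0.005)
}

fn main() {
    // Buffer exactly on target: no correction.
    assert_eq!(resample_ratio(20.0, 20.0), 1.0);
    // Buffer overfull: speed up playback slightly to drain it.
    assert!(resample_ratio(30.0, 20.0) > 1.0);
    // Buffer draining: slow down slightly to refill it.
    assert!(resample_ratio(10.0, 20.0) < 1.0);
    println!("drift controller ok");
}
```

PipeWire's `module-rtp-source` does something similar internally with a DLL (delay-locked loop); a production implementation would smooth the error over many packets rather than react per sample block.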
### Key Design Insights for WayRay

- **Opus over QUIC** is the right approach for a custom thin client
- PipeWire's RTP module is a good starting point, but we may want tighter integration
- Clock drift compensation is critical for long-running sessions
- Audio and video synchronization (lip sync) must be maintained
- Forward error correction helps with packet loss without retransmission latency
### Sources

- [PipeWire RTP Session Module](https://docs.pipewire.org/page_module_rtp_session.html)
- [PipeWire RTP Sink](https://docs.pipewire.org/page_module_rtp_sink.html)
- [PipeWire RTP Source](https://docs.pipewire.org/page_module_rtp_source.html)
- [PipeWire Pulse Tunnel](https://docs.pipewire.org/page_module_pulse_tunnel.html)
- [PipeWire/PulseAudio RTP Network Audio Guide (Oct 2025)](https://liotier.medium.com/pipewire-pulseaudio-rtp-network-audio-in-october-2025-a-configuration-guide-to-the-remote-time-e8dc0e20e3b0)
- [PipeWire ArchWiki](https://wiki.archlinux.org/title/PipeWire)
- [PulseAudio Network Setup](https://www.freedesktop.org/wiki/Software/PulseAudio/Documentation/User/Network/)

---
## 10. USB/IP

### Architecture

USB/IP is a Linux kernel subsystem that shares USB devices over TCP/IP networks.

**Components**:

| Component | Side | Role |
|---|---|---|
| **usbip-core** | Both | Shared protocol and utility code |
| **vhci-hcd** | Client | Virtual Host Controller Interface - presents virtual USB ports to the local USB stack |
| **usbip-host** (stub) | Server | Binds to physical USB devices, encapsulates URBs for network transmission |
| **usbip-vudc** | Server | Virtual USB Device Controller, for USB Gadget-based virtual devices |

### Protocol

**Discovery**: Client sends `OP_REQ_DEVLIST` over TCP, server responds with `OP_REP_DEVLIST` listing exportable devices.

**Attachment**: Client sends `OP_REQ_IMPORT`, server responds with `OP_REP_IMPORT` and begins forwarding URBs.

**Data transfer**: USB Request Blocks (URBs) are encapsulated in TCP packets and forwarded between stub driver and VHCI. The device driver runs entirely on the **client** side.

**Port**: TCP 3240 (default)
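The discovery request is a small fixed header: a big-endian protocol version, a command code, and a status word. A sketch of building it, following the field layout in the kernel's usbip protocol documentation (`0x0111` is the version value that appears in that documentation's examples; verify against your kernel's docs before relying on it):

```rust
/// Build the 8-byte OP_REQ_DEVLIST header: version, command code and
/// status, all big-endian, per the kernel usbip protocol documentation.
fn op_req_devlist(version: u16) -> [u8; 8] {
    const OP_REQ_DEVLIST: u16 = 0x8005;
    let mut pkt = [0u8; 8];
    pkt[0..2].copy_from_slice(&version.to_be_bytes());
    pkt[2..4].copy_from_slice(&OP_REQ_DEVLIST.to_be_bytes());
    pkt[4..8].copy_from_slice(&0u32.to_be_bytes()); // status: always 0 in requests
    pkt
}

fn main() {
    let pkt = op_req_devlist(0x0111);
    println!("OP_REQ_DEVLIST: {:02x?}", pkt);
    // A discovery client would write this to the server on TCP port 3240
    // and parse the OP_REP_DEVLIST response that follows.
}
```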
### Protocol Flow

```
[USB Device] <-> [Stub Driver (server kernel)]
                        |
                [TCP/IP Network]
                        |
            [VHCI Driver (client kernel)]
                        |
            [USB Device Driver (client)]
                        |
              [Application (client)]
```
### Kernel Integration

- Merged into mainline Linux since **kernel 3.17**
- Source: `drivers/usb/usbip/` and `tools/usb/usbip/`
- Supports USB 2.0 and USB 3.0 devices
- Windows support via [usbip-win](https://github.com/cezanne/usbip-win)

### Limitations

- **Latency**: TCP round-trip for every URB can add significant latency for isochronous devices (audio, video)
- **Bandwidth**: USB 3.0 bulk transfers work well, but sustained high bandwidth is limited by the network
- **Isochronous transfers**: Not well supported (real-time USB audio/video devices may not work)
- **Security**: No built-in encryption (must tunnel through SSH/VPN)
### Alternatives: SPICE usbredir

SPICE's USB redirection (`usbredir`) is an alternative approach:

- Library: `libusbredir`
- Works at the USB protocol level (like USB/IP)
- Better integration with SPICE's authentication/encryption
- Can be used independently of SPICE

### Key Design Insights for WayRay

- **USB/IP is mature and kernel-integrated** - good baseline
- For a thin client, wrapping USB/IP over QUIC (instead of raw TCP) would add encryption and better congestion handling
- **usbredir** is worth considering as it's designed for remote desktop use cases
- Isochronous USB devices (webcams, audio interfaces) are challenging over a network and may need special handling
- Consider selective USB forwarding - only forward devices the user explicitly shares
### Sources

- [USB/IP Kernel Documentation](https://docs.kernel.org/usb/usbip_protocol.html)
- [USB/IP ArchWiki](https://wiki.archlinux.org/title/USB/IP)
- [USB/IP Project](https://usbip.sourceforge.net/)
- [Linux Kernel USB/IP Source](https://github.com/torvalds/linux/tree/master/tools/usb/usbip)
- [USB/IP Tutorial - Linux Magazine](https://www.linux-magazine.com/Issues/2018/208/Tutorial-USB-IP)
- [usbip-win (Windows Support)](https://github.com/cezanne/usbip-win)
- [VirtualHere (Commercial Alternative)](https://www.virtualhere.com/)

---
## 11. Modern Thin Client Projects

### Sun Ray (Historical Reference)

The original Sun Ray (1999-2014) is the gold standard for thin client architecture:

- **Protocol**: Appliance Link Protocol (ALP) over UDP/IP
- **Architecture**: Completely stateless DTUs (Desktop Terminal Units) with zero local storage/OS
- **Session model**: Sessions are independent of physical hardware. Pull your smartcard, insert it at another Sun Ray, and the session follows instantly ("hot desking")
- **Server**: Sun Ray Server Software (SRSS) managed sessions, ran on Solaris/Linux
- **Network**: Standard switched Ethernet, DHCP-based configuration
- **Security**: SSL/TLS encryption with 128-bit ARCFOUR
- **Display**: Rendered entirely on the server, compressed framebuffer sent to the DTU

**Key Sun Ray concepts to replicate**:

- Instant session mobility (smartcard/badge driven)
- Zero client-side state
- Centralized session management
- Simple, robust network boot
### Wafer (Wayland-Based Thin Client)

[Wafer](https://github.com/lp-programming/Wafer) is the most directly comparable modern project:

- **Goal**: Thin client for Linux server + Linux clients over high-speed LAN
- **Protocol**: Wayland protocol over network
- **Server** ("Mainframe"): Multi-core machine with GBM-capable GPU
- **Design**: Full 3D acceleration on server, minimal CPU on client (Raspberry Pi target)
- **Status**: Proof of concept / early development

### Sunshine + Moonlight

[Sunshine](https://github.com/LizardByte/Sunshine) (server) + [Moonlight](https://moonlight-stream.org/) (client) is the most mature open-source streaming solution:

- **Protocol**: Based on NVIDIA GameStream protocol
- **Encoding**: H.264, H.265, AV1 with NVENC, VAAPI, AMF hardware encoding
- **Performance**: Sub-10ms latency on LAN, up to 120 FPS
- **Clients**: Android, iOS, PC, Mac, Raspberry Pi, Steam Deck, Nintendo Switch, LG webOS
- **Audio**: Full audio streaming with multi-channel support
- **Input**: Mouse, keyboard, gamepad, touchscreen
- **Limitations**: Designed for single-user gaming, not multi-user thin client

### WayVNC

[WayVNC](https://github.com/any1/wayvnc) is a VNC server for wlroots-based Wayland compositors:

- Implements RFB protocol over wlr-screencopy / ext-image-copy-capture
- Supports headless mode (no physical display)
- Authentication: PAM, TLS (VeNCrypt), RSA-AES
- Input: Virtual pointer and keyboard via Wayland protocols
- JSON-IPC for runtime control
- Good reference for Wayland compositor integration
### GNOME Remote Desktop

GNOME's built-in remote desktop solution:

- Speaks **RDP** (primary) and VNC
- Uses PipeWire for screen capture via Mutter's ScreenCast D-Bus API
- Supports headless multi-user sessions (GNOME 46+)
- Input forwarding via Mutter's RemoteDesktop D-Bus API
- Integrated with GDM for remote login
- Active development, improving rapidly

### ThinStation

[ThinStation](https://thinstation.github.io/thinstation/) is a framework for building thin client Linux images:

- Supports Citrix ICA, SPICE, NX, RDP, VMware Horizon
- Boots from network (PXE), USB, or compact flash
- Not a protocol itself, but a client OS/distribution

### openthinclient

[openthinclient](https://openthinclient.com/) is a commercial open-source thin client management platform:

- Based on Debian (latest: Debian 13 "Trixie")
- Manages thin client fleet, user sessions, applications
- Supports multiple VDI protocols
- Version 2603 (2025) includes updated VDI components

### Key Design Insights for WayRay

- **Sunshine/Moonlight** proves that low-latency game streaming is solved; adapt it for the desktop
- **WayVNC** shows how to integrate with wlroots compositors
- **GNOME Remote Desktop** shows the PipeWire + portal approach
- **Wafer** validates the concept but is early-stage
- **Sun Ray's session mobility** is the killer feature to replicate
- No existing project combines: Wayland-native + multi-user + session mobility + hardware encoding + QUIC transport
### Sources

- [Sun Ray Wikipedia](https://en.wikipedia.org/wiki/Sun_Ray)
- [Sun Ray System Overview - Oracle](https://docs.oracle.com/cd/E19634-01/820-0411/overview.html)
- [Using Sun Ray Thin Clients in 2025](https://catstret.ch/202506/sun-ray-shenanigans/)
- [Wafer GitHub](https://github.com/lp-programming/Wafer)
- [Sunshine GitHub](https://github.com/LizardByte/Sunshine)
- [Moonlight](https://moonlight-stream.org/)
- [WayVNC GitHub](https://github.com/any1/wayvnc)
- [GNOME Remote Desktop Wiki](https://wiki.gnome.org/Projects/Mutter/RemoteDesktop)
- [ThinStation](https://thinstation.github.io/thinstation/)
- [openthinclient](https://openthinclient.com/)

---
## 12. Architecture Recommendations for a SunRay-Like System

Based on all the research above, here is a synthesized architectural recommendation:

### Core Architecture

```
┌─────────────────────────────────────────────┐
│              WayRay Server                  │
│                                             │
│  ┌─────────────────────────────────┐        │
│  │ Wayland Compositor (wlroots)    │        │
│  │ - Per-user session              │        │
│  │ - DMA-BUF output                │        │
│  │ - Damage tracking               │        │
│  └──────────┬──────────────────────┘        │
│             │ DMA-BUF (zero-copy)           │
│  ┌──────────▼──────────────────────┐        │
│  │ Encoder Pipeline                │        │
│  │ - VAAPI H.264/H.265/AV1         │        │
│  │ - Damage-aware encoding         │        │
│  │ - Adaptive bitrate              │        │
│  └──────────┬──────────────────────┘        │
│             │ Encoded frames                │
│  ┌──────────▼──────────────────────┐        │
│  │ Session Manager                 │        │
│  │ - Multi-user sessions           │        │
│  │ - Session migration             │        │
│  │ - Authentication                │        │
│  └──────────┬──────────────────────┘        │
│             │                               │
│  ┌──────────▼──────────────────────┐        │
│  │ QUIC Transport                  │        │
│  │ - Display stream (video)        │        │
│  │ - Input stream (low-latency)    │        │
│  │ - Audio stream (Opus/RTP)       │        │
│  │ - USB stream (usbredir)         │        │
│  │ - Control stream                │        │
│  └──────────┬──────────────────────┘        │
└─────────────┼───────────────────────────────┘
              │ QUIC / Network
┌─────────────┼───────────────────────────────┐
│             │                               │
│  ┌──────────▼──────────────────────┐        │
│  │ QUIC Transport                  │        │
│  └──────────┬──────────────────────┘        │
│             │                               │
│  ┌──────────▼──────────────────────┐        │
│  │ Decoder (VAAPI/SW)              │        │
│  │ + Audio (Opus decode)           │        │
│  │ + Input capture                 │        │
│  │ + USB forwarding                │        │
│  └──────────┬──────────────────────┘        │
│             │                               │
│  ┌──────────▼──────────────────────┐        │
│  │ Minimal Wayland Compositor      │        │
│  │ (or direct DRM/KMS output)      │        │
│  └─────────────────────────────────┘        │
│              WayRay Client                  │
└─────────────────────────────────────────────┘
```
### Technology Stack Recommendations

| Component | Recommended Technology | Rationale |
|---|---|---|
| **Server compositor** | wlroots-based custom compositor | Direct access to DMA-BUF, damage tracking, input injection |
| **Capture** | Direct compositor integration (no protocol needed) | Lowest latency, full damage info |
| **Encoding** | VAAPI (primary), NVENC (optional) via FFmpeg/GStreamer | Cross-vendor, zero-copy from DMA-BUF |
| **Video codec** | H.264 (default), AV1 (preferred when supported) | H.264 for compatibility, AV1 for quality/bandwidth |
| **Transport** | QUIC (quinn crate) with TCP fallback | Low latency, multiplexing, 0-RTT |
| **Audio** | Opus over QUIC stream | Low latency, built-in FEC |
| **USB** | usbredir over QUIC stream | Designed for remote desktop |
| **Session management** | Custom (inspired by Sun Ray SRSS) | Session mobility, multi-user |
| **Client display** | DRM/KMS direct or minimal Wayland compositor | Minimal overhead |
| **Language** | Rust | Safety, performance, excellent ecosystem (smithay, quinn, etc.) |
### QUIC Stream Layout

| Stream ID | Type | Priority | Reliability | Content |
|---|---|---|---|---|
| 0 | Bidirectional | Highest | Reliable | Control/session management |
| 1 | Server -> Client | High | Unreliable | Video frames |
| 2 | Client -> Server | Highest | Reliable | Input events |
| 3 | Server -> Client | Medium | Reliable | Audio playback |
| 4 | Client -> Server | Medium | Reliable | Audio capture |
| 5 | Bidirectional | Low | Reliable | USB/IP data |
| 6 | Bidirectional | Medium | Reliable | Clipboard |
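Note that wire-level QUIC stream IDs encode initiator and directionality in their low bits (RFC 9000), so the IDs above are best read as logical channel numbers. The layout's policy can be expressed as a pure mapping; a sketch (type and function names are ours, and unreliable video would in practice ride QUIC DATAGRAM frames, RFC 9221):

```rust
/// Logical channels from the stream layout table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Channel { Control, Video, Input, AudioPlayback, AudioCapture, Usb, Clipboard }

/// Whether a channel needs reliable, ordered delivery. Video is the one
/// exception: a stale frame is worthless, so it is dropped rather than
/// retransmitted.
fn reliable(ch: Channel) -> bool {
    !matches!(ch, Channel::Video)
}

/// Relative send priority, higher wins. Input and control beat video so
/// that a congested link never delays user interaction behind frame data.
fn priority(ch: Channel) -> u8 {
    match ch {
        Channel::Control | Channel::Input => 3,
        Channel::Video => 2,
        Channel::AudioPlayback | Channel::AudioCapture | Channel::Clipboard => 1,
        Channel::Usb => 0,
    }
}

fn main() {
    assert!(!reliable(Channel::Video));
    assert!(reliable(Channel::Input));
    assert!(priority(Channel::Input) > priority(Channel::Video));
    println!("stream policy ok");
}
```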
### Encoding Strategy

1. **Damage detection**: Compositor reports damaged regions per frame
2. **Content classification**: Heuristically detect video regions vs. desktop content (like SPICE does)
3. **Encoding decision**:
   - Small damage, text/UI: Lossless (zstd-compressed) tile updates
   - Large damage, desktop: H.264/AV1 with high quality, low bitrate
   - Video regions: H.264/AV1 with lower quality, higher frame rate
   - Full screen video: Full-frame H.264/AV1 encoding
4. **Adaptive quality**: Adjust based on measured bandwidth and latency
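The decision in step 3 reduces to a pure function of the damage fraction and the content classification. A sketch with illustrative, untuned thresholds:

```rust
/// Encoding modes matching the strategy's four cases.
#[derive(Debug, PartialEq)]
enum EncodeMode { LosslessTiles, VideoHighQuality, VideoHighFps, FullFrameVideo }

/// Pick an encoding mode from the damaged fraction of the output and
/// whether the classifier thinks the damage is a playing video region.
/// Thresholds here are illustrative, not tuned values.
fn choose_mode(damage_fraction: f64, looks_like_video: bool) -> EncodeMode {
    match (damage_fraction, looks_like_video) {
        (f, _) if f > 0.9 => EncodeMode::FullFrameVideo,     // full screen video
        (_, true) => EncodeMode::VideoHighFps,               // embedded video region
        (f, false) if f < 0.05 => EncodeMode::LosslessTiles, // small UI/text updates
        _ => EncodeMode::VideoHighQuality,                   // large desktop damage
    }
}

fn main() {
    assert_eq!(choose_mode(0.01, false), EncodeMode::LosslessTiles);
    assert_eq!(choose_mode(0.30, true), EncodeMode::VideoHighFps);
    assert_eq!(choose_mode(0.95, false), EncodeMode::FullFrameVideo);
    assert_eq!(choose_mode(0.30, false), EncodeMode::VideoHighQuality);
    println!("encode mode selection ok");
}
```

Step 4 then feeds back into whichever mode was chosen, adjusting bitrate or tile compression level from measured bandwidth and latency.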
### Sun Ray Features to Implement

1. **Session mobility**: Associate sessions with authentication tokens, not hardware. Insert the token at any client -> the session follows.
2. **Stateless clients**: Client boots from network, has no persistent state.
3. **Centralized management**: Server manages all sessions, client configurations, authentication.
4. **Hot desking**: Disconnect from one client, connect at another, session is exactly where you left it.
5. **Multi-monitor**: Support multiple displays per session.
|