Update ADR-011: Add bare-metal framebuffer backend (Tier 0)

illumos has /dev/fb0 via the gfxp_bitmap driver on UEFI GOP systems,
exposing the classic SunOS fbio(4I) interface. Userspace can mmap the
framebuffer and write pixels directly -- proven by xf86-video-illumosfb.

New four-tier architecture:
- Tier 0: Bare-metal /dev/fb0 (illumos fbio + Linux fbdev). No X11.
- Tier 1: X11 SHM (portable fallback, also dev mode)
- Tier 2: Loopback shared memory (co-located optimization)
- Tier 3: DRM/KMS (Linux, rare illumos)

Includes implementation sketch with SIMD non-temporal stores for
write-combining memory (SSE2/AVX2/AVX-512 runtime selection).
This commit is contained in:
Till Wegmueller 2026-03-28 22:43:40 +01:00
parent 4e31f172fb
commit f005dccd67
No known key found for this signature in database

View file

@ -13,14 +13,47 @@ Getting pixels on screen on illumos is constrained:
| Path | Status on illumos | | Path | Status on illumos |
|------|-------------------| |------|-------------------|
| DRM/KMS | Intel Gen2-7 only (ancient). No AMD, no modern Intel. | | DRM/KMS | Intel Gen2-7 only (ancient). No AMD, no modern Intel. |
| Linux framebuffer (`/dev/fb0`) | Does not exist. No `vesafb`/`simplefb` equivalent. | | illumos `/dev/fb0` (fbio) | **Works.** UEFI GOP framebuffer via `gfxp_bitmap` driver. Userspace mmap + write pixels directly. Resolution fixed at boot. |
| illumos VIS (`/dev/fbs/*`) | Kernel-only ioctls (`FKIOCTL` check). Unusable from userspace. | | illumos VIS console ops | Kernel-only (`VIS_CONSDISPLAY` etc. check `FKIOCTL`). Not for userspace rendering. |
| X11 + VESA DDX | Works on any GPU. CPU-rendered but functional. | | X11 + illumosfb DDX | Works on UEFI GOP systems. Uses `/dev/fb0` directly. See [xf86-video-illumosfb](https://github.com/LuminousMonkey/xf86-video-illumosfb). |
| X11 + VESA DDX | Works on any GPU via VBE BIOS calls. CPU-rendered. |
| X11 + i915 DDX | Works on Intel Gen2-7 with DRI acceleration. | | X11 + i915 DDX | Works on Intel Gen2-7 with DRI acceleration. |
| X11 + NVIDIA proprietary | Works with specific driver versions. | | X11 + NVIDIA proprietary | Works with specific driver versions. |
| Mesa llvmpipe | Software OpenGL available everywhere. | | Mesa llvmpipe | Software OpenGL available everywhere. |
**X11 is the universal display path on illumos.** Every illumos workstation has a working X11 session, even if it's VESA-only. **Two universal display paths exist on illumos:**
1. **`/dev/fb0` bare-metal** -- direct framebuffer access on UEFI GOP systems (no X11 needed)
2. **X11** -- works everywhere including legacy BIOS via VESA DDX
### illumos `/dev/fb0` Details
The `gfxp_bitmap` kernel driver (backing the `vgatext` DDI driver) exposes the UEFI GOP framebuffer via the classic SunOS `fbio(4I)` interface:
```c
fd = open("/dev/fb0", O_RDWR);
ioctl(fd, VIS_GETIDENTIFIER, &ident); // -> "illumos_fb"
ioctl(fd, FBIOGATTR, &attr); // -> struct fbgattr with:
// resolution, depth, size
// pitch + RGB masks via gfxfb_info
// in sattr.dev_specific[]
buf = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
ioctl(fd, KDSETMODE, KD_GRAPHICS); // take over from console
// write pixels directly to buf (write-combining memory)
// use non-temporal stores (SSE2/AVX2/AVX-512) for performance
ioctl(fd, KDSETMODE, KD_TEXT); // release back to console
```
**Constraints:**
- UEFI GOP only (legacy BIOS falls back to VGA text mode)
- Resolution fixed at boot time (no mode switching -- whatever GOP configured)
- Write-combining memory mapping requires non-temporal stores for performance
- `struct gfxfb_info` (pitch, RGB layout) smuggled through `fbsattr.dev_specific[8]`
**Source references:**
- `illumos-gate/usr/src/uts/common/sys/fbio.h` -- ioctl definitions, `gfxfb_info`
- `illumos-gate/usr/src/uts/i86pc/io/gfx_private/gfxp_bitmap.c` -- bitmap FB backend
- `illumos-gate/usr/src/uts/i86pc/io/gfx_private/gfxp_fb.c` -- ioctl dispatch
- `illumos-gate/usr/src/uts/intel/io/vgatext/vgatext.c` -- DDI driver creating `/dev/fb0`
### Smithay Backend Landscape ### Smithay Backend Landscape
@ -29,13 +62,42 @@ Getting pixels on screen on illumos is constrained:
| `backend_drm` | DRM/KMS + GBM + libseat | Only Intel Gen2-7 | | `backend_drm` | DRM/KMS + GBM + libseat | Only Intel Gen2-7 |
| `backend_x11` | X11 + DRM node + GBM | Only with DRM (rare) | | `backend_x11` | X11 + DRM node + GBM | Only with DRM (rare) |
| `backend_winit` | winit + EGL | Needs winit illumos patches + Mesa | | `backend_winit` | winit + EGL | Needs winit illumos patches + Mesa |
| Custom X11 SHM | X11 + MIT-SHM extension | Yes -- universal | | Custom fbio | `/dev/fb0` + UEFI GOP | Yes -- bare metal, no X11 needed |
| Custom X11 SHM | X11 + MIT-SHM extension | Yes -- universal fallback |
## Decision ## Decision
### Three-tier local display architecture: ### Four-tier local display architecture:
### Tier 1: Custom X11 SHM Backend (Primary, Portable) ### Tier 0: Bare-Metal Framebuffer Backend (illumos `/dev/fb0`, Linux `/dev/fb0`)
Direct framebuffer access -- WayRay as the **sole display server**, no X11 underneath.
On illumos (UEFI GOP systems):
1. Open `/dev/fb0`, verify `VIS_GETIDENTIFIER` returns `"illumos_fb"`
2. Query geometry via `FBIOGATTR` (resolution, depth, pitch, RGB layout from `gfxfb_info`)
3. `mmap()` the framebuffer (write-combining memory)
4. `KDSETMODE``KD_GRAPHICS` to take over from console
5. Render with `PixmanRenderer` into CPU buffer
6. Copy damaged regions to framebuffer using non-temporal stores (SSE2/AVX2/AVX-512)
7. Input from `/dev/kbd` + `/dev/mouse` (illumos STREAMS input devices)
On Linux:
1. Open `/dev/fb0`, query via `FBIOGET_VSCREENINFO` / `FBIOGET_FSCREENINFO`
2. `mmap()` the framebuffer
3. Render and blit same as above
4. Input via libinput or evdev
```
Wayland apps → Smithay compositor → PixmanRenderer → CPU buffer
→ non-temporal memcpy to /dev/fb0 → pixels on screen
```
**Constraints:** Resolution fixed at boot (UEFI GOP / VESA BIOS). No VSync (tearing possible). No hardware acceleration. Requires UEFI on illumos.
**Performance note:** `xf86-video-illumosfb` demonstrates that SIMD non-temporal stores are essential for write-combining memory. The backend must use `_mm_stream_si128` (SSE2), `_mm256_stream_si256` (AVX2), or `_mm512_stream_si512` (AVX-512) with runtime detection via `getisax(2)` on illumos or CPUID on Linux. Rust's `std::arch` intrinsics provide these.
### Tier 1: Custom X11 SHM Backend (Portable Fallback)
A custom Smithay backend that: A custom Smithay backend that:
1. Opens an X11 connection via `x11rb` (pure Rust XCB bindings) 1. Opens an X11 connection via `x11rb` (pure Rust XCB bindings)
@ -45,7 +107,7 @@ A custom Smithay backend that:
5. Receives keyboard/mouse input from X11 events 5. Receives keyboard/mouse input from X11 events
6. Maps X11 input events to Smithay's input types 6. Maps X11 input events to Smithay's input types
This works on **every illumos system with X11**, regardless of GPU. Even `xf86-video-vesa` works because we only need X11 SHM pixmap blitting. This works on **every illumos system with X11**, regardless of GPU or BIOS type. Even `xf86-video-vesa` works. Also useful for development (run WayRay in a window on your existing desktop).
``` ```
Wayland apps → Smithay compositor → PixmanRenderer → CPU buffer Wayland apps → Smithay compositor → PixmanRenderer → CPU buffer
@ -59,11 +121,11 @@ When wayray-server and wayray-client run on the same machine, skip encoding enti
1. Server renders to shared memory ring buffer (`shm_open` + `mmap`) 1. Server renders to shared memory ring buffer (`shm_open` + `mmap`)
2. Client reads framebuffers directly from shared memory 2. Client reads framebuffers directly from shared memory
3. Only damage regions communicated via small control channel 3. Only damage regions communicated via small control channel
4. Client presents to X11 via SHM pixmaps 4. Client presents via Tier 0 (fbdev) or Tier 1 (X11 SHM)
``` ```
Wayland apps → Smithay compositor → PixmanRenderer → shared memory Wayland apps → Smithay compositor → PixmanRenderer → shared memory
→ wayray-client (local) → XShmPutImage → screen → wayray-client (local) → fbdev or X11 SHM → screen
``` ```
Performance: sub-millisecond frame latency (vs 5-30ms with encode/decode), near-zero CPU overhead for transport, pixel-perfect quality. Performance: sub-millisecond frame latency (vs 5-30ms with encode/decode), near-zero CPU overhead for transport, pixel-perfect quality.
@ -79,10 +141,13 @@ On Linux or illumos with a supported DRM GPU (Intel Gen2-7):
``` ```
if cfg!(feature = "local-drm") && drm_device_available() { if cfg!(feature = "local-drm") && drm_device_available() {
// Tier 3: Direct DRM/KMS // Tier 3: Direct DRM/KMS (best performance)
use DrmBackend + GlesRenderer use DrmBackend + GlesRenderer
} else if local_mode && fbdev_available() {
// Tier 0: Bare-metal framebuffer (no X11 needed)
use FbdevBackend + PixmanRenderer
} else if local_mode && x11_available() { } else if local_mode && x11_available() {
// Tier 1: X11 SHM // Tier 1: X11 SHM (fallback, also good for development)
use X11ShmBackend + PixmanRenderer use X11ShmBackend + PixmanRenderer
} else { } else {
// Remote mode (default) // Remote mode (default)
@ -95,11 +160,68 @@ if cfg!(feature = "local-drm") && drm_device_available() {
| Mode | Backend | Renderer | Transport | Use Case | | Mode | Backend | Renderer | Transport | Use Case |
|------|---------|----------|-----------|----------| |------|---------|----------|-----------|----------|
| **Remote** | Headless | Pixman/GLES | QUIC (encode+decode) | Thin client (primary) | | **Remote** | Headless | Pixman/GLES | QUIC (encode+decode) | Thin client (primary) |
| **Local X11** | X11 SHM | Pixman | XShmPutImage (direct) | illumos workstation | | **Local fbdev** | illumos fbio / Linux fbdev | Pixman | Non-temporal memcpy to `/dev/fb0` | Bare-metal workstation (UEFI) |
| **Local Loopback** | Headless | Pixman | Shared memory | Same-machine optimization | | **Local X11** | X11 SHM | Pixman | XShmPutImage | Development / legacy BIOS / fallback |
| **Local Loopback** | Headless | Pixman | Shared memory | Co-located server+client |
| **Local DRM** | DRM/KMS | GLES | Direct scanout | Linux / accelerated GPU | | **Local DRM** | DRM/KMS | GLES | Direct scanout | Linux / accelerated GPU |
## Implementation: X11 SHM Backend ## Implementation: Framebuffer Backend (Tier 0)
```rust
struct FbdevBackend {
fd: RawFd,
buffer: *mut u8, // mmap'd framebuffer (write-combining)
shadow: Vec<u8>, // CPU-cached shadow buffer for rendering
width: u32,
height: u32,
depth: u32,
pitch: u32, // bytes per scanline
rgb_layout: RgbLayout, // mask/position from gfxfb_info
size: usize,
}
impl FbdevBackend {
fn open_illumos() -> Result<Self> {
let fd = open("/dev/fb0", O_RDWR)?;
// Verify identity
let ident = vis_getidentifier(fd)?;
assert_eq!(ident.name, "illumos_fb");
// Query geometry
let attr = fbiogattr(fd)?;
let gfxfb = gfxfb_info_from_dev_specific(&attr.sattr.dev_specific);
// mmap framebuffer
let buffer = mmap(fd, attr.fbtype.fb_size, PROT_READ | PROT_WRITE, MAP_SHARED)?;
// Take over from console
kdsetmode(fd, KD_GRAPHICS)?;
// ...
}
fn present_damage(&self, damage: &[Rectangle]) {
// For each damage rect: copy from shadow to FB
// using non-temporal stores for write-combining memory
for rect in damage {
streaming_copy_rect(&self.shadow, self.buffer, rect, self.pitch);
}
}
}
/// SIMD non-temporal copy (runtime-selected)
fn streaming_copy_rect(src: &[u8], dst: *mut u8, rect: &Rectangle, pitch: u32) {
// SSE2: _mm_stream_si128
// AVX2: _mm256_stream_si256
// AVX-512: _mm512_stream_si512
// Selected at runtime via std::is_x86_feature_detected!()
// (or getisax(2) on illumos)
}
```
On Linux, a similar struct uses `FBIOGET_VSCREENINFO` / `FBIOGET_FSCREENINFO` instead of `FBIOGATTR`.
Input on bare metal:
- illumos: read from `/dev/kbd` (keyboard) and `/dev/mouse` (mouse) via STREAMS ioctls
- Linux: libinput or raw evdev (`/dev/input/event*`)
## Implementation: X11 SHM Backend (Tier 1)
```rust ```rust
struct X11ShmBackend { struct X11ShmBackend {
@ -135,7 +257,7 @@ The backend integrates into calloop by registering the X11 connection fd as an e
## Rationale ## Rationale
- **X11 SHM is universal on illumos**: No GPU requirements, works with VESA - **`/dev/fb0` enables bare-metal on illumos**: No X11 dependency, WayRay as sole display server. Proven by xf86-video-illumosfb consuming the same `fbio(4I)` interface.
- **PixmanRenderer is fast enough**: For a desktop compositor on a workstation CPU, software compositing handles typical desktop loads well. Browsers and media do their own GPU rendering internally. - **PixmanRenderer is fast enough**: For a desktop compositor on a workstation CPU, software compositing handles typical desktop loads well. Browsers and media do their own GPU rendering internally.
- **Same compositor, different output**: The Smithay compositor core is identical in local and remote modes. Only the output backend changes. This avoids maintaining two compositor codepaths. - **Same compositor, different output**: The Smithay compositor core is identical in local and remote modes. Only the output backend changes. This avoids maintaining two compositor codepaths.
- **cocoa-way validates this**: Smithay on macOS works by rendering headless and presenting to a native window. Same pattern, different native window system. - **cocoa-way validates this**: Smithay on macOS works by rendering headless and presenting to a native window. Same pattern, different native window system.
@ -143,10 +265,14 @@ The backend integrates into calloop by registering the X11 connection fd as an e
## Consequences ## Consequences
- Must write and maintain a custom X11 SHM backend (not in upstream Smithay) - Must write and maintain two custom backends: fbdev and X11 SHM (neither in upstream Smithay)
- Fbdev backend requires SIMD non-temporal store implementation (Rust `std::arch` intrinsics)
- Fbdev resolution is fixed at boot; no mode switching. Users must configure UEFI GOP resolution in firmware settings.
- Fbdev has no VSync; tearing is possible. Mitigate with damage-based partial updates.
- Fbdev input on illumos needs custom `/dev/kbd` + `/dev/mouse` STREAMS reader
- X11 SHM backend adds `x11rb` as a dependency (already pure Rust, minimal) - X11 SHM backend adds `x11rb` as a dependency (already pure Rust, minimal)
- Performance ceiling on unaccelerated VESA: adequate for desktop, not for gaming
- Input mapping from X11 events to Smithay types requires careful keysym/keycode handling - Input mapping from X11 events to Smithay types requires careful keysym/keycode handling
- Fullscreen mode needs proper X11 EWMH hints (`_NET_WM_STATE_FULLSCREEN`) - Fullscreen mode needs proper X11 EWMH hints (`_NET_WM_STATE_FULLSCREEN`)
- Multi-monitor in fbdev mode requires multiple `/dev/fb*` devices or single large GOP framebuffer
- Multi-monitor in X11 SHM mode depends on Xorg's RANDR configuration - Multi-monitor in X11 SHM mode depends on Xorg's RANDR configuration
- On Linux, users will prefer the DRM backend; X11 SHM is primarily for illumos - On Linux, users will prefer the DRM backend; fbdev/X11 SHM are primarily for illumos