wayray/docs/ai/adr/011-local-display-mode.md
Till Wegmueller f005dccd67
Update ADR-011: Add bare-metal framebuffer backend (Tier 0)
illumos has /dev/fb0 via the gfxp_bitmap driver on UEFI GOP systems,
exposing the classic SunOS fbio(4I) interface. Userspace can mmap the
framebuffer and write pixels directly -- proven by xf86-video-illumosfb.

New four-tier architecture:
- Tier 0: Bare-metal /dev/fb0 (illumos fbio + Linux fbdev). No X11.
- Tier 1: X11 SHM (portable fallback, also dev mode)
- Tier 2: Loopback shared memory (co-located optimization)
- Tier 3: DRM/KMS (Linux, rare illumos)

Includes implementation sketch with SIMD non-temporal stores for
write-combining memory (SSE2/AVX2/AVX-512 runtime selection).
2026-03-28 22:43:40 +01:00

12 KiB

ADR-011: Local Display Mode

Status

Accepted

Context

WayRay is designed as a remote thin client compositor, but as OpenIndiana's flagship Wayland experience it must also work as a local desktop compositor -- apps and display on the same machine. The "but can I just use it locally?" question is day-one feedback.

The illumos Display Problem

Getting pixels on screen on illumos is constrained:

Path Status on illumos
DRM/KMS Intel Gen2-7 only (ancient). No AMD, no modern Intel.
illumos /dev/fb0 (fbio) Works. UEFI GOP framebuffer via gfxp_bitmap driver. Userspace mmap + write pixels directly. Resolution fixed at boot.
illumos VIS console ops Kernel-only (VIS_CONSDISPLAY etc. check FKIOCTL). Not for userspace rendering.
X11 + illumosfb DDX Works on UEFI GOP systems. Uses /dev/fb0 directly. See xf86-video-illumosfb.
X11 + VESA DDX Works on any GPU via VBE BIOS calls. CPU-rendered.
X11 + i915 DDX Works on Intel Gen2-7 with DRI acceleration.
X11 + NVIDIA proprietary Works with specific driver versions.
Mesa llvmpipe Software OpenGL available everywhere.

Two universal display paths exist on illumos:

  1. /dev/fb0 bare-metal -- direct framebuffer access on UEFI GOP systems (no X11 needed)
  2. X11 -- works everywhere including legacy BIOS via VESA DDX

illumos /dev/fb0 Details

The gfxp_bitmap kernel driver (backing the vgatext DDI driver) exposes the UEFI GOP framebuffer via the classic SunOS fbio(4I) interface:

fd = open("/dev/fb0", O_RDWR);
ioctl(fd, VIS_GETIDENTIFIER, &ident);   // -> "illumos_fb"
ioctl(fd, FBIOGATTR, &attr);            // -> struct fbgattr with:
                                         //    resolution, depth, size
                                         //    pitch + RGB masks via gfxfb_info
                                         //    in sattr.dev_specific[]
buf = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
ioctl(fd, KDSETMODE, KD_GRAPHICS);      // take over from console
// write pixels directly to buf (write-combining memory)
// use non-temporal stores (SSE2/AVX2/AVX-512) for performance
ioctl(fd, KDSETMODE, KD_TEXT);           // release back to console

Constraints:

  • UEFI GOP only (legacy BIOS falls back to VGA text mode)
  • Resolution fixed at boot time (no mode switching -- whatever GOP configured)
  • Write-combining memory mapping requires non-temporal stores for performance
  • struct gfxfb_info (pitch, RGB layout) smuggled through fbsattr.dev_specific[8]

Source references:

  • illumos-gate/usr/src/uts/common/sys/fbio.h -- ioctl definitions, gfxfb_info
  • illumos-gate/usr/src/uts/i86pc/io/gfx_private/gfxp_bitmap.c -- bitmap FB backend
  • illumos-gate/usr/src/uts/i86pc/io/gfx_private/gfxp_fb.c -- ioctl dispatch
  • illumos-gate/usr/src/uts/intel/io/vgatext/vgatext.c -- DDI driver creating /dev/fb0

Smithay Backend Landscape

Backend Requirements Works on illumos?
backend_drm DRM/KMS + GBM + libseat Only Intel Gen2-7
backend_x11 X11 + DRM node + GBM Only with DRM (rare)
backend_winit winit + EGL Needs winit illumos patches + Mesa
Custom fbio /dev/fb0 + UEFI GOP Yes -- bare metal, no X11 needed
Custom X11 SHM X11 + MIT-SHM extension Yes -- universal fallback

Decision

Four-tier local display architecture:

Tier 0: Bare-Metal Framebuffer Backend (illumos /dev/fb0, Linux /dev/fb0)

Direct framebuffer access -- WayRay as the sole display server, no X11 underneath.

On illumos (UEFI GOP systems):

  1. Open /dev/fb0, verify VIS_GETIDENTIFIER returns "illumos_fb"
  2. Query geometry via FBIOGATTR (resolution, depth, pitch, RGB layout from gfxfb_info)
  3. mmap() the framebuffer (write-combining memory)
  4. KDSETMODEKD_GRAPHICS to take over from console
  5. Render with PixmanRenderer into CPU buffer
  6. Copy damaged regions to framebuffer using non-temporal stores (SSE2/AVX2/AVX-512)
  7. Input from /dev/kbd + /dev/mouse (illumos STREAMS input devices)

On Linux:

  1. Open /dev/fb0, query via FBIOGET_VSCREENINFO / FBIOGET_FSCREENINFO
  2. mmap() the framebuffer
  3. Render and blit same as above
  4. Input via libinput or evdev
Wayland apps → Smithay compositor → PixmanRenderer → CPU buffer
    → non-temporal memcpy to /dev/fb0 → pixels on screen

Constraints: Resolution fixed at boot (UEFI GOP / VESA BIOS). No VSync (tearing possible). No hardware acceleration. Requires UEFI on illumos.

Performance note: xf86-video-illumosfb demonstrates that SIMD non-temporal stores are essential for write-combining memory. The backend must use _mm_stream_si128 (SSE2), _mm256_stream_si256 (AVX2), or _mm512_stream_si512 (AVX-512) with runtime detection via getisax(2) on illumos or CPUID on Linux. Rust's std::arch intrinsics provide these.

Tier 1: Custom X11 SHM Backend (Portable Fallback)

A custom Smithay backend that:

  1. Opens an X11 connection via x11rb (pure Rust XCB bindings)
  2. Creates a window (fullscreen or windowed for development)
  3. Renders with PixmanRenderer into CPU buffers
  4. Presents via XShmPutImage (MIT-SHM extension) -- zero DRM/EGL/GBM dependency
  5. Receives keyboard/mouse input from X11 events
  6. Maps X11 input events to Smithay's input types

This works on every illumos system with X11, regardless of GPU or BIOS type. Even xf86-video-vesa works. Also useful for development (run WayRay in a window on your existing desktop).

Wayland apps → Smithay compositor → PixmanRenderer → CPU buffer
    → XShmPutImage → X11 window on screen

Tier 2: Loopback Optimization (Local Server+Client)

When wayray-server and wayray-client run on the same machine, skip encoding entirely:

  1. Server renders to shared memory ring buffer (shm_open + mmap)
  2. Client reads framebuffers directly from shared memory
  3. Only damage regions communicated via small control channel
  4. Client presents via Tier 0 (fbdev) or Tier 1 (X11 SHM)
Wayland apps → Smithay compositor → PixmanRenderer → shared memory
    → wayray-client (local) → fbdev or X11 SHM → screen

Performance: sub-millisecond frame latency (vs 5-30ms with encode/decode), near-zero CPU overhead for transport, pixel-perfect quality.

Tier 3: DRM Backend (Linux, Accelerated illumos)

On Linux or illumos with a supported DRM GPU (Intel Gen2-7):

  • Use Smithay's standard backend_drm + GlesRenderer
  • Direct scanout, hardware compositing, VSync
  • Feature-gated: local-drm

Backend Selection Logic

if cfg!(feature = "local-drm") && drm_device_available() {
    // Tier 3: Direct DRM/KMS (best performance)
    use DrmBackend + GlesRenderer
} else if local_mode && fbdev_available() {
    // Tier 0: Bare-metal framebuffer (no X11 needed)
    use FbdevBackend + PixmanRenderer
} else if local_mode && x11_available() {
    // Tier 1: X11 SHM (fallback, also good for development)
    use X11ShmBackend + PixmanRenderer
} else {
    // Remote mode (default)
    use HeadlessBackend + PixmanRenderer + QUIC transport
}

Mode Summary

Mode Backend Renderer Transport Use Case
Remote Headless Pixman/GLES QUIC (encode+decode) Thin client (primary)
Local fbdev illumos fbio / Linux fbdev Pixman Non-temporal memcpy to /dev/fb0 Bare-metal workstation (UEFI)
Local X11 X11 SHM Pixman XShmPutImage Development / legacy BIOS / fallback
Local Loopback Headless Pixman Shared memory Co-located server+client
Local DRM DRM/KMS GLES Direct scanout Linux / accelerated GPU

Implementation: Framebuffer Backend (Tier 0)

struct FbdevBackend {
    fd: RawFd,
    buffer: *mut u8,           // mmap'd framebuffer (write-combining)
    shadow: Vec<u8>,           // CPU-cached shadow buffer for rendering
    width: u32,
    height: u32,
    depth: u32,
    pitch: u32,                // bytes per scanline
    rgb_layout: RgbLayout,     // mask/position from gfxfb_info
    size: usize,
}

impl FbdevBackend {
    fn open_illumos() -> Result<Self> {
        let fd = open("/dev/fb0", O_RDWR)?;
        // Verify identity
        let ident = vis_getidentifier(fd)?;
        assert_eq!(ident.name, "illumos_fb");
        // Query geometry
        let attr = fbiogattr(fd)?;
        let gfxfb = gfxfb_info_from_dev_specific(&attr.sattr.dev_specific);
        // mmap framebuffer
        let buffer = mmap(fd, attr.fbtype.fb_size, PROT_READ | PROT_WRITE, MAP_SHARED)?;
        // Take over from console
        kdsetmode(fd, KD_GRAPHICS)?;
        // ...
    }

    fn present_damage(&self, damage: &[Rectangle]) {
        // For each damage rect: copy from shadow to FB
        // using non-temporal stores for write-combining memory
        for rect in damage {
            streaming_copy_rect(&self.shadow, self.buffer, rect, self.pitch);
        }
    }
}

/// SIMD non-temporal copy (runtime-selected)
fn streaming_copy_rect(src: &[u8], dst: *mut u8, rect: &Rectangle, pitch: u32) {
    // SSE2:   _mm_stream_si128
    // AVX2:   _mm256_stream_si256
    // AVX-512: _mm512_stream_si512
    // Selected at runtime via std::is_x86_feature_detected!()
    // (or getisax(2) on illumos)
}

On Linux, a similar struct uses FBIOGET_VSCREENINFO / FBIOGET_FSCREENINFO instead of FBIOGATTR.

Input on bare metal:

  • illumos: read from /dev/kbd (keyboard) and /dev/mouse (mouse) via STREAMS ioctls
  • Linux: libinput or raw evdev (/dev/input/event*)

Implementation: X11 SHM Backend (Tier 1)

struct X11ShmBackend {
    conn: x11rb::rust_connection::RustConnection,
    window: x11rb::protocol::xproto::Window,
    shm_seg: x11rb::protocol::shm::Seg,
    buffer: *mut u8,  // mmap'd SHM region
    width: u32,
    height: u32,
    // Input state
    keyboard_state: xkb::State,
    pointer_position: (f64, f64),
}

impl X11ShmBackend {
    /// Present a rendered frame to the X11 window
    fn present(&self, damage: &[Rectangle]) {
        // XShmPutImage for each damage rectangle
        // (or full frame if damage covers most of screen)
    }

    /// Pump X11 events and convert to Smithay InputEvents
    fn process_input(&mut self) -> Vec<InputEvent> {
        // KeyPress/KeyRelease -> KeyboardKeyEvent
        // MotionNotify -> PointerMotionAbsoluteEvent
        // ButtonPress/ButtonRelease -> PointerButtonEvent
        // etc.
    }
}

The backend integrates into calloop by registering the X11 connection fd as an event source.

Rationale

  • /dev/fb0 enables bare-metal on illumos: No X11 dependency, WayRay as sole display server. Proven by xf86-video-illumosfb consuming the same fbio(4I) interface.
  • PixmanRenderer is fast enough: For a desktop compositor on a workstation CPU, software compositing handles typical desktop loads well. Browsers and media do their own GPU rendering internally.
  • Same compositor, different output: The Smithay compositor core is identical in local and remote modes. Only the output backend changes. This avoids maintaining two compositor codepaths.
  • cocoa-way validates this: Smithay on macOS works by rendering headless and presenting to a native window. Same pattern, different native window system.
  • Loopback optimization is easy: Once local X11 works, adding shared-memory passthrough for co-located server+client is incremental

Consequences

  • Must write and maintain two custom backends: fbdev and X11 SHM (neither in upstream Smithay)
  • Fbdev backend requires SIMD non-temporal store implementation (Rust std::arch intrinsics)
  • Fbdev resolution is fixed at boot; no mode switching. Users must configure UEFI GOP resolution in firmware settings.
  • Fbdev has no VSync; tearing is possible. Mitigate with damage-based partial updates.
  • Fbdev input on illumos needs custom /dev/kbd + /dev/mouse STREAMS reader
  • X11 SHM backend adds x11rb as a dependency (already pure Rust, minimal)
  • Input mapping from X11 events to Smithay types requires careful keysym/keycode handling
  • Fullscreen mode needs proper X11 EWMH hints (_NET_WM_STATE_FULLSCREEN)
  • Multi-monitor in fbdev mode requires multiple /dev/fb* devices or single large GOP framebuffer
  • Multi-monitor in X11 SHM mode depends on Xorg's RANDR configuration
  • On Linux, users will prefer the DRM backend; fbdev/X11 SHM are primarily for illumos