Add webfingerd design specification

Multi-tenant WebFinger server (RFC 7033) with ACME-style domain onboarding, scoped service token authorization, in-memory cache backed by SQLite, and server-rendered management UI.
2026-04-10 13:10:41 +00:00 · 2026-04-03 18:15:56 +02:00 · 2026-04-03 18:15:56 +02:00 · ed5ff2a796
commit ed5ff2a796
parent a526566a43
1 changed files with 363 additions and 0 deletions
--- a/docs/superpowers/specs/2026-04-03-webfingerd-design.md
+++ b/docs/superpowers/specs/2026-04-03-webfingerd-design.md
@ -0,0 +1,363 @@
+# webfingerd Design Specification
+
+**Date:** 2026-04-03
+**Status:** Approved
+
+## Overview
+
+webfingerd is a multi-tenant WebFinger server (RFC 7033) that centralizes WebFinger
+responses for multiple domains and services. Domain owners point their DNS to a
+webfingerd instance, then their backend services (e.g. barycenter for OIDC, oxifed for
+ActivityPub) register their links via a REST API. webfingerd responds to public
+WebFinger queries by assembling JRD responses from all registered links for the
+requested resource.
+
+### Problem
+
+Multiple services need WebFinger: barycenter needs it for OIDC issuer discovery, oxifed
+needs it for ActivityPub actor discovery. Each domain can only have one
+`/.well-known/webfinger` endpoint. Rather than embedding WebFinger in every service or
+using a reverse proxy to stitch responses together, a dedicated server aggregates links
+from all services under one endpoint.
+
+### Goals
+
+- Serve RFC 7033 compliant WebFinger and RFC 6415 host-meta responses
+- Support multiple domains and users from a single instance
+- Self-service domain onboarding with ACME-style ownership verification
+- Scoped authorization preventing services from registering foreign links
+- Fast query path via in-memory cache, durable storage via SQLite
+- Operational readiness: metrics, health checks, rate limiting, web UI
+
+## Architecture
+
+Single axum binary with modular internal components:
+
+- **WebFinger query handler** serves `/.well-known/webfinger` and `/.well-known/host-meta`
+- **REST management API** handles domain onboarding, token management, and link registration
+- **Auth middleware** validates tokens and enforces scope (allowed rels + resource patterns)
+- **In-memory cache** (DashMap keyed by resource URI) for O(1) query lookups
+- **Domain challenge engine** verifies domain ownership via DNS-01 or HTTP-01 challenges
+- **TTL reaper** (background tokio task) expires stale links
+- **SQLite via SeaORM** as the durable source of truth
+- **Prometheus metrics** and health check endpoints
+- **Server-rendered web UI** (askama templates) for domain owner management
+
+The query path reads exclusively from the in-memory cache. The write path goes through
+SQLite first, then updates the cache (write-through). On startup, all non-expired links
+are loaded from SQLite into the cache.
+
+## Data Model
+
+### domains
+
+| Column            | Type     | Notes                          |
+|-------------------|----------|--------------------------------|
+| id                | TEXT PK  | UUID                           |
+| domain            | TEXT     | UNIQUE, e.g. alice.example     |
+| owner_token_hash  | TEXT     | argon2 hash                    |
+| challenge_type    | TEXT     | dns-01 or http-01              |
+| challenge_token   | TEXT     | nullable, pending challenge    |
+| verified          | BOOL     |                                |
+| created_at        | DATETIME |                                |
+| verified_at       | DATETIME |                                |
+
+### service_tokens
+
+| Column           | Type     | Notes                                |
+|------------------|----------|--------------------------------------|
+| id               | TEXT PK  | UUID                                 |
+| domain_id        | TEXT FK  | references domains.id                |
+| name             | TEXT     | human label, e.g. oxifed             |
+| token_hash       | TEXT     | argon2 hash                          |
+| allowed_rels     | TEXT     | JSON array of rel strings            |
+| resource_pattern | TEXT     | glob, e.g. acct:*@social.alice.example |
+| created_at       | DATETIME |                                      |
+| revoked_at       | DATETIME | nullable                             |
+
+### links
+
+| Column           | Type     | Notes                                |
+|------------------|----------|--------------------------------------|
+| id               | TEXT PK  | UUID                                 |
+| service_token_id | TEXT FK  | references service_tokens.id         |
+| domain_id        | TEXT FK  | references domains.id                |
+| resource_uri     | TEXT     | e.g. acct:alice@alice.example        |
+| rel              | TEXT     |                                      |
+| href             | TEXT     | nullable                             |
+| type             | TEXT     | nullable, media type                 |
+| titles           | TEXT     | nullable, JSON object                |
+| properties       | TEXT     | nullable, JSON object                |
+| template         | TEXT     | nullable, RFC 6570 URI template      |
+| ttl_seconds      | INTEGER  | nullable, NULL means permanent       |
+| created_at       | DATETIME |                                      |
+| expires_at       | DATETIME | nullable, computed from ttl          |
+
+### Relationships
+
+- domains 1:N service_tokens
+- domains 1:N links
+- service_tokens 1:N links
+
+### Key Decisions
+
+- **resource_pattern** uses glob matching. `acct:*@alice.example` means any user at the
+  domain. Domain owners can restrict further, e.g. `acct:blog-*@alice.example`.
+- **allowed_rels** is a JSON array. On registration, webfingerd validates the incoming
+  link's rel is in this list.
+- **links** stores individual link objects, not full JRD responses. At query time,
+  webfingerd assembles the JRD from all links matching the resource (and optional rel
+  filter).
+- **ttl_seconds** nullable. NULL means permanent. When set, expires_at is computed as
+  created_at + ttl_seconds. The reaper cleans expired entries.
+- Token hashes use argon2. Plaintext tokens are never stored.
+
+## Authorization Flow
+
+### Phase 1: Domain Onboarding (self-service)
+
+1. Domain owner calls `POST /api/v1/domains` with their domain name and preferred
+   challenge type (dns-01 or http-01).
+2. webfingerd generates a challenge token and returns instructions:
+   - **dns-01**: create a TXT record at `_webfinger-challenge.{domain}` with the token
+   - **http-01**: serve the token at `https://{domain}/.well-known/webfinger-verify/{token}`
+3. Domain owner provisions the challenge.
+4. Domain owner calls `POST /api/v1/domains/{id}/verify`.
+5. webfingerd verifies the challenge (DNS lookup or HTTP GET).
+6. On success, returns a domain owner token. This token is shown once and stored only
+   as an argon2 hash.
+
+Challenge tokens expire after a configurable TTL (default 1 hour).
+
+### Phase 2: Service Token Creation
+
+1. Domain owner calls `POST /api/v1/domains/{id}/tokens` (authenticated with owner
+   token), specifying:
+   - `name`: human label (e.g. "oxifed")
+   - `allowed_rels`: list of rel types this service can register
+   - `resource_pattern`: glob pattern restricting which resources this service can write
+2. webfingerd creates the service token, returns it once, stores only the hash.
+
+### Phase 3: Link Registration
+
+1. Service calls `POST /api/v1/links` (authenticated with service token) with link
+   data: resource_uri, rel, href, type, ttl, etc.
+2. webfingerd validates:
+   - The link's `rel` is in the token's `allowed_rels`
+   - The link's `resource_uri` matches the token's `resource_pattern`
+   - The token's domain is verified
+3. On success, writes to SQLite and updates the in-memory cache.
+
+### Scope Enforcement Rules
+
+- A service token can only create/update/delete links where the rel is in allowed_rels
+  AND the resource_uri matches the resource_pattern AND the domain is verified.
+- A domain owner token can only manage service tokens for its own verified domain.
+- Tokens are shown once at creation. Only the hash is stored.
+
+## In-Memory Cache
+
+### Structure
+
+A `DashMap<String, Vec<Link>>` keyed by `resource_uri`. DashMap provides concurrent
+lock-free reads suitable for the high-read, low-write webfinger query pattern.
+
+### Cache Operations
+
+- **Startup hydration**: load all non-expired links from SQLite, group by resource_uri,
+  populate the DashMap.
+- **Write-through**: API writes go to SQLite first, then insert/update the affected
+  resource's entry in the cache.
+- **TTL reaper**: background tokio task runs every ~30 seconds, queries for
+  `expires_at < now()`, deletes from SQLite, evicts from cache.
+
+### Query Path
+
+1. Parse `resource` and optional `rel` parameters from the request.
+2. Look up `resource_uri` in the DashMap. Return 404 if not found.
+3. If `rel` parameters are present, filter the Vec<Link> to matching rels.
+4. Assemble JRD response (subject, aliases, links array).
+5. Return `application/jrd+json` with CORS headers (`Access-Control-Allow-Origin: *`).
+
+### host-meta
+
+`GET /.well-known/host-meta` returns a static XRD document containing an LRDD template
+pointing to the webfinger endpoint. No cache interaction needed.
+
+## REST API
+
+### Domain Onboarding
+
+| Method | Path                          | Auth         | Description                |
+|--------|-------------------------------|--------------|----------------------------|
+| POST   | /api/v1/domains               | none         | Register domain, get challenge |
+| GET    | /api/v1/domains/{id}          | owner_token  | Get domain status          |
+| POST   | /api/v1/domains/{id}/verify   | none         | Submit for verification    |
+| DELETE | /api/v1/domains/{id}          | owner_token  | Remove domain + all links  |
+
+### Service Tokens
+
+| Method | Path                               | Auth         | Description         |
+|--------|-------------------------------------|-------------|---------------------|
+| POST   | /api/v1/domains/{id}/tokens         | owner_token | Create service token |
+| GET    | /api/v1/domains/{id}/tokens         | owner_token | List service tokens  |
+| DELETE | /api/v1/domains/{id}/tokens/{tid}   | owner_token | Revoke token         |
+
+### Links
+
+| Method | Path                    | Auth          | Description              |
+|--------|-------------------------|---------------|--------------------------|
+| POST   | /api/v1/links           | service_token | Register link(s)         |
+| GET    | /api/v1/links?resource= | service_token | List links for resource  |
+| PUT    | /api/v1/links/{lid}     | service_token | Update link              |
+| DELETE | /api/v1/links/{lid}     | service_token | Delete link              |
+| POST   | /api/v1/links/batch     | service_token | Bulk register/update     |
+
+### Public Endpoints
+
+| Method | Path                       | Auth | Description              |
+|--------|----------------------------|------|--------------------------|
+| GET    | /.well-known/webfinger     | none | RFC 7033 WebFinger query |
+| GET    | /.well-known/host-meta     | none | RFC 6415 host-meta       |
+
+### Operational
+
+| Method | Path      | Auth | Description        |
+|--------|-----------|------|--------------------|
+| GET    | /metrics  | none | Prometheus metrics |
+| GET    | /healthz  | none | Health check       |
+
+### Error Responses
+
+All management endpoints return standard errors: 400 (bad request), 401 (invalid
+token), 403 (scope violation), 404 (not found), 409 (conflict/duplicate), 429 (rate
+limited with Retry-After header).
+
+The public webfinger endpoint returns 404 for unknown resources per RFC 7033. It does
+not reveal which resources exist vs which domains are registered.
+
+### Batch Endpoint
+
+`POST /api/v1/links/batch` accepts an array of link objects. Services like oxifed
+registering many users at startup benefit from bulk registration rather than N individual
+calls. Maximum 500 links per batch (configurable).
+
+## Web UI
+
+A minimal server-rendered UI for domain owners to manage their domains and tokens.
+
+### Pages
+
+- **Login**: authenticate with owner token (paste token, receive session cookie)
+- **Dashboard**: list verified domains, pending challenges, link counts per domain
+- **Domain detail**: challenge instructions, verification status, service token list
+- **Token management**: create/revoke service tokens, view allowed rels and resource patterns
+- **Link browser**: read-only view of all links under a domain, filterable by resource/rel/service
+
+### Implementation
+
+- Server-side rendered with askama templates (compile-time type-safe, zero overhead)
+- Minimal CSS, no JavaScript framework, progressive enhancement where needed
+- Served under `/ui/*` from the same axum binary
+- Session managed via signed cookies (axum-extra)
+- Auth: owner token as credential, no separate username/password system
+
+### Not in Scope
+
+- No user registration/signup flow. Domain owners get their token from the API.
+- No service link editing from the UI. Services manage their own links via API.
+- No multi-user access per domain (one owner token per domain).
+
+## Rate Limiting
+
+Implemented as axum middleware using a token bucket algorithm (governor crate).
+
+### Tiers
+
+| Tier                 | Limit           | Scope    |
+|----------------------|-----------------|----------|
+| Public webfinger     | 60 req/min      | per IP   |
+| Management API       | 300 req/min     | per token |
+| Batch endpoint       | 10 req/min      | per token |
+
+Returns 429 Too Many Requests with Retry-After header.
+
+## Observability
+
+### Prometheus Metrics
+
+- `webfinger_queries_total{domain, status}` — query count by domain and HTTP status
+- `webfinger_query_duration_seconds` — histogram of query latency
+- `webfinger_links_total{domain}` — gauge of active links per domain
+- `webfinger_domains_total{verified}` — gauge of registered domains
+- `webfinger_cache_hits_total` / `webfinger_cache_misses_total`
+- `webfinger_links_expired_total` — counter of TTL-reaped links
+- `webfinger_challenge_verifications_total{type, result}` — DNS/HTTP challenge outcomes
+
+### Health Check
+
+- `GET /healthz` returns 200 if SQLite is reachable and cache is initialized.
+- Returns 503 during startup hydration.
+
+### Logging
+
+- tracing crate with structured JSON output
+- Request IDs propagated through all layers
+
+## Configuration
+
+Single TOML file with env var overrides (12-factor). Every key is overridable via env
+using `__` as separator (e.g. `WEBFINGERD_SERVER__LISTEN`), powered by the config crate.
+
+```toml
+[server]
+listen = "0.0.0.0:8080"
+base_url = "https://webfinger.example.com"
+
+[database]
+path = "/var/lib/webfingerd/webfingerd.db"
+
+[cache]
+reaper_interval_secs = 30
+
+[rate_limit]
+public_rpm = 60
+api_rpm = 300
+batch_rpm = 10
+batch_max_links = 500
+
+[challenge]
+dns_txt_prefix = "_webfinger-challenge"
+http_well_known_path = ".well-known/webfinger-verify"
+challenge_ttl_secs = 3600
+
+[ui]
+enabled = true
+session_secret = "override-via-env"
+```
+
+## Deployment
+
+- Single static binary (musl target for portability)
+- SQLite file on a persistent volume
+- Reverse proxy (nginx/caddy) terminates TLS and forwards to webfingerd
+- User points their domain's DNS to the reverse proxy (A/CNAME record)
+- Multiple domains can point to the same instance. webfingerd resolves the correct
+  links based on the resource parameter, not the Host header.
+
+## Crate Dependencies
+
+| Crate                       | Purpose                           |
+|-----------------------------|-----------------------------------|
+| axum, tokio                 | HTTP server, async runtime        |
+| sea-orm, sea-orm-migration  | ORM, database migrations          |
+| dashmap                     | Concurrent in-memory cache        |
+| governor                    | Rate limiting (token bucket)      |
+| askama                      | Server-side HTML templates        |
+| argon2                      | Token hashing                     |
+| config, serde               | Configuration loading             |
+| tracing, tracing-subscriber | Structured logging                |
+| metrics, metrics-exporter-prometheus | Prometheus metrics export |
+| hickory-resolver            | DNS challenge verification        |
+| reqwest                     | HTTP challenge verification       |
+| glob-match                  | Resource pattern matching         |