akh-medu: deprecate ingest_text bridge — primary KG contamination vector #131

Closed
opened 2026-05-22 13:31:45 +00:00 by toasterson · 0 comments

Problem

The ingest_text MCP tool on akhomed is the primary vector for knowledge graph contamination. Every call feeds unstructured text into the NLU pipeline, which creates low-quality, misattributed triples that bleed across workspace boundaries.

MEMORY.md explicitly warns "NEVER use ingest_text unless explicitly told" — yet the surgical caretaking cron has been forced to use it as a fallback because assert_batch is broken.

Root causes:

  1. ingest_text is lossy. It splits compound facts into fragments, misattributes them, and creates garbage literal strings as symbols.
  2. No rate limit or quota. A single agent cycle can call it dozens of times.
  3. Cross-workspace bleed. Text ingested against "default" propagates to "tecton" through VSA similarity.

Data: 780 symbols on May 15 → 32,725 on May 22 (42x growth). ~90% noise.

Proposed Fix (this WI)

In crates/anima-server/src/clients/akhmedu.rs:

  • Add config flag akhmedu.ingest_text_enabled defaulting to false
  • Gate the ingest_text MCP handler on this flag
  • Log a prominent warning whenever the handler is called, including the caller identity
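
A minimal sketch of the flag and gate described above, in Rust to match akhmedu.rs. Only the flag name akhmedu.ingest_text_enabled and its false default come from this issue. The names AkhMeduConfig, IngestTextDisabled and check_ingest_text_allowed are hypothetical, and eprintln! stands in for whatever logging the server actually uses. How the caller identity reaches the handler is also assumed.

```rust
use std::fmt;

/// Hypothetical config section; only the flag name comes from the issue.
#[derive(Debug, Clone, Default)]
pub struct AkhMeduConfig {
    /// Maps to `akhmedu.ingest_text_enabled`. `bool::default()` is `false`,
    /// so deriving `Default` keeps the tool disabled unless explicitly enabled.
    pub ingest_text_enabled: bool,
}

/// Hypothetical error returned when a call is rejected by the gate.
#[derive(Debug, PartialEq, Eq)]
pub struct IngestTextDisabled {
    pub caller: String,
}

impl fmt::Display for IngestTextDisabled {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "ingest_text is disabled (akhmedu.ingest_text_enabled=false); caller: {}",
            self.caller
        )
    }
}

impl std::error::Error for IngestTextDisabled {}

/// Gate intended to run at the top of the ingest_text handler (assumed shape).
/// Every call is logged with the caller identity, whether rejected or allowed.
pub fn check_ingest_text_allowed(
    cfg: &AkhMeduConfig,
    caller: &str,
) -> Result<(), IngestTextDisabled> {
    if !cfg.ingest_text_enabled {
        // Stand-in for the server's real logger.
        eprintln!("WARN: rejected ingest_text call from '{caller}': flag is disabled");
        return Err(IngestTextDisabled {
            caller: caller.to_string(),
        });
    }
    // Flag is on: still warn, since the tool is a known contamination vector.
    eprintln!("WARN: ingest_text called by '{caller}'");
    Ok(())
}
```

With this shape, the handler would call check_ingest_text_allowed before touching the NLU pipeline and return the error to the MCP client on rejection. A config built with AkhMeduConfig::default() rejects every call, which is the behavior the first acceptance item asks for.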

Acceptance

  • Setting akhmedu.ingest_text_enabled=false (default) rejects all ingest_text calls
  • Surgical caretaking reports show zero ingest_text calls after deployment

Complexity: S

Single-file config flag + gate. 20-40 lines changed.

Reference
toasterson/Anima#131