akh-medu: cleanup_by_pattern — bulk-remove triples matching predicate/entity patterns #134

Closed
opened 2026-05-22 13:32:56 +00:00 by toasterson · 0 comments
Owner

Problem

There is no safe way to remove misclassified triples or symbols from the knowledge graph at scale. The only write tools are assert_triple (add) and remove_triple (remove one). cleanup_by_pattern is filed as a feature request but not implemented.

This means:

  • ~90% noise contamination (~29K symbols) cannot be cleaned
  • Surgical caretaking cron documents contamination daily but cannot fix it
  • Agent OODA cycle artifacts (episodes, summaries, goal fragments) accumulate forever

Cross-reference

This is the companion to #131 (ingest_text deprecation). Once we stop the bleeding, we need to clean up the 32K contaminated symbols already in the workspace.

Also covers akh-medu repo issues #104, #94, #80, #99 — all duplicates of the same need.

Proposed Fix

In akhomed, add a new MCP tool:

  • cleanup_by_pattern(subject_pattern, predicate_pattern, object_pattern, dry_run)
  • All patterns support glob: "episode:*", "summary:Cycle*", "goal:Learn*"
  • dry_run returns count + sample without deleting
  • Uses batched deletes to avoid lock contention
  • Reports (matched_count, deleted_count, error_count)

Also in anima-server: expose this as an MCP tool in the akh-medu client.
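
As an illustration of the proposed behaviour, here is a minimal sketch in Python. It is not the akhomed implementation: the in-memory set standing in for the triple store, the `CleanupReport` type, and the `batch_size` and `sample_size` parameters are all assumptions made for the example. Only the tool name, the three pattern arguments, `dry_run`, and the three reported counts come from the proposal above.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class CleanupReport:
    """Hypothetical result type mirroring the proposed counts."""
    matched_count: int = 0
    deleted_count: int = 0
    error_count: int = 0
    sample: List[Triple] = field(default_factory=list)


def cleanup_by_pattern(
    store: Set[Triple],          # stand-in for the real triple store
    subject_pattern: str,
    predicate_pattern: str,
    object_pattern: str,
    dry_run: bool = True,
    batch_size: int = 500,       # assumed parameter, not in the proposal
    sample_size: int = 5,        # assumed parameter, not in the proposal
) -> CleanupReport:
    patterns = (subject_pattern, predicate_pattern, object_pattern)

    # A triple matches only if all three fields match their glob pattern.
    matches = sorted(
        t for t in store
        if all(fnmatchcase(value, pat) for value, pat in zip(t, patterns))
    )
    report = CleanupReport(matched_count=len(matches),
                           sample=matches[:sample_size])

    # Dry run: report the count and a sample, delete nothing.
    if dry_run:
        return report

    # Delete in fixed-size batches so that a real store could release
    # its lock between batches instead of holding it for the whole run.
    for start in range(0, len(matches), batch_size):
        batch = matches[start:start + batch_size]
        try:
            store.difference_update(batch)
            report.deleted_count += len(batch)
        except Exception:
            report.error_count += len(batch)

    return report
```

Calling it with `dry_run=True` first and checking `matched_count` and `sample` before repeating the call with `dry_run=False` matches the workflow the acceptance criteria describe.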

Acceptance

  • cleanup_by_pattern("episode:*", "*", "*", dry_run=true) returns count > 0 on tecton workspace
  • cleanup_by_pattern("episode:*", "*", "*", dry_run=false) removes all matching triples
  • No SIGKILL on workspaces under 50K symbols
  • Surgical caretaking cron uses this instead of failing silently

Complexity: M

Requires a new daemon-side tool plus an anima-server MCP wrapper. The schema is straightforward (glob matching, batched deletes). Estimated effort: 1-2 days.

Reference
toasterson/Anima#134