diff --git a/.idea/vcs.xml b/.idea/vcs.xml
index d08d981..94a25f7 100644
--- a/.idea/vcs.xml
+++ b/.idea/vcs.xml
@@ -2,6 +2,5 @@
-
\ No newline at end of file
diff --git a/doc/pkg5_docs/search.txt b/doc/pkg5_docs/search.txt
index 2fe2e31..b2331f6 100644
--- a/doc/pkg5_docs/search.txt
+++ b/doc/pkg5_docs/search.txt
@@ -72,3 +72,311 @@ SEARCH
the indexes from are a consistent set (have identical version
numbers). consistent_open in search_storage takes care of this
functionality.
+
+ 3.2 Implementation overview (how the index is built and updated)
+
+ The implementation of the search index lives primarily in
+ src/modules/indexer.py and src/modules/search_storage.py.
+
+ At a high level, updates are handled by Indexer._generic_update_index(),
+ which performs these steps:
+
+ 1) Obtain an exclusive lock on $ROOT/index/lock to serialize writers.
+ 2) Read the existing index files using a version-checked, consistent
+ open (see search_storage.consistent_open()) so readers always see a
+ matched set of files with the same VERSION number.
+ 3) Build new index artifacts in a temporary directory
+ $ROOT/index/TMP/ without disturbing the live files.
+ 4) Write out all helper dictionaries; then migrate (rename/move)
+ files from TMP into $ROOT/index, updating the version number.
+ This migration is not atomic across multiple files, so readers use
+ version checks and retry to ensure consistency.
+ 5) Release the lock and remove the temporary directory.
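The build-then-migrate pattern in steps 3 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in indexer.py: the function name `write_and_migrate` and the shape of its `files` argument are hypothetical, and the locking in step 1 is omitted.

```python
import os
import shutil
import tempfile


def write_and_migrate(index_root, files, version):
    """Write each file under index_root/TMP, then move it into index_root.

    `files` maps a file name to its list of body lines (a hypothetical
    structure used only by this sketch). Locking is left out.
    """
    tmp_dir = os.path.join(index_root, "TMP")
    os.makedirs(tmp_dir, exist_ok=True)
    try:
        # Step 3: build every artifact in TMP with the new VERSION header.
        for name, lines in files.items():
            with open(os.path.join(tmp_dir, name), "w") as f:
                f.write("VERSION: {0}\n".format(version))
                f.writelines(line + "\n" for line in lines)
        # Step 4: move the files one at a time. The set as a whole is not
        # replaced atomically, which is why readers compare versions.
        for name in files:
            os.replace(os.path.join(tmp_dir, name),
                       os.path.join(index_root, name))
    finally:
        # Step 5: remove the temporary directory.
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Usage: migrate a single file into a scratch index directory.
demo_root = tempfile.mkdtemp()
write_and_migrate(demo_root, {"fast_add.v1": ["lib/libc@1.0-0"]}, 2)
```

Because the moves happen file by file, a reader that opens the directory midway can see mixed versions; the consistent open described in step 2 exists to catch that case.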
+
+ Two update paths are used depending on what changed:
+
+ - Fast update (incremental): For client-side installs/removals below a
+ small threshold (see MAX_FAST_INDEXED_PKGS in indexer.py), the
+ update only appends to the fast_add and fast_remove logs and updates
+ the set of full fmris. The large main dictionary files are not
+ rewritten or moved during this path.
+
+ - Full rebuild: When indexing on a repository (server side), or when
+ the number of changes exceeds the fast-update threshold, or when
+ an inconsistency/error is detected, the index is rebuilt by parsing
+ manifests and regenerating all dictionaries. The new files are then
+ migrated into place.
+
+ Temporary files and migration:
+
+ - All new/updated files are written under $ROOT/index/TMP/ with the
+ target VERSION header already set. After successful creation, the
+ files are moved into $ROOT/index. Legacy auxiliary directories are
+ cleaned up as needed during migration.
+
+ 3.3 When indexing is triggered (client and server)
+
+ Server side (repository):
+
+ - The depot hooks indexing into the publish operation. Each time a
+ package is published to the repository, the indexer is invoked with
+ Indexer.server_update_index(). If another indexing run is already
+ in progress, new fmris are queued and a subsequent run processes
+ them. The helper function Indexer.check_for_updates(index_root, cat)
+ can be used to discover catalog entries that have not yet been
+ indexed (e.g., across restarts).
+
+ Client side (images):
+
+ - The client integrates indexing with image modification operations:
+ install, update/image-update, uninstall, and any execution of an
+ image plan that changes packages. After successful execution of an
+ image plan (see src/modules/client/imageplan.py), the code calls
+ Indexer.client_update_index() to record the changes. On a brand-new
+ image (empty index directory), Indexer.setup() seeds empty stubs.
+
+ - If a fast incremental update is possible (few package changes), the
+ operation only updates the fast logs. If not (too many changes), the
+ client releases the lock and triggers a full rebuild via
+ Indexer.rebuild_index_from_scratch(image.gen_installed_pkgs()).
+
+ - If an inconsistency or unexpected error occurs during incremental
+ update, the client falls back to a full rebuild to restore a clean
+ and consistent index.
+
+ 3.4 Fast vs. full rebuild criteria
+
+ - Fast updates are used when the number of packages added/removed since
+ the last rebuild is small. The current threshold is defined by
+ MAX_FAST_INDEXED_PKGS (20 at the time of writing) in indexer.py.
+ During a fast update, only these files are changed/migrated:
+ * fast_add.v1
+ * fast_remove.v1
+ * full_fmri_list (and its hash)
+ The large main dictionary and token offset files are left untouched.
+
+ - A full rebuild is performed when:
+ * The change count exceeds the threshold, or
+ * The index on disk is missing or inconsistent, or
+ * An error occurs during fast update, or
+ * A server-side bulk operation runs (e.g., first-time indexing).
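The criteria listed above can be condensed into a small decision function. This is a sketch only: `choose_update_path` and its parameters are hypothetical names, and the behavior at exactly the threshold is an assumption, since the text only says "exceeds".

```python
MAX_FAST_INDEXED_PKGS = 20  # value quoted above; the real constant is in indexer.py


def choose_update_path(num_changes, index_consistent=True,
                       server_bulk=False, fast_update_failed=False):
    """Return "fast" or "full" following the criteria listed above."""
    if server_bulk or not index_consistent or fast_update_failed:
        return "full"
    # Assumption: a change count equal to the threshold still takes the
    # fast path; only counts above it force a rebuild.
    if num_changes > MAX_FAST_INDEXED_PKGS:
        return "full"
    return "fast"
```

For example, `choose_update_path(3)` selects the fast path, while `choose_update_path(3, index_consistent=False)` selects a full rebuild regardless of the change count.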
+
+ 3.5 Files and on-disk layout
+
+ All files reside under $ROOT/index. Important files include
+ (see src/modules/search_storage.py for authoritative names):
+
+ - main_dict.ascii.v2
+ The main inverted index mapping search tokens to postings. It is
+ written in sorted token blocks and may be split/merged during
+ rebuild. The file is very large, so fast updates do not rewrite it.
+
+ - token_byte_offset.v1
+ A map of token to byte offsets into main_dict for efficient
+ random access by the query engine.
+
+ - fast_add.v1 / fast_remove.v1
+ Incremental logs holding fmris added/removed since the last full
+ rebuild. Used to answer queries without rebuilding immediately.
+
+ - full_fmri_list and full_fmri_list.hash
+ A list (and content hash) of all fmris currently represented in
+ the index. Used to detect divergence and for consistency checks.
+
+ - fmri_offsets.v1
+ An auxiliary mapping between fmri identifiers and positions used
+ when assembling postings; replaces legacy per-pkg files.
+
+ - manf_list.v1
+ A mapping table of internal manifest IDs to their fmri strings.
+
+ - lock
+ The writer lock file used to serialize index modifications.
+
+ During indexing, new versions of the above are created in
+ $ROOT/index/TMP with a new VERSION header and then moved into
+ $ROOT/index. Readers always verify that all open files have identical
+ VERSION headers and will reopen/retry if a migration is in progress.
+
+ 3.6 On-disk file formats (authoritative specification)
+
+ This section describes the exact line formats, encodings, and
+ invariants for the files under $ROOT/index as implemented by
+ src/modules/search_storage.py and used by src/modules/indexer.py.
+
+ Unless otherwise noted, every index file begins with a first line of
+ the form:
+
+ VERSION: N\n
+
+ where N is the version number. All subsequent lines are specific to
+ each file’s purpose and may appear in arbitrary order unless an
+ ordering is explicitly stated. Readers always validate that all
+ opened files share the same VERSION number.
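A reader-side check of the VERSION header might look like the sketch below. The helper names `read_version` and `versions_consistent` are hypothetical and do not reflect the signature of search_storage.consistent_open(); the sketch only shows the comparison, not the reopen/retry loop.

```python
import os
import tempfile


def read_version(path):
    """Return the integer from a file's leading VERSION line."""
    with open(path) as f:
        first = f.readline()
    prefix = "VERSION: "
    if not first.startswith(prefix):
        raise ValueError("missing VERSION header in " + path)
    return int(first[len(prefix):].strip())


def versions_consistent(paths):
    """True when every file in `paths` carries the same VERSION number."""
    return len({read_version(p) for p in paths}) == 1


# Usage: two files at version 7, then one of them rewritten at version 8.
demo_dir = tempfile.mkdtemp()
demo_paths = [os.path.join(demo_dir, n)
              for n in ("fast_add.v1", "manf_list.v1")]
for p in demo_paths:
    with open(p, "w") as f:
        f.write("VERSION: 7\n")
consistent_before = versions_consistent(demo_paths)
with open(demo_paths[0], "w") as f:
    f.write("VERSION: 8\n")
consistent_after = versions_consistent(demo_paths)
```

A mismatch such as `consistent_after` being false is the signal for a reader to close the files and retry, since a migration may be in progress.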
+
+ Conventions used below:
+ - “token” means a search term after tokenization.
+ - “fmri” means a package FMRI string; for storage it is serialized
+ without the scheme (include_scheme=False) and with anarchy=True to
+ avoid normalization changes.
+ - “byte offset” means an integer offset used by the query engine to
+ seek quickly within a larger file or posting list.
+
+ 3.6.1 main_dict.ascii.v2 — main inverted index
+
+ Purpose
+ Maps each token to its postings grouped by action type, key subtype,
+ and full value. Very large; written during full rebuilds only.
+
+ Encoding
+ One line per token. Each line has five kinds of separators in the
+ following precedence: space, '!', '@', '#', ','. The token itself is
+ URL-quoted (urllib.parse.quote) to ensure line safety.
+
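+ Grammar (informal):
+
+ line := token ' ' at_list '\n'
+ at_list := at_entry | at_entry ' ' at_list
+ at_entry := action_type '!' st_list
+ st_list := st_entry | st_entry '!' st_list
+ st_entry := subtype '@' fv_list
+ fv_list := fv_entry | fv_entry '@' fv_list
+ fv_entry := full_value '#' pf_list
+ pf_list := pf_pair | pf_pair '#' pf_list
+ pf_pair := pfmri_index ',' offset [',' offset]*
+
+ Where token and full_value are URL-quoted; numeric fields are
+ base-10 integers. The first number after '#' in a PF pair
+ (pfmri_index) is the integer id of the FMRI (line number in
+ manf_list.v1), followed by one or more manifest byte offsets where
+ the token matched.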
+
+ Example
+
+ %25gconf.xml file!basename@basename#579,13249,13692,77391,77628
+
+ Meaning: the token "%gconf.xml" (quoted to %25gconf.xml) appears in
+ action type "file", key subtype "basename", with full_value
+ "basename" in manifest id 579 at offsets 13249, 13692, 77391, 77628.
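The separator precedence described above suggests a parser that splits on one separator per nesting level. The following is a minimal sketch, assuming each level is a head field followed by one or more nested entries; `parse_line` is a hypothetical name and is not presented as the parser in search_storage.py.

```python
from urllib.parse import unquote

SEP_CHARS = [" ", "!", "@", "#", ","]  # precedence order given above


def parse_line(line):
    """Split one main_dict line into (token, nested postings)."""
    fields = line.rstrip("\n").split(SEP_CHARS[0])
    token = unquote(fields[0])
    postings = []
    for at_entry in fields[1:]:
        at_parts = at_entry.split(SEP_CHARS[1])
        subtypes = []
        for st_entry in at_parts[1:]:
            st_parts = st_entry.split(SEP_CHARS[2])
            values = []
            for fv_entry in st_parts[1:]:
                fv_parts = fv_entry.split(SEP_CHARS[3])
                pairs = []
                for pf_entry in fv_parts[1:]:
                    # First number is the manifest id, the rest are offsets.
                    nums = [int(n) for n in pf_entry.split(SEP_CHARS[4])]
                    pairs.append((nums[0], nums[1:]))
                values.append((unquote(fv_parts[0]), pairs))
            subtypes.append((st_parts[0], values))
        postings.append((at_parts[0], subtypes))
    return token, postings


example = "%25gconf.xml file!basename@basename#579,13249,13692,77391,77628\n"
token, postings = parse_line(example)
```

Applied to the example line, the sketch yields the token "%gconf.xml" and one posting for action type "file", subtype "basename", full value "basename", manifest id 579, with the four offsets listed above.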
+
+ 3.6.2 token_byte_offset.v1 — token → byte offset map
+
+ Purpose
+ Provides random access into large posting structures for a token.
+ Written during full rebuilds; unchanged during fast updates.
+
+ Encoding
+ After the VERSION line, each subsequent line maps a token to a
+ byte offset:
+
+ tok_repr ' ' offset '\n'
+
+ Where tok_repr is one of:
+ - '0' + tok when tok contains no spaces
+ - '1' + quote(tok) when tok contains spaces (URL-quoted)
+
+ On read, '1' indicates the token must be URL-unquoted. Offsets are
+ base-10 integers. Example:
+
+ 0libc 123456
+ 1hello%20world 98765
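The '0'/'1' prefix rule can be exercised with a pair of helpers. This is a sketch with hypothetical function names (`encode_entry`, `decode_entry`); it uses urllib.parse.quote and unquote and assumes each line holds exactly one space, the one between the token representation and the offset.

```python
from urllib.parse import quote, unquote


def encode_entry(tok, offset):
    """Serialize one token/offset pair using the '0'/'1' prefix rule."""
    if " " in tok:
        # Tokens with spaces are URL-quoted and flagged with '1'.
        return "1{0} {1}\n".format(quote(tok), offset)
    return "0{0} {1}\n".format(tok, offset)


def decode_entry(line):
    """Inverse of encode_entry: return (token, offset)."""
    tok_repr, offset = line.rstrip("\n").split(" ")
    tok = tok_repr[1:]
    if tok_repr[0] == "1":
        tok = unquote(tok)
    return tok, int(offset)
```

The prefix lets the reader skip unquoting for the common case of tokens without spaces.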
+
+ 3.6.3 fast_add.v1 and fast_remove.v1 — incremental update logs
+
+ Purpose
+ Record installed/uninstalled package FMRIs since the last full
+ rebuild. Used to answer queries incrementally without rewriting the
+ main dictionaries. Updated by fast client updates only.
+
+ Encoding
+ After the VERSION line, each non-empty line contains a single FMRI
+ string (anarchy=True, include_scheme=False). The same FMRI may
+ appear in both files (e.g., installed and later removed), but the
+ indexer logic keeps each individual set free of duplicates.
+
+ 3.6.4 full_fmri_list and full_fmri_list.hash — membership and checksum
+
+ Purpose
+ `full_fmri_list` holds the complete set of FMRIs represented by the
+ index at a given VERSION. `full_fmri_list.hash` stores a SHA-1 of
+ the sorted `full_fmri_list` contents for quick integrity checking
+ and to support old clients.
+
+ Encoding
+ - full_fmri_list: one FMRI per line after VERSION.
+ - full_fmri_list.hash: after VERSION, exactly one line containing a
+ lowercase hexadecimal SHA-1 digest of the sorted FMRI list.
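A digest over the sorted FMRI list could be computed as in the sketch below. The exact byte layout fed to SHA-1 is not specified here, so this example assumes each FMRI is hashed as UTF-8 bytes with no separator; the real indexer may differ, and `fmri_list_digest` is a hypothetical name.

```python
import hashlib


def fmri_list_digest(fmris):
    """Return a lowercase hex SHA-1 over the sorted FMRI strings.

    Assumption: each FMRI is fed to the hash as UTF-8 bytes with no
    separator between entries.
    """
    sha = hashlib.sha1()
    for fmri in sorted(fmris):
        sha.update(fmri.encode("utf-8"))
    return sha.hexdigest()
```

Sorting before hashing makes the digest independent of the order in which FMRIs appear in full_fmri_list.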
+
+ 3.6.5 manf_list.v1 — manifest id ↔ fmri mapping
+
+ Purpose
+ Provides a compact bidirectional mapping for manifest ids used in
+ `main_dict.ascii.v2` postings (pfmri_index) and other structures.
+
+ Encoding
+ After VERSION, each line corresponds to a numeric id equal to its
+ zero-based line number (excluding the VERSION line). A blank line
+ represents an empty/removed slot that can be reused later.
+
+ Examples (line numbers shown at left for clarity; not stored):
+
+ 0: library/libc@1.0-0.0.0.0.0
+ 1: driver/storage@2.3-1
+ 2: \n ← id 2 available for reuse
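The id-by-line-position scheme can be modeled as a list in which blank lines become free slots. This is a minimal sketch with hypothetical helper names; in particular, reusing the lowest free id first is an assumption, since the text only says a slot can be reused later.

```python
def load_manifest_ids(lines):
    """Map line position to FMRI; blank lines become None (free slots).

    `lines` are the file's lines after the VERSION header.
    """
    return [line.rstrip("\n") or None for line in lines]


def assign_id(slots, fmri):
    """Store `fmri` in the first free slot, or append; return its id.

    Reusing the lowest free id is an assumption of this sketch.
    """
    for i, entry in enumerate(slots):
        if entry is None:
            slots[i] = fmri
            return i
    slots.append(fmri)
    return len(slots) - 1


slots = load_manifest_ids([
    "library/libc@1.0-0.0.0.0.0\n",
    "driver/storage@2.3-1\n",
    "\n",
])
```

With the three example lines above, `slots[2]` is a free slot, so the next call to `assign_id` returns 2 and a later call appends at id 3.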
+
+ 3.6.6 fmri_offsets.v1 — fmri groups → delta-compressed offsets
+
+ Purpose
+ Associates groups of FMRIs with sets of manifest offsets using
+ delta compression and deduplication. Used during rebuild to avoid
+ storing duplicate offset lists per FMRI.
+
+ Encoding
+ After VERSION, each line encodes a set of space-separated FMRIs,
+ then '!', then a space-separated list of delta-encoded offsets:
+
+ fmri ' ' fmri ... '!' delta ' ' delta ... '\n'
+
+ The offsets after '!' are not absolute; they are deltas. To recover
+ absolute offsets, start from 0 and cumulatively add each value:
+
+ abs[0] = d0
+ abs[i] = abs[i-1] + di
+
+ Example
+
+ lib/libc@1.0-0 lib/libm@1.0-0!10 5 7
+
+ This expands to absolute offsets [10, 15, 22].
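The delta decoding above is a running sum, which itertools.accumulate expresses directly. The sketch below uses hypothetical function names and assumes FMRIs contain neither spaces nor '!'.

```python
from itertools import accumulate


def parse_offsets_line(line):
    """Return (fmris, absolute_offsets) for one fmri_offsets line."""
    fmri_part, delta_part = line.rstrip("\n").split("!")
    fmris = fmri_part.split(" ")
    deltas = [int(d) for d in delta_part.split(" ")]
    # A running sum turns the deltas back into absolute offsets.
    return fmris, list(accumulate(deltas))


def encode_offsets(offsets):
    """Inverse direction: absolute offsets to sorted deltas."""
    offsets = sorted(offsets)
    return [b - a for a, b in zip([0] + offsets[:-1], offsets)]


fmris, absolute = parse_offsets_line("lib/libc@1.0-0 lib/libm@1.0-0!10 5 7\n")
```

Storing deltas keeps the numbers small when offsets are sorted, at the cost of one pass to reconstruct them on read.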
+
+ 3.6.7 Auxiliary __at_* and __st_* files (legacy per-type caches)
+
+ Purpose
+ During full rebuilds the indexer may generate per-action-type and
+ per-subtype auxiliary files whose names start with "__at_" and
+ "__st_" respectively, followed by the action type or subtype.
+ These are migration artifacts moved from TMP into the index root by
+ the _migrate() step for compatibility with older query paths. Their
+ exact internal format is not intended as a public contract and may
+ change; they are managed entirely by the indexer.
+
+ 3.6.8 lock — writer lock
+
+ Purpose
+ A lock file used to serialize writers. The file’s content is
+ controlled by the generic lock mechanism in pkg.lockfile. There is
+ no VERSION header; readers ignore this file.
+
+ 3.6.9 Invariants and validation
+
+ - All non-auxiliary files (except lock) MUST begin with identical
+ VERSION lines within a consistent snapshot.
+ - URL-quoting is used for tokens and any full values that may contain
+ spaces or reserved separators; readers must unquote them where the
+ format indicates (see token_byte_offset.v1 and
+ main_dict.ascii.v2).
+ - `full_fmri_list.hash` is computed as the SHA-1 of the sorted
+ `full_fmri_list` contents encoded as bytes; it is used for quick
+ integrity checks and interop with older clients.
+ - `manf_list.v1` may contain blank lines indicating reusable ids; ids
+ are the zero-based line numbers (excluding the VERSION line).
+ - `fmri_offsets.v1` stores delta-encoded offsets; readers must
+ convert to absolute offsets before use.
+ - Fast updates only modify: fast_add.v1, fast_remove.v1, and
+ full_fmri_list (+ hash). Full rebuilds rewrite all structures.