
en-quire: Governed Markdown Editing for Agent Systems

Agents that manage documentation need more than file access — they need structure, governance, and the ability to edit precisely without reading everything. en-quire gives them that: section-level editing, searchable document structure, and git-native approval workflows, all without imposing a schema on your markdown.

2026-03-17 - 18 min read


The Gap

Markdown files have become the operational substrate of agent systems — SOPs, skill definitions, shared memory, runbooks. But the tooling for agents to manage these files is stuck between filesystem MCPs that let agents clobber anything freely and knowledge-graph systems that impose their own schema. en-quire is a third option: structured, section-addressed editing with built-in governance. During the first interactive test, we asked the agent to reflect on how the tool changed its reasoning; what it reported (a single-session observation from one internal test, illustrative rather than a benchmark) appears as the tenth finding below.


The Problem: Three Bad Options

When we started giving our operations agent, Michelle, access to SOPs and skill files, we evaluated every available MCP server for markdown management. At the time of writing, the landscape fell into three categories — and none fit our requirements.

Option 1: Filesystem MCPs. Available everywhere, works immediately. But they have no concept of document structure — they read and write raw text. An agent editing a section has to read the entire file, locate the right part by string matching, construct the replacement, and write the whole file back. No governance, no search, no audit trail. One malformed write and the document is corrupted.

Option 2: Knowledge-graph MCPs. Tools like Basic Memory provide read, write, search, and even a semantic graph. But in our evaluation, they imposed an opinionated markdown format: specific frontmatter schemas, observation syntax with category prefixes, wiki-style linking conventions. If your SOPs don’t follow that format, you’re fighting the tool instead of using it.

Option 3: Search-only MCPs. Tools like QMD and Markdown-RAG provide strong search over markdown collections — full-text, semantic, hybrid. But they’re read-only. They can help an agent find information. They can’t help it update a procedure.

In the tools we reviewed, we did not find native RBAC or approval workflows. Every caller was fully trusted with every document. Capabilities in this space evolve quickly — this reflects our evaluation, not a permanent assessment.


The Design: Surgery, Not Rewriting

en-quire treats markdown files as structured documents with addressable sections. The fundamental operation is not “read this file / write this file” but “read section 2.7 / replace section 2.7.” The heading hierarchy is the structure. The AST is the interface.

Section Addressing

Given a document with headings, every section is addressable by its heading text, hierarchical path, or pattern:

  • "2.7 Checks" — exact heading match
  • "Deployment Steps > Checks" — hierarchical path
  • "2.7*" — pattern match

The server resolves the address to a precise subtree in the markdown AST, returns just that content, and can replace just that content — with the rest of the document untouched. The agent never needs the full file in context to make a targeted edit.
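The three address forms can be sketched as a resolver over a flat heading list. This is illustrative only: the names and matching logic below are our assumptions, and the real server resolves addresses against a remark AST rather than a flat list.

```typescript
// Simplified stand-in for the heading tree: a flat, in-order list.
interface Heading {
  text: string;   // e.g. "2.7 Checks"
  depth: number;  // 1 = H1, 2 = H2, ...
  line: number;   // start line in the document
}

// Resolve "2.7 Checks" (exact), "Deployment Steps > Checks" (path),
// or "2.7*" (pattern) to a heading, or null if nothing matches.
function resolveSection(headings: Heading[], address: string): Heading | null {
  const parts = address.split(">").map((p) => p.trim());
  const leaf = parts[parts.length - 1];
  const matches = (h: Heading) =>
    leaf.endsWith("*") ? h.text.startsWith(leaf.slice(0, -1)) : h.text === leaf;

  for (let i = 0; i < headings.length; i++) {
    if (!matches(headings[i])) continue;
    // Verify the hierarchical path by walking ancestors upward:
    // each shallower heading we meet must match the next path part.
    let need = parts.length - 2;
    let depth = headings[i].depth;
    for (let j = i - 1; j >= 0 && need >= 0; j--) {
      if (headings[j].depth < depth) {
        if (headings[j].text !== parts[need]) break;
        need--;
        depth = headings[j].depth;
      }
    }
    if (need < 0 || parts.length === 1) return headings[i];
  }
  return null;
}
```

Once the address resolves to a heading, the section's extent is simply everything up to the next heading of equal or shallower depth, which is what lets the server read or replace exactly that subtree.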

Why This Matters for Local Inference

If you’re running agents on local hardware, context window is the bottleneck. Everything else is downstream. In one internal example, a 500-line SOP consumed in full to edit a 10-line section wasted roughly 98% of the context budget. The outline → targeted read → targeted edit workflow means the agent works with maybe 50 lines of actual content across three tool calls, rather than 500+ lines across two full-file reads.

We observed this during the spec development process itself: generating a table of contents by hand consumed roughly 5,000 tokens across multiple tool calls. A doc_generate_toc call cost approximately 50 tokens — a material reduction in context usage. Every tool in en-quire asks the same question: is the LLM doing work that deterministic code could do faster and cheaper?


Governance: Git in the Loop

The most distinctive feature of en-quire is that governance is built in, not bolted on. And the governance mechanism is git — not a custom approval database, not a review UI, not a workflow engine.

The Model

  • Main branch is truth. The live, active versions of all documents.
  • Proposed edits land on branches. Named by caller, document path, and timestamp.
  • Approval is a merge. By a human via their normal git tooling, or by a privileged caller via the MCP.
  • Rejection is branch deletion. Clean, no residue.
  • The audit trail is commit history. Who changed what, when, and why.

This means a lower-privilege agent — say, one that observes that another agent’s skill file could be improved — never writes to main. It creates a branch, makes its edit there, and the branch is flagged for review. A human reviews the diff in GitHub, GitLab, or the CLI, and merges or rejects. The MCP server doesn’t need to implement review UI because the review interface already exists.
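The branch-per-proposal convention can be sketched as a naming helper. The exact format below (`propose/<caller>/<doc>/<timestamp>`) is our assumption for illustration, not a documented en-quire convention:

```typescript
// Build a proposal branch name from caller identity, document path,
// and timestamp. Format is illustrative; en-quire's real scheme may differ.
function proposalBranch(caller: string, docPath: string, when: Date): string {
  const slug = docPath
    .replace(/\.mdx?$/, "")          // drop the .md/.mdx extension
    .replace(/[^a-zA-Z0-9]+/g, "-")  // path separators etc. become dashes
    .replace(/^-|-$/g, "")
    .toLowerCase();
  const stamp = when.toISOString().replace(/[:.]/g, "-");
  return `propose/${caller}/${slug}/${stamp}`;
}
```

Because the branch name encodes who proposed what and when, the reviewer gets most of the audit context before even opening the diff.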

RBAC: Write vs. Propose

Callers are identified at connection time and assigned scoped permissions:

| Permission | Meaning |
| --- | --- |
| read | Read documents and sections, list outlines |
| write | Edit directly on main |
| propose | Create edits on a branch (requires approval) |
| approve | Merge proposed branches into main |
| search | Query the search index |

In our setup, Michelle has write on SOPs and memory (her domain) and propose on skill files, which require human review. An analyst agent has read and search everywhere, and write nowhere.
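A per-tool-call permission check of this shape can be sketched as a pure function. The config shape below is our illustration, not en-quire's actual configuration format:

```typescript
type Permission = "read" | "write" | "propose" | "approve" | "search";

// Grants are scoped by path prefix; a caller maps to a list of grants.
// Shape is illustrative only.
interface Grant { pathPrefix: string; permissions: Permission[] }
type Acl = Record<string, Grant[]>;

// Enforced on every tool call: does this caller hold this permission
// for this document path?
function isAllowed(acl: Acl, caller: string, path: string, perm: Permission): boolean {
  return (acl[caller] ?? []).some(
    (g) => path.startsWith(g.pathPrefix) && g.permissions.includes(perm)
  );
}
```

With the setup described above, Michelle's grant on `skills/` carries propose but not write, so a direct edit attempt fails while a proposal succeeds.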

The governance isn’t in the protocol or the platform. It’s in the server’s configuration, enforced per-tool-call, using infrastructure (git) that the team already understands.

Git-Optional Mode

Not every deployment needs governance. For evaluation, quick local setups, or embedded use, en-quire detects whether the document root is a git repository. If it is, full governance applies. If not, everything except proposal workflows still works — read, write, search, RBAC. The error messages guide users toward git adoption rather than working around its absence.
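The detection step can be sketched as a simple filesystem check. This is a simplified assumption on our part: it only looks at the document root itself (not parent directories), and the real server presumably delegates repository handling to simple-git.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Is the document root a git repository? A .git entry is a directory
// for a normal repo and a file for a worktree; existsSync covers both.
function isGitRepo(root: string): boolean {
  return existsSync(join(root, ".git"));
}
```

When this returns false, proposal and approval tools can fail fast with a message pointing at `git init`, while read, write, search, and RBAC continue to work.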

We deliberately chose not to build a parallel approval mechanism for non-git deployments. One approval system that works well is better than two that work differently.


Search: Structure as Meaning

Full-text search is built in via SQLite FTS5, indexed at the section level. But the design insight that shaped the search system came from a simple example.

Imagine searching for “metrics” across an SOP. Three results come back:

  1. Key Metrics → Line 47
  2. Appendix > External References → Line 1042
  3. Escalation Logging > Reporting Errors Externally → Line 823

With a flat search engine, these are three hits with line numbers. The agent has to read all three to figure out which one matters. But look at the breadcrumbs: result 1 is a dedicated section about metrics — almost certainly the canonical definition. Result 2 is a passing mention in an appendix. Result 3 is metrics in the context of error reporting — a specific operational usage.

The agent can triage by structural context without reading any content.

en-quire’s search returns breadcrumb paths, section headings, and heading depth for every result. The ranking algorithm boosts hits where the search term appears in the heading itself, and slightly penalises deeply nested sections on the assumption that top-level sections are more likely to be canonical.

Search also supports section_filter — narrowing results to sections matching a heading pattern. “Find mentions of metrics, but only within the escalation procedures” eliminates noise from the start.
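The heading-boost and depth-penalty behaviour can be sketched as a scoring function. The weights below are invented for illustration; en-quire's actual ranking formula is not published in this post.

```typescript
// A search hit with its structural context.
interface Hit { heading: string; depth: number; bodyScore: number }

// Boost hits whose heading contains the term; lightly penalise depth,
// on the assumption that top-level sections are more likely canonical.
// Weights (2.0, 0.1) are illustrative.
function rank(hit: Hit, term: string): number {
  const headingBoost = hit.heading.toLowerCase().includes(term.toLowerCase()) ? 2.0 : 0;
  const depthPenalty = 0.1 * Math.max(0, hit.depth - 1);
  return hit.bodyScore + headingBoost - depthPenalty;
}
```

Under this scheme, the "Key Metrics" hit from the example above outranks both the appendix mention and the nested error-reporting usage without the agent reading any section bodies.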


The Document Sync Pipeline

Generating a table of contents and populating the search index are the same operation. Both require walking the heading tree. Both extract section boundaries and content. The TOC is a markdown rendering of the heading tree. The search index is a database rendering of the same data.

en-quire processes documents through a single AST parse that produces:

  1. The heading tree — used by doc_outline, section addressing, and navigation
  2. The section content index — inserted into SQLite FTS5
  3. The TOC (if requested) — rendered as markdown anchor links

Documents that follow recommended conventions — clean heading hierarchy, optional frontmatter — get a fast path through this pipeline. Documents that don’t still work; they just get a full parse every time.

The design principle: we reward good practice but never punish its absence. If your SOPs can’t follow our conventions — legacy formats, third-party templates, inconsistent heading styles — en-quire still works. It just does more work at index time.
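The single-parse fan-out can be sketched in miniature. A regex stands in here for the remark AST parse the real server uses, and for brevity this version ignores setext headings and headings inside fenced code blocks:

```typescript
// One section record feeds the outline, the search index, and the TOC.
interface Section { depth: number; title: string; line: number }

// One pass over the document: collect ATX headings with their depth.
function parseHeadings(markdown: string): Section[] {
  const out: Section[] = [];
  markdown.split("\n").forEach((line, i) => {
    const m = /^(#{1,6})\s+(.+)$/.exec(line);
    if (m) out.push({ depth: m[1].length, title: m[2].trim(), line: i + 1 });
  });
  return out;
}

// Render the same data as a markdown TOC with anchor links.
function renderToc(sections: Section[]): string {
  return sections
    .map((s) => {
      const anchor = s.title.toLowerCase().replace(/[^\w\s-]/g, "").replace(/\s+/g, "-");
      return `${"  ".repeat(s.depth - 1)}- [${s.title}](#${anchor})`;
    })
    .join("\n");
}
```

The same `Section[]` array would also be the rowset inserted into the FTS5 index, which is the point: one walk, multiple outputs.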


Beyond Trees: Borrowing From Graph Thinking

Within a document, the heading hierarchy is a tree. But relationships between documents and sections form a graph. A skill file references an SOP section. A runbook links to a deployment procedure. These cross-document relationships aren’t hierarchical — they’re edges in a graph.

Rather than introducing a graph database, en-quire plans a lightweight link index: a SQLite table derived from document content, rebuilt on every change, and fully disposable. If you delete it, nothing is lost — the documents are the source of truth.

Four ideas borrowed from graph thinking shaped the design:

  1. Scope control. “Starting from section 2.3, show me the outline within 2 levels, plus any cross-document references.” The root_section parameter on doc_outline is the intra-document version of this.
  2. Relationship types. A skill file that implements an SOP section is a different relationship from one that merely references it. The link index captures that distinction where it can.
  3. Blast radius. Before modifying section 2.7, an agent can ask “which skill files reference this section?” and understand what might break.
  4. Context bundles. Gather all sections across all documents relevant to a topic into a single coherent package, replacing an agent orchestrating multiple search and read calls with a single tool call.
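The edge-extraction step behind the link index can be sketched as a pure function over document content. Both the edge shape and the `implements` vs. `references` heuristic below are our illustration, not en-quire's design:

```typescript
// A cross-document edge destined for a disposable SQLite link table.
interface Edge { source: string; target: string; kind: "implements" | "references" }

// Scan markdown for links to .md/.mdx targets and classify each edge
// by its link text. Heuristic and shape are illustrative only.
function extractEdges(sourcePath: string, markdown: string): Edge[] {
  const edges: Edge[] = [];
  const linkRe = /\[([^\]]*)\]\(([^)]+\.mdx?[^)]*)\)/g;
  for (const m of markdown.matchAll(linkRe)) {
    const kind = /implements/i.test(m[1]) ? "implements" : "references";
    edges.push({ source: sourcePath, target: m[2], kind });
  }
  return edges;
}
```

Because the table is derived entirely from content like this, it can be dropped and rebuilt at any time; the documents remain the source of truth.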


What We Learned Building the Spec

The specification for en-quire was developed through a deliberate dogfooding process: using filesystem tools to edit the spec itself, and documenting every friction point as a design input. The first nine findings emerged from that process. The tenth emerged from the first live interactive test of the built tool.

  1. Outline-first is non-negotiable. Every interaction starts with needing the document structure.
  2. The two-read problem. Without section-level tools, agents read files twice — once for structure, once for editing context.
  3. Append is distinct from replace. Adding a table row or list item shouldn’t require replacing the entire section.
  4. Diffs in write responses prevent re-reads. Returning what changed means the agent doesn’t re-read to verify.
  5. Sibling context on read aids editing. Knowing what’s before and after a section helps the agent place its changes.
  6. Search results need structural context. Breadcrumb paths let agents triage by document hierarchy, not line numbers.
  7. Structural filtering eliminates noise. The same term means different things under different headings.
  8. TOC and indexing are the same parse. One AST walk, multiple outputs.
  9. Token cost of procedural work is enormous. Every operation en-quire handles server-side is work the LLM doesn’t spend context on.
  10. Tool quality changes reasoning quality, not just output quality. When mechanical complexity (line counting, boundary detection, match tracking) is absorbed by the server, the model’s chain-of-thought shifts from bookkeeping to judgment. The reasoning budget that would have been spent on “which line does this section end at” is freed for “what should this content say and where does it belong.” This isn’t an efficiency gain — it’s a qualitative change in what the model reasons about.

That last point — discovered during the first interactive test session — reframes the design principle. The question isn’t just is the LLM doing work that deterministic code could do faster and cheaper? It’s also: is the tool surface forcing the LLM to reason about mechanics instead of intent? If yes, the tool is wasting the model’s most valuable resource.

Public Repo: What We’re Committing To

en-quire is MIT-licensed and open-source from day one. The spec is the README. We’re building in public because the value of this project is the design approach and the operational model, not only the code.

The repository is at github.com/nullproof-studio/en-quire. The spec is the working reference — if you’re building against it, open an issue first.


Technology Choices

| Component | Choice | Why |
| --- | --- | --- |
| Markdown AST | unified / remark | Mature, CommonMark-compliant, typed |
| Git | simple-git | Mature CLI wrapper, actively maintained |
| Search | better-sqlite3 + FTS5 | Embedded, zero external dependencies |
| Semantic search | sqlite-vec (optional) | Keeps everything in one DB |
| MCP SDK | @modelcontextprotocol/sdk | Official TypeScript SDK |

Docker-first deployment. No external API keys required for core functionality. Semantic vector search is an optional enhancement when a local embedding model is available.

en-quire supports both .md and .mdx files natively. MDX support is handled via remark-mdx, which extends the unified/remark pipeline to parse JSX expressions and import/export statements without breaking the heading tree or section addressing. This matters because many modern documentation stacks — including Astro Content Collections, Docusaurus, and Next.js — use MDX as their primary authoring format. A governed document tool that only sees .md files misses half the corpus.


What’s Next

The v0.1 release covers the core: parsing, section addressing, read/write/search, basic RBAC, git integration, and a Docker image. Governance tools are planned for v0.2. Cross-document relationships and semantic search are targeted for v0.3, subject to what we learn from production use.

But the bigger bet is this: the gap between “agents can read files” and “agents can responsibly manage documentation” is where most production deployments stall. en-quire is our attempt to close it — not by replacing the tools teams already use, but by making markdown itself the governed, structured, searchable layer that agent systems need.

The full specification, dependency health assessment, and contributor guidelines are in the repository.


© 2026 Null Proof Studio. Released under the MIT License.