arXiv 2604.27820v1 · Apr 30, 2026

ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

Mohit Dubey, Open Gigantic

Brief context

Publication timing, weekly edition context, and source links for this brief.

Published

Apr 30, 2026, 1:03 PM

Current score

84

Original paper

The executive brief below is grounded in the source paper and linked back to the arXiv abstract.

Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue this is not a prompt engineering problem, not a retrieval problem, and not a compression problem: it is a format problem. We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown - every .md file is a valid .og file - requires no infrastructure beyond a two-primitive query protocol, and is readable by both humans and agents without tooling. We formalize the Document Consumption Problem, characterise six structural properties no existing format satisfies simultaneously, and prove OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates up to 95.3 percent token reduction with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.
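The brief does not reproduce the .og grammar itself. As a purely illustrative sketch of the "strict superset of Markdown" claim, graph metadata could ride along in constructs that plain Markdown already ignores, such as HTML comments; the `@node`, `@edge`, `id`, `type`, and `role` directives below are hypothetical, not the paper's actual syntax:

```markdown
<!-- @node id=rollback type=procedure role=oncall -->
## Rolling back a deployment
1. Identify the last known-good release tag.
2. Run the rollback pipeline.

<!-- @edge from=rollback to=release-tags type=depends-on -->
```

Because the annotations live in comments, the file remains renderable by any Markdown viewer, which is consistent with the paper's claim that every .md file is a valid .og file and that .og files stay human-readable without tooling.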

Score: 84 · Full-paper brief · Tags: agents, inference, infra, data

Executive brief

A short business-reader brief that explains why the paper matters now and what to watch or do next.

Why this is worth your attention

Agent costs are increasingly driven less by model calls than by dumping entire files into context so agents can find a few relevant paragraphs. ObjectGraph’s claim is that the fix belongs in the document format itself: make files queryable, scoped, and dependency-aware so agents traverse only what they need. The reported results are large—mean token use down 92%, a five-turn workflow using 36.5× fewer tokens, and no accuracy penalty in its benchmark—which would matter for runbooks, policies, product docs, and any agent workflow living on corporate knowledge. The catch is adoption: this is a proposed format with bounded benchmark coverage, no current cross-file federation, and untested adversarial robustness, not yet an enterprise standard.

  • The paper’s most useful provocation is that agent inefficiency may be baked into today’s document formats, not only retrieval systems. If documents carry indexes, role scopes, dependencies, and validation rules natively, some cost control and governance shifts from the application layer into the content layer.
  • A practical buying question is whether an agent platform can retrieve document fragments by node, role, and dependency—not merely run semantic search over chunks. That matters for runbooks, policies, contracts, and internal knowledge bases where agents need the right paragraph plus the prerequisite context, without exposing everything else.
  • The reported efficiency gains are large enough to matter economically: 92% lower mean token use and a 36.5× reduction in a five-turn workflow, while benchmark accuracy did not fall. If this survives real deployment, agent workflows that repeatedly consult the same operational documents become cheaper and less brittle.
  • The Markdown-compatible design and 98.7% reported transpiler fidelity make migration plausible, but “after human review” is the operational tell. Adoption becomes more credible if teams can convert existing docs through CI, preserve meaning, and keep ObjectGraph metadata current without creating a second documentation burden.
  • The evidence is promising but bounded: the benchmark covers 240 documents, the authors note current limits around cross-file edge resolution, and adversarial document authors are not evaluated. For enterprises, the unresolved questions are interoperability, governance, and whether multiple vendors converge on the same document semantics.
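The "retrieve by node, role, and dependency" idea above can be sketched as a toy traversal API. Everything here is a hedged illustration under assumptions: the node/edge shapes and the two primitives (`get_node`, `expand`) are invented for this sketch and are not ObjectGraph's actual protocol, which the brief does not specify.

```python
# Toy "traverse instead of inject" document graph.
# get_node / expand are illustrative stand-ins for a two-primitive
# query protocol; they are NOT the paper's actual API.

from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    text: str                                  # the fragment an agent receives
    roles: set = field(default_factory=lambda: {"*"})  # "*" = any role
    deps: list = field(default_factory=list)   # prerequisite node ids

class DocGraph:
    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}

    def get_node(self, node_id, role):
        """Primitive 1: fetch one fragment, enforcing role scope."""
        n = self.nodes[node_id]
        if "*" not in n.roles and role not in n.roles:
            raise PermissionError(f"{role} may not read {node_id}")
        return n.text

    def expand(self, node_id, role):
        """Primitive 2: fetch a fragment plus its dependency closure."""
        seen, order = set(), []
        def walk(nid):
            if nid in seen:
                return
            seen.add(nid)
            for dep in self.nodes[nid].deps:
                walk(dep)       # prerequisites come before the node itself
            order.append(nid)
        walk(node_id)
        return "\n".join(self.get_node(nid, role) for nid in order)

doc = DocGraph([
    Node("intro", "This runbook covers service X."),
    Node("creds", "Secrets live in the vault.", roles={"sre"}),
    Node("rollback", "Run the rollback pipeline.", deps=["creds"]),
])

full_injection = "\n".join(n.text for n in doc.nodes.values())
scoped = doc.expand("rollback", role="sre")
print(len(scoped) < len(full_injection))  # traversal ships fewer characters
```

The point of the sketch is the shape of the contract, not the numbers: an agent asks for one node plus its prerequisite closure, and role scoping is enforced at fetch time rather than by filtering an already-injected document.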

Affiliations

Institution names extracted from the brief's PDF summary call.

Open Gigantic

Author marker: Mohit Dubey

From PDF summary

Evidence ledger

The strongest claims in the brief, along with the confidence and citation depth behind them.

capability · high confidence · p.1, p.3

ObjectGraph proposes a file format that turns documents into typed, directed knowledge graphs for agent traversal rather than full-text context injection.

inference · high confidence · p.8

The paper reports substantial token savings in both single-query and multi-turn agent workflows.

capability · medium confidence · p.8

The reported benchmark shows accuracy matching or exceeding Markdown on most task types, with higher mean accuracy in the table.

stack · medium confidence · p.8

The authors report high Markdown-to-ObjectGraph conversion fidelity, but not perfect automatic migration.

caveat · medium confidence · p.9

The paper’s own limitations include benchmark scope, missing multi-file federation, standardisation risk, and no adversarial robustness evaluation.

Related briefs

More plain-English summaries from the archive with nearby topics or operator relevance.

cs.LG

AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow

Jiale Liu, Nanzhe Wang

cs.LG

Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus

Zijian Zhao, Jing Gao, Sen Li

cs.AI

AgentGA: Evolving Code Solutions in Agent-Seed Space

David Y. Y. Tan, Kellie Chin, Jingxian Zhang

cs.IR

Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh

Thank you to arXiv for use of its open access interoperability. This product was not reviewed or approved by, nor does it necessarily express or reflect the policies or opinions of, arXiv.