PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents explained

Brief context

Publication timing, weekly edition context, and source links for this brief.

Week: May 18, 2026

Published: May 19, 2026, 2:51 PM


Original paper

The executive brief below is grounded in the source paper and linked back to the arXiv abstract.

Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations, existing approaches preserve either the agent's trajectory, passive access to raw material, or task-level strategies. None of them preserves what we argue is most needed for repeated same-context workloads: reusable orientation knowledge (e.g., what the context contains, how it is organized, and which entities, constants, and schemas have historically been useful) about the recurring context itself. We introduce PEEK, a system that caches and maintains this orientation knowledge as a context map: a small, constant-sized artifact in the agent's prompt that gives it a persistent peek into the external context. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from inference-time signals, a Cartographer that translates it into structured edits, and a priority-based Evictor that enforces a fixed token budget. On long-context reasoning and information aggregation, PEEK improves over strong baselines by 6.3-34.0% while using 93-145 fewer iterations and incurring 1.7-5.8x lower cost than the state-of-the-art prompt-learning framework, ACE. On context learning, PEEK improves solving rate and rubric accuracy by 6.0-14.0% and 7.8-12.1%, respectively, at 1.4x lower cost than ACE. These gains generalize across LMs and agent architectures, including OpenAI Codex, a production-grade coding agent. Together, these results show that a context map helps long-context LLM agents interact with recurring external contexts more accurately and efficiently.
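The cache policy in the abstract can be pictured as a small keyed store. A Cartographer applies structured edits to it, and an Evictor drops low-priority entries until the map fits its fixed token budget. The Python sketch below is a minimal illustration built on several assumptions. `ContextMap`, `MapEntry`, the `upsert`/`delete` edit operations, the numeric `priority` field, and the word-count "tokenizer" are all hypothetical names and simplifications, not PEEK's actual implementation. The LLM-driven Distiller is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MapEntry:
    """One note of orientation knowledge (hypothetical schema)."""
    key: str         # e.g. "schema:orders" or "constant:TIMEOUT"
    text: str        # what the agent sees in its prompt
    priority: float  # higher means more worth keeping


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


@dataclass
class ContextMap:
    budget: int  # fixed token budget for the whole map
    entries: dict[str, MapEntry] = field(default_factory=dict)

    def size(self) -> int:
        return sum(count_tokens(e.text) for e in self.entries.values())

    def upsert(self, key: str, text: str, priority: float) -> list[str]:
        """Cartographer-style edit: add or overwrite an entry, then enforce the budget."""
        self.entries[key] = MapEntry(key, text, priority)
        return self.evict()

    def delete(self, key: str) -> None:
        """Cartographer-style edit: remove an entry if present."""
        self.entries.pop(key, None)

    def evict(self) -> list[str]:
        """Evictor: drop lowest-priority entries until the map fits the budget."""
        evicted: list[str] = []
        while self.entries and self.size() > self.budget:
            victim = min(self.entries.values(), key=lambda e: e.priority)
            del self.entries[victim.key]
            evicted.append(victim.key)
        return evicted

    def render(self) -> str:
        """Serialize the map for the agent's prompt, highest priority first."""
        ordered = sorted(self.entries.values(), key=lambda e: e.priority, reverse=True)
        return "\n".join(f"- {e.key}: {e.text}" for e in ordered)


if __name__ == "__main__":
    cmap = ContextMap(budget=12)
    cmap.upsert("layout", "src holds code, docs holds specs", 0.9)      # 6 words
    cmap.upsert("constant:TIMEOUT", "default timeout is 30 seconds", 0.5)  # 5 words
    # Adding 5 more words exceeds the 12-word budget, so the 0.5 entry goes.
    print(cmap.upsert("entity:Acme", "Acme is the largest customer", 0.7))
    # prints ['constant:TIMEOUT']
    print(cmap.render())
```

Recomputing `size()` on every loop iteration keeps the sketch short but is quadratic in the number of entries. A production version would more plausibly track a running total and use the model's real tokenizer.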


Score: 76. Full-paper brief. Tags: agents, inference, infra, data.

Executive brief

A short business-reader brief that explains why the paper matters now and what to watch or do next.

Why this is worth your attention

PEEK attacks a very practical agent cost problem: when the same AI system repeatedly works over the same repository, contract set, policy corpus, or dataset, it should not have to rediscover the map every time. The paper claims that a small, maintained “orientation cache” in the prompt can cut wasted exploration and token spend while improving answers, including against ACE, a state-of-the-art prompt-learning baseline. If this holds in real enterprise workflows, agent platforms will compete on persistent context management, not just on bigger context windows or retrieval. The evidence, however, is still benchmark-heavy and strongest for stable, recurring contexts.

If your agents repeatedly work over the same codebase, document set, policy library, or customer corpus, the paper’s practical message is simple: do not make them rediscover the terrain on every task. PEEK’s reported gains come from caching reusable orientation, not from a larger model.
The assumption to revisit is that longer context, full transcript replay, or plain retrieval will solve recurring-context work. The authors show that raw history can become expensive baggage, while a small curated map can preserve the useful structure without dragging every prior interaction along.
A serious agent platform should be able to explain how it decides what becomes persistent memory, how stale or harmful entries are removed, and how it prevents one-off answers from polluting future work. “We store conversation history” is not equivalent to the cache discipline PEEK is testing.
PEEK matters most when many tasks share the same underlying context and there is real orientation knowledge to reuse. It is less compelling for short documents, disconnected document collections, or one-off Q&A. Separately, the authors could not afford full runs on the latest proprietary models.
The approach depends on agent tooling: external execution environments, chunking, sub-LLM calls, and cache-update prompts. In production, the failure modes are likely to be operational (dropped records, stale maps, insecure sandboxes), not just model accuracy misses.
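One way to make the cache discipline discussed in this brief concrete is an explicit admission-and-retirement policy. The Python sketch below is hypothetical. `AdmissionPolicy`, `min_tasks`, `max_idle`, and `flag_harmful` are illustrative names. Its rules are assumptions, not the policy PEEK uses: promote a note to persistent memory only after it proves useful in several distinct tasks, retire it after a run of tasks where it goes unused, and drop it immediately if it is flagged as misleading. It shows the kind of concrete answer a platform could give about what becomes memory, how stale entries leave, and how one-off answers are kept out.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionPolicy:
    """Hypothetical admit/retire rules for persistent orientation memory."""
    min_tasks: int = 2  # a candidate must help in this many distinct tasks
    max_idle: int = 5   # retire entries unused for this many tasks
    candidates: dict[str, set[int]] = field(default_factory=dict)  # key -> task ids
    memory: dict[str, int] = field(default_factory=dict)           # key -> last task used

    def observe(self, task_id: int, useful_keys: set[str]) -> None:
        """Record which notes helped in a finished task, then admit and retire."""
        for key in useful_keys:
            if key in self.memory:
                self.memory[key] = task_id  # refresh recency
                continue
            seen = self.candidates.setdefault(key, set())
            seen.add(task_id)
            if len(seen) >= self.min_tasks:  # recurred across tasks: promote
                self.memory[key] = task_id
                del self.candidates[key]
        # Retire stale entries (candidate pruning omitted for brevity).
        stale = [k for k, last in self.memory.items() if task_id - last >= self.max_idle]
        for k in stale:
            del self.memory[k]

    def flag_harmful(self, key: str) -> None:
        """Remove an entry or candidate reported to have misled the agent."""
        self.memory.pop(key, None)
        self.candidates.pop(key, None)
```

A one-off answer such as a note seen in a single task stays a candidate and never reaches the agent's prompt, while a schema note that helps in two tasks is promoted and kept only as long as it keeps being used.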

Evidence ledger

The strongest claims in the brief, along with the confidence and citation depth behind them.

Capability (high confidence; cited: p.1, p.3)

PEEK stores reusable orientation knowledge about a recurring external context as a small, constant-sized map in the agent prompt.

Inference (high confidence; cited: p.1)

On the evaluated long-context reasoning and information-aggregation tasks, PEEK reports 6.3-34.0% better quality than strong baselines, with 93-145 fewer iterations and 1.7-5.8x lower cost than ACE.

Capability (high confidence; cited: p.1)

On context-learning workloads, PEEK reports 6.0-14.0% higher solving rate and 7.8-12.1% higher rubric accuracy, at 1.4x lower cost than ACE.

Caveat (high confidence; cited: p.9, p.19)

The evidence is strongest for recurring, shared-context settings and does not prove the method helps arbitrary long-document or one-off tasks.

Related briefs

More plain-English summaries from the archive with nearby topics or operator relevance.

cs.SE

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie


cs.AI

Learning Safe Agent Behaviour from Human Preferences and Justifications via World Models

Ilias Kazantzidis et al.


cs.CV

Harrison.Rad 1.5 Technical Report: A radiology foundation model that can draft reports from images, priors and clinical context

Suneeta Mall et al.


cs.CL

A Reliability Assessment of LALM Audio Judges for Full-Duplex Voice Agents

A. Sayyad et al.
