Original paper
The executive brief below is grounded in the source paper and linked back to the arXiv abstract.
Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations, existing approaches preserve either the agent's trajectory, passive access to raw material, or task-level strategies. None of them preserves what we argue is most needed for repeated same-context workloads: reusable orientation knowledge (e.g., what the context contains, how it is organized, and which entities, constants, and schemas have historically been useful) about the recurring context itself. We introduce PEEK, a system that caches and maintains this orientation knowledge as a context map: a small, constant-sized artifact in the agent's prompt that gives it a persistent peek into the external context. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from inference-time signals, a Cartographer that translates it into structured edits, and a priority-based Evictor that enforces a fixed token budget. On long-context reasoning and information aggregation, PEEK improves over strong baselines by 6.3-34.0% while using 93-145 fewer iterations and incurring 1.7-5.8x lower cost than the state-of-the-art prompt-learning framework, ACE. On context learning, PEEK improves solving rate and rubric accuracy by 6.0-14.0% and 7.8-12.1%, respectively, at 1.4x lower cost than ACE. These gains generalize across LMs and agent architectures, including OpenAI Codex, a production-grade coding agent. Together, these results show that a context map helps long-context LLM agents interact with recurring external contexts more accurately and efficiently.
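To make the cache-policy idea concrete, here is a minimal sketch of a context map with a priority-based Evictor that enforces a fixed token budget. It is an illustration under stated assumptions, not the authors' implementation: the names `MapEntry`, `ContextMap`, `apply_edit`, and `count_tokens` are hypothetical, the priority scores are made up, and token counting is approximated by whitespace-separated words. The Distiller, which would need a model call to extract knowledge from inference-time signals, is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MapEntry:
    key: str         # hypothetical identifier for one piece of orientation knowledge
    text: str        # the note the agent would see in its prompt
    priority: float  # assumed score: higher means more worth keeping


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


@dataclass
class ContextMap:
    token_budget: int
    entries: dict[str, MapEntry] = field(default_factory=dict)

    def size(self) -> int:
        """Total (approximate) token count of all entries in the map."""
        return sum(count_tokens(e.text) for e in self.entries.values())

    def apply_edit(self, entry: MapEntry) -> None:
        """Cartographer-style structured edit: upsert one entry, then enforce the budget."""
        self.entries[entry.key] = entry
        self.evict()

    def evict(self) -> None:
        """Evictor: drop the lowest-priority entries until the map fits the budget."""
        while self.entries and self.size() > self.token_budget:
            victim = min(self.entries.values(), key=lambda e: e.priority)
            del self.entries[victim.key]

    def render(self) -> str:
        """Render the map as prompt text, highest priority first."""
        ordered = sorted(self.entries.values(), key=lambda e: -e.priority)
        return "\n".join(f"- {e.key}: {e.text}" for e in ordered)


# Illustrative run with invented entries and a 12-token budget.
cmap = ContextMap(token_budget=12)
cmap.apply_edit(MapEntry("layout", "src holds services, tests mirror src", 0.9))
cmap.apply_edit(MapEntry("constant", "MAX_RETRIES is 5 in config.py", 0.7))
# This low-priority, one-off note pushes the map over budget and is evicted.
cmap.apply_edit(MapEntry("one-off", "ticket 123 was answered with yes", 0.1))
print(cmap.render())
```

In this toy run, the third entry exceeds the budget and is dropped because it has the lowest priority, so the map keeps the reusable layout and constant notes. A real policy would need a proper tokenizer and a principled way to assign and update priorities, which this sketch does not attempt.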
Executive brief
A short business-reader brief that explains why the paper matters now and what to watch or do next.
Why this is worth your attention
PEEK attacks a very practical agent cost problem: when the same AI system repeatedly works over the same repository, contract set, policy corpus, or dataset, it should not have to rediscover the map every time. The paper claims that a small, maintained “orientation cache” in the prompt can cut wasted exploration and token spend while improving answers, including against a state-of-the-art prompt-learning baseline. If this holds in real enterprise workflows, agent platforms will compete on persistent context management, not just on bigger context windows or retrieval. For now, the evidence is benchmark-heavy and strongest for stable, recurring contexts.
- If your agents repeatedly work over the same codebase, document set, policy library, or customer corpus, the paper’s practical message is simple: do not make them rediscover the terrain on every task. PEEK’s reported gains come from caching reusable orientation, not from a larger model.
- The assumption to revisit is that longer context, full transcript replay, or plain retrieval will solve recurring-context work. The authors show that raw history can become expensive baggage, while a small curated map can preserve the useful structure without dragging every prior interaction along.
- A serious agent platform should be able to explain how it decides what becomes persistent memory, how stale or harmful entries are removed, and how it prevents one-off answers from polluting future work. “We store conversation history” is not equivalent to the cache discipline PEEK is testing.
- PEEK matters most when many tasks share the same underlying context and there is real orientation knowledge to reuse. It is less compelling for short documents, disconnected document piles, or one-off Q&A, and the authors could not afford full runs on the latest proprietary models.
- The approach depends on agent tooling: external execution environments, chunking, sub-LLM calls, and cache-update prompts. In production, the failure modes will look operational (dropped records, stale maps, insecure sandboxes), not just like model accuracy misses.
Evidence ledger
The strongest claims in the brief, along with the confidence and citation depth behind them.
- PEEK stores reusable orientation knowledge about a recurring external context as a small, constant-sized map in the agent prompt.
- On long-context reasoning and information aggregation, PEEK reports gains of 6.3-34.0% over strong baselines, with 93-145 fewer iterations and 1.7-5.8x lower cost than ACE.
- On context-learning workloads, PEEK reports solving-rate gains of 6.0-14.0% and rubric-accuracy gains of 7.8-12.1%, at 1.4x lower cost than ACE.
- The evidence is strongest for recurring, shared-context settings and does not prove the method helps arbitrary long-document or one-off tasks.