Abstracted
A weekly digest of the most commercially relevant arXiv papers for operators, PMs, investors, and non-research engineers.
Library sitemap
All weeks and briefs
Crawlable links to every public weekly edition page and every individual brief page.
Week of Jun 8, 2026
Reward Modeling for Multi-Agent Orchestration
MiniMax Sparse Attention
The Illusion of Multi-Agent Advantage
FlowBank: Query-Adaptive Agentic Workflows Optimization through Precompute-and-Reuse
A History-Aware Visually Grounded Critic for Computer Use Agents
Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication
Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents
FASE: Fast Adaptive Semantic Entropy for Code Quality
AliyunConsoleAgent: Training Web Agents in Real-World Cloud Environments via Distillation and Reinforcement Learning
What Should a Skill Remember? Quality--Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents
Week of Jun 1, 2026
Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning
GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection
When Evidence is Sparse: Weakly Supervised Early Failure Alerting in Dialogs and LLM-Agent Trajectories
Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference
Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation
Can Generalist Agents Automate Data Curation?
KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators
Cosmos 3: Omnimodal World Models for Physical AI
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense
Week of May 25, 2026
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems
Training Deliberative Monitors for Black-Box Scheming Detection
Robust and Efficient Guardrails with Latent Reasoning
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
Pruning and Distilling Mixture-of-Experts into Dense Language Models
Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?
Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System
LongCat-Video-Avatar 1.5 Technical Report
Week of May 18, 2026
The Distillation Game: Adaptive Attacks & Efficient Defenses
SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations
DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA
Echo: Learning from Experience Data via User-Driven Refinement
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
Week of May 11, 2026
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive
VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference
LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?
Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack
From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World
MCPShield: Content-Aware Attack Detection for LLM Agent Tool-Call Traffic
EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving
Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
Week of May 4, 2026
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
Cross-Modal Navigation with Multi-Agent Reinforcement Learning
Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking
FINER-SQL: Boosting Small Language Models for Text-to-SQL
GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification
LLM-ADAM: A Generalizable LLM Agent Framework for Pre-Print Anomaly Detection in Additive Manufacturing
When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models
Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning
Week of Apr 27, 2026
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era
When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People
Week of Apr 20, 2026
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
CHASM: Unveiling Covert Advertisements on Chinese Social Media
Scalable AI Inference: Performance Analysis and Optimization of AI Model Serving
Bimanual Robot Manipulation via Multi-Agent In-Context Learning
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams
A multimodal and temporal foundation model for virtual patient representations at healthcare system scale
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
Week of Apr 13, 2026
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
AgentGA: Evolving Code Solutions in Agent-Seed Space
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models
Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Policy-Invisible Violations in LLM-Based Agents
AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
Week of Apr 6, 2026
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
KV Cache Offloading for Context-Intensive Tasks
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows
Small Vision-Language Models are Smart Compressors for Long Video Understanding
Dynamic Attentional Context Scoping: Agent-Triggered Focus Sessions for Isolated Per-Agent Steering in Multi-Agent LLM Orchestration
More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration
DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Gym-Anything: Turn any Software into an Agent Environment
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Week of Mar 30, 2026
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies
MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding
Learning to Play Blackjack: A Curriculum Learning Perspective
Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research
CirrusBench: Evaluating LLM-based Agents Beyond Correctness in Real-World Cloud Service Environments
Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis
Doctorina MedBench: End-to-End Evaluation of Agent-Based Medical AI
Week of Mar 23, 2026
AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study
WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More
Self-Distillation for Multi-Token Prediction
VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs
MsFormer: Enabling Robust Predictive Maintenance Services for Industrial Devices
SecureBreak -- A dataset towards safe and secure models
AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design
Efficient Zero-Shot AI-Generated Image Detection
PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection
Week of Mar 16, 2026
Memento-Skills: Let Agents Design Agents
Governed Memory: A Production Architecture for Multi-Agent Workflows
Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights
Evaluating Agentic Optimization on Large Codebases
MAC: Multi-Agent Constitution Learning
CUBE: A Standard for Unifying Agent Benchmarks
The PokeAgent Challenge: Competitive and Long-Context Learning at Scale
Intelligent Co-Design: An Interactive LLM Framework for Interior Spatial Design via Multi-Modal Agents
AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems
Week of Mar 9, 2026
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
Automatic Generation of High-Performance RL Environments
XSkill: Continual Learning from Experience and Skills in Multimodal Agents
Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
When OpenClaw Meets Hospital: Toward an Agentic Operating System for Dynamic Clinical Workflows
OSCBench: Benchmarking Object State Change in Text-to-Video Generation
RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks
One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries
COMIC: Agentic Sketch Comedy Generation
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning
Resource-constrained Amazons chess decision framework integrating large language models and graph attention
OpenClaw-RL: Train Any Agent Simply by Talking
Context Engineering: From Prompts to Corporate Multi-Agent Architecture
Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges
From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring
Meissa: Multi-modal Medical Agentic Intelligence
Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents
PostTrainBench: Can LLM Agents Automate LLM Post-Training?
SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration
Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents
Week of Mar 2, 2026
HLER: Human-in-the-Loop Economic Research via Multi-Agent Pipelines for Empirical Discovery
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration