Abstracted
A weekly digest of the most commercially relevant arXiv papers for operators, PMs, investors, and non-research engineers.
Library sitemap
All weeks and briefs
Crawlable links to every public weekly edition page and every individual brief page.
Week of Jun 1, 2026
Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning
GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection
When Evidence is Sparse: Weakly Supervised Early Failure Alerting in Dialogs and LLM-Agent Trajectories
Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference
Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation
Can Generalist Agents Automate Data Curation?
KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators
Cosmos 3: Omnimodal World Models for Physical AI
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense
Week of May 25, 2026
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems
Training Deliberative Monitors for Black-Box Scheming Detection
Robust and Efficient Guardrails with Latent Reasoning
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
Pruning and Distilling Mixture-of-Experts into Dense Language Models
Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?
Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System
LongCat-Video-Avatar 1.5 Technical Report
Week of May 18, 2026
The Distillation Game: Adaptive Attacks & Efficient Defenses
SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations
DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA
Echo: Learning from Experience Data via User-Driven Refinement
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning
Week of May 11, 2026
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive
VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference
LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?
Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack
From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World
MCPShield: Content-Aware Attack Detection for LLM Agent Tool-Call Traffic
EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving
Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
Week of May 4, 2026
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
Cross-Modal Navigation with Multi-Agent Reinforcement Learning
Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking
FINER-SQL: Boosting Small Language Models for Text-to-SQL
GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification
LLM-ADAM: A Generalizable LLM Agent Framework for Pre-Print Anomaly Detection in Additive Manufacturing
When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models
Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning
Week of Apr 27, 2026
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era
When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People
Week of Apr 20, 2026
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems
CHASM: Unveiling Covert Advertisements on Chinese Social Media
Scalable AI Inference: Performance Analysis and Optimization of AI Model Serving
Bimanual Robot Manipulation via Multi-Agent In-Context Learning
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams
A multimodal and temporal foundation model for virtual patient representations at healthcare system scale
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
Week of Apr 13, 2026
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
AgentGA: Evolving Code Solutions in Agent-Seed Space
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models
Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Policy-Invisible Violations in LLM-Based Agents
AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
Week of Apr 6, 2026
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
KV Cache Offloading for Context-Intensive Tasks
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows
Small Vision-Language Models are Smart Compressors for Long Video Understanding
Dynamic Attentional Context Scoping: Agent-Triggered Focus Sessions for Isolated Per-Agent Steering in Multi-Agent LLM Orchestration
More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration
DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Gym-Anything: Turn any Software into an Agent Environment
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
Week of Mar 30, 2026
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies
MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding
Learning to Play Blackjack: A Curriculum Learning Perspective
Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research
CirrusBench: Evaluating LLM-based Agents Beyond Correctness in Real-World Cloud Service Environments
Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis
Doctorina MedBench: End-to-End Evaluation of Agent-Based Medical AI
Week of Mar 23, 2026
AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study
WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More
Self-Distillation for Multi-Token Prediction
VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs
MsFormer: Enabling Robust Predictive Maintenance Services for Industrial Devices
SecureBreak -- A dataset towards safe and secure models
AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design
Efficient Zero-Shot AI-Generated Image Detection
PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection
Week of Mar 16, 2026
Memento-Skills: Let Agents Design Agents
Governed Memory: A Production Architecture for Multi-Agent Workflows
Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights
Evaluating Agentic Optimization on Large Codebases
MAC: Multi-Agent Constitution Learning
CUBE: A Standard for Unifying Agent Benchmarks
The PokeAgent Challenge: Competitive and Long-Context Learning at Scale
Intelligent Co-Design: An Interactive LLM Framework for Interior Spatial Design via Multi-Modal Agents
AgentTrace: Causal Graph Tracing for Root Cause Analysis in Deployed Multi-Agent Systems
Week of Mar 9, 2026
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
Automatic Generation of High-Performance RL Environments
XSkill: Continual Learning from Experience and Skills in Multimodal Agents
Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
When OpenClaw Meets Hospital: Toward an Agentic Operating System for Dynamic Clinical Workflows
OSCBench: Benchmarking Object State Change in Text-to-Video Generation
RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks
One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries
COMIC: Agentic Sketch Comedy Generation
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning
Resource-constrained Amazons chess decision framework integrating large language models and graph attention
OpenClaw-RL: Train Any Agent Simply by Talking
Context Engineering: From Prompts to Corporate Multi-Agent Architecture
Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges
From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring
Meissa: Multi-modal Medical Agentic Intelligence
Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents
PostTrainBench: Can LLM Agents Automate LLM Post-Training?
SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration
Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents
Week of Mar 2, 2026
HLER: Human-in-the-Loop Economic Research via Multi-Agent Pipelines for Empirical Discovery
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration