arXiv 2605.20815v1 (May 20, 2026)

GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

Peter Fernandes, Ria Kanjilal

Brief context

Published

May 20, 2026, 7:09 AM

Current score

81

Original paper

The executive brief below is grounded in the source paper and linked back to the arXiv abstract.

Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora, but its reliability under resource-constrained, privacy-sensitive deployments remains unclear. In healthcare, where Electronic Health Record (EHR) data is complex and strictly regulated, reliance on cloud-based large language models (LLMs) introduces challenges in cost, latency, and compliance. In this work, we present a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source LLMs. We implement the Microsoft GraphRAG pipeline on real-world EHR schema documentation and benchmark four models, including Llama 3.1 (8B), Mistral (7B), Qwen 2.5 (7B), and Phi-4-mini (3.8B), each deployed via Ollama on a single consumer GPU (8 GB VRAM). We evaluate indexing efficiency, knowledge graph construction, query latency, answer quality, and hallucination under both global and local retrieval modes. Our results reveal substantial differences: Llama 3.1 produces the richest knowledge graph (1,172 entities), Qwen 2.5 achieves the best answer quality (3.3/5), Phi-4-mini fails to complete the pipeline due to structured-output errors, and Mistral exhibits degenerate repetition behavior. We further show that GraphRAG exhibits a practical capacity threshold, where models below approximately 7B parameters fail to reliably produce valid structured outputs and cannot complete the pipeline. In addition, indexing and answer quality are decoupled across models, and local retrieval consistently outperforms global summarization in both latency and factual grounding, with reduced hallucination. These findings demonstrate that GraphRAG is feasible on consumer hardware while highlighting the importance of model selection and retrieval design for robust deployment in regulated settings.

Score 81 · Full-paper brief · models · inference · infra · data

Executive brief

Why this is worth your attention

GraphRAG for regulated documentation is moving from “cloud-only experiment” toward something a hospital IT team could plausibly pilot on local hardware. The paper shows EHR schema retrieval running on an 8 GB consumer GPU, which matters because it reduces data-egress, API-cost, and compliance friction; the reasonable implication is that some internal knowledge-search workloads may not need hyperscale infrastructure. The catch is that reliability depends sharply on model choice and retrieval design, and the evidence is still a small, manually scored benchmark rather than production validation.
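A rough back-of-envelope check helps explain why 7B to 8B models can fit in 8 GB of VRAM at all. The sketch below estimates memory for model weights only, using the nominal parameter counts from the abstract. The evidence ledger says the models were quantized but this brief does not state the bit width, so the 4-bit figure is an assumption, and the function name `weight_memory_gb` is illustrative. KV cache, activations, and runtime overhead are deliberately ignored, so real usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Nominal sizes from the abstract; 4-bit quantization is an assumption.
MODELS = [("Llama 3.1", 8.0), ("Mistral", 7.0), ("Qwen 2.5", 7.0), ("Phi-4-mini", 3.8)]

for name, size in MODELS:
    q4 = weight_memory_gb(size, 4)
    fp16 = weight_memory_gb(size, 16)
    print(f"{name}: ~{q4:.1f} GB at 4 bits vs ~{fp16:.1f} GB at 16 bits")
```

Under these assumptions an 8B model needs about 4 GB for weights at 4 bits, against about 16 GB at 16 bits, which is why quantization is the difference between fitting and not fitting on this class of hardware.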

  • If your default assumption is that graph-based retrieval over regulated documentation requires cloud LLMs, this paper is a useful challenge. It shows a working local path on modest hardware, which could change compliance, vendor-risk, and cost conversations for schema search and documentation workflows.
  • Llama 3.1 built the richest graph (1,172 entities), but Qwen 2.5 produced the highest-scoring answers (3.3/5) despite extracting fewer entities. Buyers and builders should not treat entity count or indexing volume as a proxy for answer quality; test the retrieval mode and final response behavior.
  • The practical question is not just whether a system uses GraphRAG, but whether it defaults to local, entity-neighborhood retrieval when factual grounding matters. In this study, local retrieval was faster and less prone to made-up schema elements than global summarization.
  • The failure modes here are operational: invalid JSON can break the pipeline, and repetition loops can create runaway latency. Any pilot should include hard tests for structured outputs, timeouts, repetition controls, and recovery behavior before it touches regulated workflows.
  • The evidence is useful but narrow: a small curated EHR-schema subset, manual answer scoring, and one consumer workstation. Treat it as a credible pilot signal for local GraphRAG, not proof that the same setup will scale cleanly across a hospital’s full documentation estate.
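The operational failure modes named above (invalid JSON and repetition loops) can be guarded against with small checks around each model call. The following is a minimal sketch, not the paper's or GraphRAG's implementation: `generate` is a hypothetical callable standing in for a local model call, the key names are placeholders, and the n-gram size and threshold are arbitrary starting values that would need tuning.

```python
import json
from collections import Counter
from typing import Callable, Optional, Set


def parse_json_with_retry(generate: Callable[[str], str], prompt: str,
                          required_keys: Set[str],
                          max_attempts: int = 3) -> Optional[dict]:
    """Call the model until it returns a JSON object with the required keys."""
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON: retry instead of crashing the pipeline
        if isinstance(obj, dict) and required_keys.issubset(obj.keys()):
            return obj
    return None  # caller decides how to recover (skip the chunk, log, alert)


def looks_degenerate(text: str, n: int = 4, max_share: float = 0.3) -> bool:
    """Flag output where one word n-gram makes up too large a share of the text."""
    words = text.split()
    if len(words) < 2 * n:
        return False
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    top_count = Counter(ngrams).most_common(1)[0][1]
    return top_count / len(ngrams) > max_share
```

In a pilot, checks like these would sit alongside a request timeout and a cap on generated tokens set in whatever client calls the model, so that a repetition loop is cut off rather than merely detected afterwards.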

Evidence ledger

stack · high confidence · p.1, p.4

Microsoft GraphRAG was run locally on a single consumer GPU with 8 GB VRAM using quantized open-source models.

capability · high confidence · p.6, p.7

Qwen 2.5 delivered the best manual answer-quality score, and local retrieval improved both quality and latency versus global search in the reported setup.

caveat · high confidence · p.6

Global summarization produced hallucinated table names, while local retrieval stayed more grounded in actual schema entities.

caveat · high confidence · p.8

The study’s conclusions are constrained by a small curated dataset and manual scoring.
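The ledger's grounding claims (local retrieval staying closer to actual schema entities, global summarization inventing table names) can be illustrated with a toy example. This is a simplified sketch, not Microsoft GraphRAG's local search: the schema graph and table names are hypothetical, entity matching is plain token overlap, and the grounding check assumes the answer wraps table names in backticks.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical toy schema graph: table -> directly related tables.
SCHEMA_GRAPH: Dict[str, Set[str]] = {
    "patients": {"encounters", "allergies"},
    "encounters": {"patients", "providers"},
    "providers": {"encounters"},
    "allergies": {"patients"},
}


def local_context(query: str, graph: Dict[str, Set[str]]) -> List[str]:
    """Return tables named in the query plus their one-hop neighbours."""
    tokens = {t.strip(".,?").lower() for t in query.split()}
    seeds = [table for table in graph if table in tokens]
    context = set(seeds)
    for table in seeds:
        context |= graph[table]
    return sorted(context)


def ungrounded_tables(answer: str, known_tables: Iterable[str]) -> Set[str]:
    """Return backtick-quoted identifiers in the answer that are not known tables."""
    known = {t.lower() for t in known_tables}
    mentioned = {m.lower() for m in re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", answer)}
    return mentioned - known
```

For example, `local_context("Which key links encounters to providers?", SCHEMA_GRAPH)` keeps the context to `encounters`, `patients`, and `providers`, and `ungrounded_tables` would flag an answer that names a table absent from the schema. A check of this kind is cheap to run on every response in a pilot and gives an automatic signal to sit beside manual scoring.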

Thank you to arXiv for use of its open access interoperability. This product was not reviewed or approved by, nor does it necessarily express or reflect the policies or opinions of, arXiv.