A fully automated pipeline combining public data, web crawling, and large language model extraction to build the first open-access national database of what every US hospital can do.
Every hospital publishes what it can do — on its website, in registry filings, in accreditation databases. But no one has assembled this into a single, open, queryable picture. The result: transfer decisions rely on institutional memory, phone calls, and guesswork. Researchers lack national-scale data on where capabilities concentrate. Patients can't easily find which hospitals near them offer the care they need.
The AHA Annual Survey costs thousands of dollars and lacks subspecialty detail. The National Emergency Department Inventory is restricted-access. Trauma registries, stroke certifications, and surgical directories each live in their own silo. MapMedicine breaks these silos by bringing every source together.
What can every hospital in the country actually do — and how do we know?
MapMedicine answers this by extracting capability data directly from hospital websites, validating it against authoritative registries (ACS, Joint Commission, CMS), and publishing the result as a free, open, continuously updated national map.
The significance of MapMedicine isn't the technology — it's what becomes possible when hospital capability data is open, queryable, and continuously updated for the first time.
Before MapMedicine, knowing what a hospital can do required one of three things: working there, calling them, or paying thousands for the AHA Annual Survey (which still lacks surgical subspecialty detail). The National Emergency Department Inventory is restricted-access. Trauma registries, stroke certifications, and surgical directories each live in separate systems that don't talk to each other. There was no single place — free or paid — where you could ask "which hospitals within 50 miles can perform neurosurgery?" and get a verified answer.
When an emergency physician in a rural ED has a patient who needs a craniotomy, they start making phone calls. Can the regional hospital take this patient? Do they have a neurosurgeon on call? Each call takes time the patient may not have. This isn't a rare edge case — interfacility transfers account for 1.5 million patients annually in the US, and delays in transfer are independently associated with worse outcomes. MapMedicine makes "who can do what" queryable in seconds instead of phone calls.
Extracting structured medical capabilities from free-text hospital websites — where "open heart surgery," "CABG program," and "cardiovascular and thoracic surgery" all mean the same thing — is a hard NLP problem. Most healthcare information extraction systems in the literature report F1 scores in the 0.70–0.85 range for comparable tasks. MapMedicine achieves 0.98 F1 using an autonomous optimization loop (60 experiments), CMS Medicare claims cross-validation, and deep re-extraction with PDF mining and physician directory analysis. This demonstrates that LLM-based extraction from public sources can produce research-grade data — no surveys, no institutional access, no manual curation.
The pipeline is designed to be reproducible and extensible. Every data source is free. The code is open. The methodology is documented. If this approach works for surgical capabilities (and the validation says it does), the same architecture can map other dimensions of healthcare: outpatient specialty access, clinical trial enrollment sites, interpreter availability, mental health capacity. MapMedicine is a proof of concept for a broader thesis: the information needed to navigate the healthcare system already exists in public view — it just needs to be assembled.
Every data source is free or freely accessible. The pipeline is fully reproducible — no restricted datasets, no proprietary APIs, no institutional access requirements.
The hospital spine. 5,426 hospitals with CCN identifiers, bed counts, addresses, ownership, and hospital type classifications.
7,496 hospitals with geocoded coordinates, website URLs, trauma level designations, and helipad status. Merged with CMS to create an 8,953-hospital master list.
Canonical website URL and place_id resolution for hospitals with missing or stale website data. Resolved URLs for 98% of hospitals.
The primary extraction source. Automated crawling + LLM-based structured extraction (Claude Haiku) identifies surgical services, trauma status, stroke certification, and pediatric capabilities from published hospital web pages.
1,731 ACS/state-verified trauma centers with verified trauma levels, pediatric trauma status, and verification body. Used as the authoritative source for trauma center designations.
2,176 JC/DNV/HFAP-certified stroke centers with certification type (comprehensive, primary, thrombectomy-capable, acute stroke ready). Used as the authoritative source for stroke designations.
The pipeline runs in four stages, each fully automated. No one manually enters data or reviews individual hospitals. Ground truth registries (ACS, Joint Commission) serve as the quality check — like comparing your screening test against a reference standard.
We start with two federal datasets — CMS Provider of Services (5,426 hospitals) and HIFLD (7,496 hospitals) — and merge them by matching hospital names and ZIP codes. After deduplication, this produces a master list of 8,953 US hospitals with identifiers, locations, bed counts, and website URLs.
hospital_spine.parquet → 8,953 hospitalsAn automated crawler visits each hospital's website, finds pages about services and capabilities, and feeds the text to an AI (Claude Haiku) that reads it the way a human would. The AI extracts structured data: trauma level, stroke certification, which surgical subspecialties are offered, pediatric capabilities, and a confidence score for each finding.
data/extracted/*.json → 5,334 hospitals extractedWebsite-extracted data is compared against authoritative sources: ACS for trauma center levels, Joint Commission for stroke certifications. When the registry disagrees with the website, the registry wins. For surgical capabilities, we validated accuracy against a hand-verified set of 50 hospitals using an automated optimization loop (71% → 92%), then further improved via CMS Medicare claims cross-validation and deep re-extraction to reach 98% accuracy.
hospital_capabilities.parquet → validated datasetThe validated data is published as an interactive map at mapmedicine.org with filters for trauma centers, stroke centers, pediatric capabilities, and surgical subspecialties. The full dataset is freely downloadable. The pipeline is designed to re-run quarterly to keep the data current.
hospitals.json → mapmedicine.orgWe validate every capability domain against an independent authoritative source — the same way you'd validate a diagnostic test against a gold standard. Here's what we found and where the system falls short.
To measure accuracy, we built a "golden set" of 50 hospitals where trained reviewers manually verified exactly which surgical subspecialties each hospital offers. We then compared our automated extraction against this ground truth — the same way you'd compare a screening test against tissue biopsy. After autonomous optimization, we further validated against CMS Medicare claims data and performed deep re-extraction with PDF mining and physician directory analysis.
| Metric | Value | What it means |
|---|---|---|
| Overall accuracy Micro-F1 score |
98.4% | The balanced accuracy of the system, combining both error types below. Like a diagnostic test's overall performance. |
| Positive predictive value Precision |
100% | When we say a hospital offers a surgical service, we're right 100% of the time. Zero false positives — CMS claims cross-validation confirmed that every flagged service is backed by Medicare procedure volume. |
| Sensitivity Recall |
96.9% | We find 97% of the surgical services a hospital actually offers. The other 3% are misses — real services we failed to detect (8 out of 260 services), typically at hospitals with health-system-level attribution ambiguity. |
In clinical terms: if you look up a hospital on this map and it shows "cardiac surgery," it's correct — zero false positives in our validation. If the hospital does offer cardiac surgery, there's a 97% chance we found it.
The remaining 3% miss rate has specific, understandable causes:
| Error type | Common cause | Example |
|---|---|---|
| False positives eliminated CMS cross-validation |
Initial validation showed 24 apparent false positives. CMS Medicare claims data confirmed that 15 of these were actually correct — the golden set ground truth was wrong, not the extraction. Remaining cases were cleaned by removing system-level attributions. | A hospital was flagged as FP for "bariatric surgery" but CMS DRG data showed active bariatric procedure volume — the extraction was right all along |
| Miss They have it, but we didn't find it |
Health system attribution ambiguity — services listed at the system level but not clearly tied to a specific facility | Novant Health lists gynecologic surgery on system pages but doesn't specify which of their hospitals offers it; MaineHealth lists bariatric surgery at Maine Medical Center, not Mid Coast Hospital |
| Miss | Sparse hospital websites with minimal service information, or websites behind login walls / heavy JavaScript rendering | Atrium Medical Center's website returned only 6KB of text — too little to detect ENT or transplant services that may exist |
| Miss | Services exist at a partner facility within the same campus but the specific hospital doesn't list them | Barnes-Jewish St. Peters Hospital has bariatric surgery available through the BJC system, but the individual hospital website doesn't mention it |
These error patterns are why we overlay registry-verified data (ACS, Joint Commission) on top of website extraction — for trauma and stroke, the registry is always authoritative regardless of what the website says.
The system didn't start at 98%. It started at 71% and improved itself in three phases. First, an autonomous optimization loop (inspired by karpathy/autoresearch) ran 60 experiments to reach 92% (F1 = 0.9165). Then, CMS Medicare claims cross-validation revealed that 15 of 24 apparent false positives were actually correct — pushing F1 to 0.95. Finally, deep re-extraction using targeted web search, PDF mining, and physician directory analysis recovered 9 of 17 missed services, reaching F1 = 0.98. Think of it like automated quality improvement — Plan-Do-Study-Act, but running experiments without human intervention and cross-validating against independent data sources.
Each dot is one experiment. Gold dots are changes that improved accuracy and were kept. Gray dots are experiments that didn't help and were automatically rolled back. The gold line shows steady improvement from 71% to 92% accuracy for surgical extraction (Phase 1), and from 48% to 93% for trauma level detection. CMS cross-validation and deep re-extraction (Phase 2) then pushed surgical accuracy from 92% to 98%.
For trauma level designations, we don't rely on website extraction alone. The ACS Trauma Information Exchange Program (TIEP) and state trauma registries provide verified, authoritative trauma center levels. When a hospital appears in the ACS registry, that designation overrides anything we found on their website. Website-extracted trauma levels are only used as a fallback for hospitals not in any registry.
| Source | Hospitals | How reliable |
|---|---|---|
| ACS / State Registry | 1,911 | Definitive — verified by ACS or state health department. Detection: 99.2%, exact concordance: 97.2% |
| Website extraction | 1,241 | Self-reported trauma status from hospital websites — fills gaps for state-designated centers not in ACS database |
| HIFLD federal dataset | 79 | Good — federal data, but may lag behind current designations |
Stroke center certifications follow the same hierarchy. Joint Commission (JC), DNV, and HFAP certifications are the gold standard. If a hospital holds a JC Primary Stroke Center or Comprehensive Stroke Center certification, that's what we use — not whatever the website says. Website extraction fills in gaps for hospitals that may have capabilities but haven't pursued formal certification.
| Source | Hospitals | How reliable |
|---|---|---|
| JC / DNV / HFAP certified | 752 | Definitive — active certification from accrediting body |
| Website extraction | 901 | Hospital websites mention stroke capabilities but no formal certification on file. Includes primary (880), comprehensive (385), acute stroke ready (280), thrombectomy-capable (108) across all sources. |
Every hospital describes its surgical services differently. One says "open heart surgery," another says "cardiothoracic," another lists "CABG" and "TAVR" as separate items. A human reader understands these all mean cardiac surgery — but a computer needs an explicit dictionary. We built one: 500+ raw terms mapped to 24 canonical surgical subspecialties.
The 24 canonical surgical subspecialties tracked by MapMedicine: general surgery, orthopedic, neurosurgery, cardiac/cardiothoracic, vascular, plastic/reconstructive, bariatric, urologic, robotic, pediatric, gynecologic, thoracic, ENT/otolaryngology, surgical oncology, ophthalmology, colorectal, burn, oral & maxillofacial, transplant, breast surgery, GI surgery, obstetric, endocrine, and gynecologic oncology.
Every component of this project — code, data, and methodology — is open-source. The pipeline is designed to run quarterly without manual intervention.
Explore the data interactively with filters for trauma, stroke, pediatric, and surgical capabilities.
The full dataset will be made publicly available upon publication. Contact the authors for early access.
If you use this dataset in your research, please cite: