Unsupervised deep learning of electrocardiograms enables scalable human disease profiling.
Summary
A denoising autoencoder learned latent representations of ECGs that were associated with 645 prevalent and 606 incident Phecodes across three external datasets. The associations were most enriched in circulatory, respiratory, and endocrine/metabolic diseases, and the strongest single association was with hypertension. Together these results indicate that ECG waveforms carry diagnostic signal at phenome scale.
Key Findings
- A denoising autoencoder generated ECG latent encodings associated with 645 prevalent and 606 incident Phecodes.
- Associations were most enriched in circulatory (82% of category-specific Phecodes), respiratory (62%), and endocrine/metabolic (45%) categories.
- Hypertension showed the strongest ECG association across the phenome.
- Associations were meta-analyzed across three external datasets that were not used for model development.
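The summary does not say which meta-analytic model was used to combine results across the three datasets. As an illustration only, the sketch below assumes a fixed-effect, inverse-variance meta-analysis of per-dataset association estimates (for example, regression coefficients and standard errors for one Phecode). The helper `fixed_effect_meta` and all input values are hypothetical and are not study data.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Pool per-dataset estimates with inverse-variance weights.

    Hypothetical helper: assumes a fixed-effect model, which the study
    summary does not confirm. Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]            # w_i = 1 / SE_i^2
    total_w = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)               # SE of the weighted mean
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal p-value
    return pooled_beta, pooled_se, p


# Made-up estimates for one Phecode in three held-out datasets.
beta, se, p = fixed_effect_meta([0.25, 0.31, 0.28], [0.05, 0.07, 0.06])
print(f"pooled beta={beta:.3f}, SE={se:.3f}, p={p:.2e}")
```

In a phenome-wide scan, this step would be repeated for each Phecode, with a multiple-testing correction applied across all of them. A random-effects model (for example, DerSimonian-Laird) is the usual alternative when effects are expected to differ between datasets.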
Clinical Implications
ECG embeddings could augment screening for hypertension and multimorbidity, prioritize diagnostic workups, and enable longitudinal disease surveillance from standard ECGs.
Why It Matters
The approach offers a scalable, generalizable way to extract disease-relevant signals from routine ECGs across the phenome, which could enable low-cost population screening and risk stratification.
Limitations
- Observational design with potential confounding and reliance on EHR-derived Phecodes
- Model interpretability and causal inference are limited; generalizability to other health systems requires validation
Future Directions
Prospective validation for targeted screening, fairness/performance audits across demographics, and integration into clinical workflows for triage and surveillance.
Study Information
- Study Type: Cohort
- Research Domain: Diagnosis
- Evidence Level: II (prospective/retrospective cohort analyses with meta-analysis across multiple datasets)
- Study Design: Other