Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study.
Summary
Using data from more than 13 million participants across South Korea, Japan, and the UK, an ensemble ML model predicted T2DM within 5 years with an AUROC of 0.792 in the discovery cohort and was externally validated in independent cohorts. SHAP identified age, fasting glucose, hemoglobin, GGT, and BMI as the top contributors, and higher model risk tertiles were associated with progressively greater post-T2DM mortality.
Key Findings
- Ensemble ML (logistic regression + AdaBoost voting) achieved AUROC 0.792 and balanced accuracy 72.6% in the discovery cohort.
- External validation in Japan (n=12,143,715) and UK (n=416,656) reproduced risk gradients for mortality across model tertiles.
- Top SHAP features: age, fasting glucose, hemoglobin, γ‑glutamyl transferase, and BMI.
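The metrics and tertile grouping named in the findings above can be illustrated with a small sketch. It is not the study's pipeline: the probabilities and labels are made-up toy values, the function names are hypothetical, and soft voting (averaging the two base learners' probabilities) is an assumption, since the summary does not say whether voting was soft or hard.

```python
# Toy illustration only: every number below is invented, not study data.
from statistics import quantiles

def soft_vote(p_a, p_b):
    """Average two models' predicted probabilities (soft voting is assumed)."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]

def auroc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(y_true, scores, threshold=0.5):
    """Mean of sensitivity and specificity at the given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

def risk_tertile(scores):
    """Label each score 1 (lowest third) to 3 (highest third)."""
    low_cut, high_cut = quantiles(scores, n=3)
    return [1 + (s > low_cut) + (s > high_cut) for s in scores]

# Hypothetical outputs of the two base learners for nine people
p_logreg   = [0.10, 0.20, 0.35, 0.60, 0.80, 0.30, 0.70, 0.15, 0.55]
p_adaboost = [0.20, 0.10, 0.45, 0.50, 0.90, 0.40, 0.60, 0.25, 0.35]
y          = [0,    0,    1,    1,    1,    0,    1,    0,    0]

p_ens = soft_vote(p_logreg, p_adaboost)
tertiles = risk_tertile(p_ens)
print("AUROC:", auroc(y, p_ens))
print("Balanced accuracy:", balanced_accuracy(y, p_ens))
print("Risk tertiles:", tertiles)
```

In a mortality analysis like the one summarized here, outcome rates would then be compared across the three tertile groups; that step is omitted because it depends on follow-up data not described in this summary.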
Clinical Implications
The model could support targeted screening and intensive prevention for high-risk individuals in routine health checks; deployment requires local calibration and impact evaluation.
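One simple form of local calibration is an intercept-only update ("calibration-in-the-large"), which shifts every prediction on the log-odds scale until the mean predicted risk matches the locally observed T2DM rate. The sketch below assumes that approach for illustration; the summary does not specify a recalibration method, and the function name, probabilities, and observed rate are all hypothetical.

```python
# Illustrative sketch: intercept-only recalibration with made-up numbers.
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(z):
    return 1 / (1 + math.exp(-z))

def recalibrate_intercept(probs, observed_rate, tol=1e-9):
    """Find a log-odds shift so the mean recalibrated risk equals
    observed_rate. Bisection works because the mean risk increases
    monotonically with the shift."""
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        mean_p = sum(expit(logit(p) + mid) for p in probs) / len(probs)
        if mean_p < observed_rate:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2
    return shift, [expit(logit(p) + shift) for p in probs]

# Hypothetical model outputs and a hypothetical local 5-year incidence
original = [0.05, 0.10, 0.20, 0.40, 0.60]
shift, recalibrated = recalibrate_intercept(original, observed_rate=0.15)
print("shift:", round(shift, 3))
print("mean risk after:", round(sum(recalibrated) / len(recalibrated), 3))
```

Because the same shift is applied to everyone, the ranking of individuals (and therefore the AUROC) is unchanged; only the absolute risk levels move. Slope updates or full refitting may be needed when the local population differs more substantially.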
Why It Matters
First large-scale, externally validated ML model spanning Asia and Europe that not only predicts T2DM but also stratifies mortality risk, enabling risk-informed preventive strategies.
Limitations
- Observational design with potential residual confounding and healthcare system differences across countries
- Feature set limited to 18 routine variables; AUROC <0.80 may limit individual-level precision
Future Directions
Prospective impact trials to test model-guided prevention, local recalibration, and integration with polygenic and metabolomic markers to boost accuracy.
Study Information
- Study Type: Cohort
- Research Domain: Prognosis
- Evidence Level: III (large observational cohort model development with external validation)
- Study Design: Other