Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study.

EClinicalMedicine • 2025-02-03 • PubMed

Total: 78.5 • Innovation: 7 • Impact: 8 • Rigor: 8 • Citation: 9

Summary

In cohorts totalling more than 13 million participants across South Korea, Japan, and the UK, an ensemble ML model predicted 5-year T2DM with an AUROC of 0.792 in the discovery cohort and was externally validated in Japan and the UK. SHAP identified age, fasting glucose, hemoglobin, γ‑glutamyl transferase (GGT), and BMI as the top contributors, and higher model risk tertiles were associated with progressively greater post-T2DM mortality.

Key Findings

Ensemble ML (logistic regression + AdaBoost voting) achieved AUROC 0.792 and balanced accuracy 72.6% in the discovery cohort.
External validation in Japan (n=12,143,715) and the UK (n=416,656) reproduced risk gradients for mortality across model tertiles.
Top SHAP features: age, fasting glucose, hemoglobin, γ‑glutamyl transferase, and BMI.
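
The two headline metrics above can be illustrated with a minimal Python sketch. It is not the study's code: the equal-weight soft voting, the 0.5 threshold, the function names, and all the numbers are assumptions chosen for illustration, since this summary does not state how the logistic regression and AdaBoost outputs were combined.

```python
def soft_vote(p_a, p_b):
    """Average two models' predicted probabilities (assumed equal-weight soft voting)."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def auroc(y_true, scores):
    """Chance that a random positive case scores above a random negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, scores, threshold=0.5):
    """Mean of sensitivity and specificity at the given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return (tp / n_pos + tn / n_neg) / 2


# Toy data (hypothetical): 1 = developed T2DM within 5 years, 0 = did not.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
p_logreg = [0.10, 0.30, 0.40, 0.35, 0.60, 0.55, 0.80, 0.90]  # stand-in for logistic regression
p_ada = [0.20, 0.20, 0.50, 0.45, 0.50, 0.75, 0.70, 0.80]     # stand-in for AdaBoost

ensemble = soft_vote(p_logreg, p_ada)
print(f"AUROC: {auroc(y_true, ensemble):.3f}")
print(f"Balanced accuracy: {balanced_accuracy(y_true, ensemble):.3f}")
```

AUROC is threshold-free, while balanced accuracy depends on the chosen cut-off, so the two figures reported for the discovery cohort answer different questions about the same model.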

Clinical Implications

The model could support targeted screening and intensive prevention for high-risk individuals in routine health checks; deployment requires local calibration and impact evaluation.
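
One simple way to check local calibration before deployment is to split a local sample into risk tertiles and compare the observed event rate with the mean predicted risk in each. The sketch below assumes that approach; it is not the procedure used in the study, and the function names and data are hypothetical.

```python
def tertile_labels(scores):
    """Assign each score to tertile 1 (lowest risk) to 3 (highest risk) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 3 // len(scores) + 1
    return labels


def observed_expected(y_true, scores, labels):
    """Per tertile: (observed event rate, mean predicted risk, O/E ratio)."""
    out = {}
    for t in (1, 2, 3):
        idx = [i for i, lab in enumerate(labels) if lab == t]
        observed = sum(y_true[i] for i in idx) / len(idx)
        expected = sum(scores[i] for i in idx) / len(idx)
        out[t] = (observed, expected, observed / expected)
    return out


# Toy local sample (hypothetical predicted risks and outcomes).
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0]

labels = tertile_labels(scores)
for t, (obs, exp, ratio) in observed_expected(y_true, scores, labels).items():
    print(f"Tertile {t}: observed={obs:.2f} expected={exp:.2f} O/E={ratio:.2f}")
```

An O/E ratio well above or below 1 in a tertile suggests the model over- or under-predicts risk in the local population and would need recalibration.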

Why It Matters

First large-scale, externally validated ML model spanning Asia and Europe that not only predicts T2DM but also stratifies mortality risk, enabling risk-informed preventive strategies.

Limitations

Observational design with potential residual confounding and healthcare system differences across countries.
Feature set limited to 18 routine variables; AUROC <0.80 may limit individual-level precision.

Future Directions

Prospective impact trials to test model-guided prevention, local recalibration, and integration with polygenic and metabolomic markers to boost accuracy.

Study Information

Study Type: Cohort
Research Domain: Prognosis
Evidence Level: III - Large observational cohort model development with external validation
Study Design: OTHER