A methodological systematic review of validation and performance of sepsis real-time prediction models.

NPJ digital medicine | 2025-04-07 | PubMed
Total: 77.5 | Innovation: 8 | Impact: 8 | Rigor: 7 | Citation: 9

Summary

Across 91 studies, only 54.9% used external full-window validation with both model- and outcome-level metrics. Performance declined under external and full-window validation (median AUROC fell to 0.783), and the median Utility Score dropped from positive under internal validation to negative under external validation. Hand-crafted features improved performance, and jointly considering AUROC and Utility Score identified top-performing models in only 18.7% of studies.

Key Findings

  • Only 54.9% of studies applied external, full-window validation with both model- and outcome-level metrics.
  • Median AUROC was 0.886 and 0.861 at 6 and 12 hours pre-onset, respectively, and dropped to 0.783 under full-window external validation.
  • Median Utility Score declined from 0.381 (internal) to −0.164 (external) validation.
  • Hand-crafted features significantly improved model performance.
  • Combining AUROC and Utility Score identified top-performing sepsis real-time prediction models (SRPMs) in only 18.7% of studies.
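
To illustrate why full-window evaluation can yield a lower AUROC than evaluation at a single pre-onset time point, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy patient scores, the 6-hour horizon, the labeling convention (a window counts as positive only if it falls within the horizon before onset), and the choice of the last window for non-septic patients. None of it is taken from the reviewed studies, and the numbers are not results from the review.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


HORIZON_H = 6  # hypothetical prediction horizon in hours

# Hypothetical hourly risk scores; "onset" is the sepsis onset hour (None = no sepsis).
patients = [
    {"onset": 8, "scores": [0.2, 0.7, 0.6, 0.5, 0.6, 0.7, 0.8, 0.9]},
    {"onset": None, "scores": [0.1, 0.3, 0.55, 0.2]},
]


def partial_window(patients, horizon):
    """One window per patient: `horizon` hours before onset, or the last window if non-septic."""
    labels, scores = [], []
    for p in patients:
        if p["onset"] is not None:
            labels.append(1)
            scores.append(p["scores"][p["onset"] - horizon])
        else:
            labels.append(0)
            scores.append(p["scores"][-1])
    return labels, scores


def full_window(patients, horizon):
    """Every hourly window; positive only when onset lies within the next `horizon` hours."""
    labels, scores = [], []
    for p in patients:
        for t, s in enumerate(p["scores"]):
            near_onset = p["onset"] is not None and 0 < p["onset"] - t <= horizon
            labels.append(1 if near_onset else 0)
            scores.append(s)
    return labels, scores


partial_auc = auroc(*partial_window(patients, HORIZON_H))
full_auc = auroc(*full_window(patients, HORIZON_H))
print(f"partial-window AUROC: {partial_auc:.3f}")  # 1.000
print(f"full-window AUROC:    {full_auc:.3f}")     # 0.875
```

In this toy case the full-window AUROC is lower because early high scores (false alarms well before the horizon) and elevated scores in non-septic patients enter the evaluation as negatives, whereas the single-time-point evaluation never sees them.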

Clinical Implications

Hospitals should require external full-window validation and utility assessment before deploying sepsis alert systems. Model development should incorporate hand-crafted clinical features and plan for multicenter prospective trials.

Why It Matters

The review sets methodological benchmarks for validating sepsis prediction models and highlights the need for external, full-window, multi-metric evaluation to avoid overestimating performance. It offers timely guidance for AI in healthcare.

Limitations

  • Heterogeneity in sepsis definitions, model architectures, and outcome labeling across studies
  • Potential publication and reporting biases; limited number of prospective clinical evaluations

Future Directions

Promote multicenter, prospective, full-window external validation with standardized definitions and utility metrics; develop reporting guidelines tailored for real-time clinical AI in sepsis.

Study Information

Study Type
Systematic Review
Research Domain
Diagnosis
Evidence Level
I - Systematic review across 91 studies with methodological synthesis
Study Design
OTHER