Daily Sepsis Research Analysis

Summary

Three studies advance sepsis detection and implementation science: a multi-cohort evaluation shows routine fine-tuning of deep learning sepsis predictors fails under distribution shift, a prospective multicenter study demonstrates point-of-care pancreatic stone protein (PSP) can identify sepsis and gains high specificity when combined with CRP, and an explainable pediatric abdominal sepsis model achieves strong multicenter generalizability.

Research Themes

AI/ML robustness and deployment for sepsis prediction under distribution shift
Point-of-care biomarkers to improve early sepsis identification
Explainable diagnostics for pediatric abdominal sepsis

Selected Articles

1. Evaluating deep learning sepsis prediction models in ICUs under distribution shift: a multi-centre retrospective cohort study.

83 · Level III · Cohort

NPJ digital medicine · 2026 · PMID: 41775890

Across 216,536 ICU stays from HiRID, MIMIC-IV, and eICU, routine fine-tuning under distribution shift underperformed compared with retraining, fusion-training, and supervised domain adaptation. Retraining/fusion excelled with small/large target data, while domain adaptation delivered the most stable gains for medium target data, improving AUROC and normalized AUPRC.

Impact: This work challenges the field’s default reliance on fine-tuning and provides data-driven guidance on when to use domain adaptation, retraining, or fusion, directly informing real-world deployment of sepsis prediction models.

Clinical Implications: ICUs should avoid naive fine-tuning when transferring sepsis predictors; instead, select deployment strategies based on target-data availability to improve reliability, reduce false alarms, and support earlier sepsis recognition.

Key Findings

Quantified distribution shifts across HiRID, MIMIC-IV, and eICU (216,536 ICU stays).
Compared five deployment strategies across multiple deep learning architectures and four target-data regimes.
Routine fine-tuning consistently underperformed versus alternatives.
Retraining and fusion-training performed best in small and large target-data regimes.
Supervised domain adaptation yielded the most stable gains in medium target-data settings, improving AUROC and normalized AUPRC.

Methodological Strengths

Large, harmonized multi-cohort evaluation across three major ICU datasets
Systematic benchmarking across architectures, deployment strategies, and data regimes

Limitations

Retrospective design without prospective clinical deployment
Potential cohort-specific labeling and practice differences not fully controllable

Future Directions: Prospective trials integrating domain adaptation and fusion strategies into clinical workflows, with drift monitoring, cost-effectiveness, and impact on sepsis treatment timing and outcomes.

Sepsis prediction models trained on ICU data often fail to generalize under external validation because of distribution shift. Prior studies have focused on direct model deployment or conventional transfer learning methods (e.g., fine-tuning), yet systematic exploration of alternative strategies remains limited. We quantify shifts across three harmonized adult ICU cohorts (HiRID, MIMIC-IV, eICU; 216,536 stays) and compare five deployment strategies: generalization, fine-tuning/retraining, target training, supervised domain adaptation (DA), and fusion-training, across multiple deep learning architectures and four target-data regimes (none; small ≤ 8k; medium 8-32k; large ≥ 32k stays). Fine-tuning consistently underperforms, even though it has been the go-to method in the literature. Retraining and fusion perform best in small and large target-data regimes, while DA yields the most stable gains with medium target data, improving AUROC and normalized AUPRC over other methods. These results argue for moving beyond routine fine-tuning for sepsis prediction and selecting strategies by target-data availability and operational context.
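The regime-based guidance above can be sketched as a small lookup. This is only an illustration of the reported pattern, not the authors' code: the function names are hypothetical, the abstract's overlapping boundaries are resolved here by assigning exactly 8k stays to "small" and exactly 32k to "large", and mapping the "none" regime to direct generalization is an assumption (it is the only listed strategy that needs no target data).

```python
from typing import List

# Regime boundaries as stated in the abstract (in ICU stays).
SMALL_MAX_STAYS = 8_000    # small: <= 8k
LARGE_MIN_STAYS = 32_000   # large: >= 32k


def target_data_regime(n_target_stays: int) -> str:
    """Classify the amount of target-site data into the four reported regimes.

    Assumption: 8,000 counts as "small" and 32,000 as "large"; the source
    gives overlapping boundaries and does not say how ties are resolved.
    """
    if n_target_stays < 0:
        raise ValueError("n_target_stays must be non-negative")
    if n_target_stays == 0:
        return "none"
    if n_target_stays <= SMALL_MAX_STAYS:
        return "small"
    if n_target_stays < LARGE_MIN_STAYS:
        return "medium"
    return "large"


def suggested_strategies(n_target_stays: int) -> List[str]:
    """Return the strategies the study reports as strongest for each regime."""
    by_regime = {
        # Assumption: with no target data, only direct deployment is possible.
        "none": ["generalization"],
        "small": ["retraining", "fusion-training"],
        "medium": ["supervised domain adaptation"],
        "large": ["retraining", "fusion-training"],
    }
    return by_regime[target_data_regime(n_target_stays)]


if __name__ == "__main__":
    for n in (0, 5_000, 20_000, 50_000):
        print(n, target_data_regime(n), suggested_strategies(n))
```

A lookup like this is at most a starting point; the abstract itself notes that operational context should also inform the choice.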

2. Diagnostic Performance of Point-of-Care Immunoassay Measurements of Pancreatic Stone Protein for Sepsis Detection in ICU Patients: A Prospective, Multicenter, Biomarker-Blinded Study.

80 · Level II · Cohort

Critical care medicine · 2026 · PMID: 41778855

In six U.S. ICUs, point-of-care PSP testing within the first 3 ICU days showed sensitivity 74.2% and specificity 67.8% at 117 ng/mL. Combining PSP with CRP markedly increased diagnostic specificity to 95.2%, with consistent performance across sex and higher specificity in adults aged 18–60.

Impact: Provides multicenter, prospective, biomarker-blinded evidence that a rapid PSP assay can aid early sepsis detection and that combining PSP with CRP substantially improves specificity.

Clinical Implications: PSP can be incorporated into early sepsis screening, particularly as a rule-in tool when combined with CRP, potentially enabling earlier targeted management and antibiotic stewardship.

Key Findings

At 117 ng/mL, PSP achieved sensitivity 74.2%, specificity 67.8%, accuracy 71.0%.
Combining PSP with CRP increased diagnostic specificity to 95.2%.
Performance was consistent across sex; specificity was higher in 18–60-year-olds.
In febrile patients, specificity was high (87.5%) but sensitivity lower (63.6%).

Methodological Strengths

Prospective, multicenter design with biomarker blinding
Predefined threshold selection using Youden Index with comprehensive diagnostic metrics

Limitations

Observational diagnostic study without interventional outcome assessment
Evaluation limited to the first three ICU days; potential spectrum effects by clinical presentation

Future Directions: Randomized trials of PSP-guided care pathways, integration into sepsis bundles, external validation across diverse ICUs, and cost-effectiveness analyses.

OBJECTIVES: To evaluate the diagnostic performance of a rapid point-of-care immunoassay measuring pancreatic stone protein (PSP) for early sepsis identification within the first three days of ICU admission. Subgroup analyses (sex, age, febrile status) were conducted, and the combined diagnostic value of PSP and C-reactive protein (CRP) was assessed. DESIGN: Multicenter, prospective, observational study. SETTING: Six ICUs in the United States. PATIENTS: Four hundred sixty-six adults in the ICU who were expected to require at least 24 hours of ICU care. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: We calculated the Youden Index to evaluate the clinical performance of the PSP assay, and the resulting threshold was used to identify patients with sepsis. Diagnostic performance metrics included sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), and negative likelihood ratio (LR-). Receiver operating characteristic analyses were performed for PSP and CRP. At the optimal PSP cutoff point of 117 ng/mL, PSP demonstrated a sensitivity of 74.2%, specificity of 67.8%, accuracy of 71.0%, PPV of 70.3%, NPV of 71.9%, and LR+ and LR- of 2.30 and 0.38, respectively. Combining PSP and CRP improved diagnostic specificity to 95.2%. Subgroup analyses demonstrated consistent performance across sex, and higher specificity was observed in patients 18-60 years old. In febrile patients, PSP achieved high specificity (87.5%) but lower sensitivity (63.6%). In non-febrile patients, sensitivity and specificity were 67.7% and 76.6%, respectively. CONCLUSIONS: PSP can serve as a biomarker for the early identification of sepsis. Diagnostic performance across ages, sexes, and clinical presentations supports the assay's broad applicability. The combination of PSP and CRP enhances diagnostic specificity for sepsis detection, offering a complementary approach to improve sepsis detection and lead to earlier appropriate management.
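As a quick consistency check on the reported figures, the Youden Index and likelihood ratios can be recomputed from the sensitivity and specificity at the 117 ng/mL cutoff. The sketch below uses the standard textbook definitions; the function names are illustrative and not taken from the study.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


if __name__ == "__main__":
    # Values reported in the abstract for PSP at 117 ng/mL.
    sens, spec = 0.742, 0.678

    print(f"Youden J: {youden_index(sens, spec):.2f}")               # 0.42
    print(f"LR+:      {positive_likelihood_ratio(sens, spec):.2f}")  # 2.30
    print(f"LR-:      {negative_likelihood_ratio(sens, spec):.2f}")  # 0.38
```

The recomputed LR+ and LR- match the published 2.30 and 0.38, so the headline metrics are internally consistent.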

3. Development and multicenter validation of an explainable machine learning diagnostic criteria for pediatric abdominal sepsis.

75.5 · Level II · Cohort

NPJ digital medicine · 2026 · PMID: 41775847

Using 6,566 derivation cases and 308 prospectively enrolled external cases from seven hospitals, the explainable nine-variable ABSeD model achieved AUC 0.934 in training and 0.928 in multicenter validation. Consensus diagnosis and surgical records anchored labels, demonstrating robust generalizability for early pediatric abdominal sepsis detection.

Impact: Addresses a diagnostic blind spot by providing an explainable, generalizable tool for early pediatric abdominal sepsis, supported by multicenter external validation.

Clinical Implications: ABSeD could support earlier recognition and intervention for pediatric intra-abdominal sepsis in hospital settings, potentially reducing delays to surgery or source control.

Key Findings

Developed an explainable nine-variable ABSeD model from 6,566 pediatric admissions.
Prospective multicenter external validation (n=308 across 7 hospitals) achieved AUC 0.928, accuracy 0.873, precision 0.924.
Training performance AUC 0.934 with accuracy 0.870 and precision 0.910.
PAS labels determined by consensus review and laparoscopic surgery records.

Methodological Strengths

Derivation with large real-world dataset and prospective multicenter external validation
Explainable modeling with hyper-parameter optimization and algorithm comparison

Limitations

External validation period was short (January–March 2025), which may limit seasonal representativeness
Single health system derivation may constrain generalizability beyond participating regions

Future Directions: Longer-term, multicountry prospective impact studies assessing workflow integration, fairness across subgroups, and effects on time-to-source-control and clinical outcomes.

Accurate identification of early pediatric abdominal sepsis (PAS) is essential to improving outcomes, yet most existing pediatric sepsis criteria and scoring tools primarily focus on cardiopulmonary dysfunction and overlook early intra-abdominal infections. To address this gap, we combined the real-world data with explainable machine learning to develop the Abdominal Sepsis Diagnosis model (ABSeD) for clinical decision support. The model construction used the retrospective data from 6566 pediatric patients who were admitted to the Children's Hospital, Zhejiang University School of Medicine from 2019 to 2023. Prospective data from 308 recruited patients across seven independent hospitals collected between January and March 2025 served as an external validation cohort. PAS status was determined through consensus or by reviewing laparoscopic surgery records. Multiple machine learning algorithms were compared, and the optimal model was further refined by hyper-parameter tuning. The ABSeD model, integrating nine routine clinical variables, demonstrated high diagnostic accuracy (training set: AUC = 0.934, 95% CI: [0.912, 0.950]; accuracy = 0.870, precision = 0.910), and robust multicenter generalizability (AUC = 0.928, 95% CI: [0.895, 0.961]; accuracy = 0.873, precision = 0.924). This model offers an explainable and practical digital tool for early detection of PAS, with potential to enhance timely intervention in hospitalized children with suspected or clinically identified intra-abdominal septic pathology.