Application of a methodological framework for the development and multicenter validation of reliable artificial intelligence in embryo evaluation.

Reproductive biology and endocrinology : RB&E | 2025-02-01 | PubMed
Total: 78.5 | Innovation: 8 | Impact: 7 | Rigor: 8 | Citation: 8

Summary

Across 10 IVF clinics, a deep learning embryo-ranking model showed consistent, externally validated associations between higher AI score brackets and clinical pregnancy (fetal heartbeat, FH), with top-bracket scores yielding odds ratios (OR) of approximately 3.8–4.0. The authors describe a four-step methodology covering dataset curation, optimization, performance assessment across variable data, and explainability via correlations with morphology.

Key Findings

  • AI score brackets showed monotonic increases in fetal heartbeat odds across test and independent datasets (top bracket OR ≈3.84–4.01).
  • Performance generalized across clinics and age subgroups; FH-positive embryos had higher average AI scores within each age stratum.
  • AI scores correlated with established morphologic quality parameters, supporting interpretability.
  • A four-step development/validation framework addressed dataset curation, optimization, performance under data variability, and explainability.
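The bracket-level odds ratios mentioned above can be illustrated with a minimal sketch. It assumes each AI score bracket is compared against the lowest bracket as the reference and uses a Wald 95% confidence interval; the summary does not state the reference group or interval method the authors used, and the counts below are hypothetical, not taken from the study.

```python
from math import exp, log, sqrt


def odds_ratio(pos, neg, ref_pos, ref_neg):
    """Odds ratio of one bracket versus a reference bracket, with a Wald 95% CI.

    pos/neg are counts of transfers with and without fetal heartbeat in the
    bracket; ref_pos/ref_neg are the same counts in the reference bracket.
    """
    estimate = (pos / neg) / (ref_pos / ref_neg)
    # Standard error of log(OR) from the four cell counts of the 2x2 table.
    se = sqrt(1 / pos + 1 / neg + 1 / ref_pos + 1 / ref_neg)
    lower = exp(log(estimate) - 1.96 * se)
    upper = exp(log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical (FH-positive, FH-negative) counts per AI score bracket.
# These numbers are invented for illustration only.
brackets = {
    "low": (20, 80),
    "mid": (35, 65),
    "high": (45, 55),
}

ref_pos, ref_neg = brackets["low"]  # assumption: lowest bracket is the reference
results = {
    name: odds_ratio(pos, neg, ref_pos, ref_neg)
    for name, (pos, neg) in brackets.items()
}

# A monotonic trend means the OR never decreases from one bracket to the next.
estimates = [results[name][0] for name in ("low", "mid", "high")]
is_monotonic = all(a <= b for a, b in zip(estimates, estimates[1:]))

for name, (estimate, lower, upper) in results.items():
    print(f"{name}: OR={estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
print("monotonic:", is_monotonic)
```

Running the same calculation separately on the test set and on each independent clinic's data is one way to check whether the trend holds across sites.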

Clinical Implications

Clinics can consider AI scores as an adjunct for embryo selection, given consistent associations with fetal heartbeat across sites. Prospective trials are still needed to confirm improvements in live birth and to guide clinic-specific calibration and governance.

Why It Matters

The study provides a transparent, multicenter framework that demonstrates reliable AI performance under external validation, addressing reproducibility and explainability, two key barriers to clinical adoption of AI in IVF.

Limitations

  • Non-randomized, retrospective datasets; no direct evidence of improved live birth rates.
  • Dependent on time-lapse imaging and specific lab workflows; potential selection biases.

Future Directions

Prospective, randomized studies to test impact on live birth; site-level calibration and fairness auditing; integration with clinical decision support and cost-effectiveness analyses.

Study Information

Study Type: Cohort
Research Domain: Diagnosis/Prognosis
Evidence Level: III - Multicenter retrospective cohort/model validation without randomization
Study Design: OTHER