Machine learning does not outperform traditional statistical modelling for kidney allograft failure prediction

Agathe Truchot, Marc Raynaud, Nassim Kamar, Maarten Naesens, Christophe Legendre, Michel Delahousse, Olivier Thaunat, Matthias Buchler, Marta Crespo, Kamilla Linhares, Babak J. Orandi, Enver Akalin, Gervacio Soler Pujol, Helio Tedesco Silva, Gaurav Gupta, Dorry L. Segev, Xavier Jouven, Andrew J. Bentall, Mark D. Stegall, Carmen LefaucheurOlivier Aubert, Alexandre Loupy

Research output: Contribution to journalArticlepeer-review


Machine learning (ML) models have recently shown potential for predicting kidney allograft outcomes. However, their ability to outperform traditional approaches remains poorly investigated. Therefore, using large cohorts of kidney transplant recipients from 14 centers worldwide, we developed ML-based prediction models for kidney allograft survival and compared their prediction performances to those achieved by a validated Cox-Based Prognostication System (CBPS). In a French derivation cohort of 4000 patients, candidate determinants of allograft failure including donor, recipient and transplant-related parameters were used as predictors to develop tree-based models (RSF, RSF-ERT, CIF), Support Vector Machine models (LK-SVM, AK-SVM) and a gradient boosting model (XGBoost). Models were externally validated with cohorts of 2214 patients from Europe, 1537 from North America, and 671 from South America. Among these 8422 kidney transplant recipients, 1081 (12.84%) lost their grafts after a median post-transplant follow-up time of 6.25 years (Inter Quartile Range 4.33-8.73). At seven years post-risk evaluation, the ML models achieved a C-index of 0.788 (95% bootstrap percentile confidence interval 0.736-0.833), 0.779 (0.724-0.825), 0.786 (0.735-0.832), 0.527 (0.456-0.602), 0.704 (0.648-0.759) and 0.767 (0.711-0.815) for RSF, RSF-ERT, CIF, LK-SVM, AK-SVM and XGBoost respectively, compared with 0.808 (0.792-0.829) for the CBPS. In validation cohorts, ML models’ discrimination performances were in a similar range of those of the CBPS. Calibrations of the ML models were similar or less accurate than those of the CBPS. Thus, when using a transparent methodological pipeline in validated international cohorts, ML models, despite overall good performances, do not outperform a traditional CBPS in predicting kidney allograft failure. Hence, our current study supports the continued use of traditional statistical approaches for kidney graft prognostication.

Original languageEnglish (US)
Pages (from-to)936-948
Number of pages13
JournalKidney international
Issue number5
StatePublished - May 2023


  • artificial intelligence
  • prediction
  • transplantation

ASJC Scopus subject areas

  • Nephrology


Dive into the research topics of 'Machine learning does not outperform traditional statistical modelling for kidney allograft failure prediction'. Together they form a unique fingerprint.

Cite this