Construction and validation of nomograms combined with novel machine learning algorithms to predict early death of patients with metastatic colorectal cancer

Front Public Health. 2022 Dec 20:10:1008137. doi: 10.3389/fpubh.2022.1008137. eCollection 2022.

Abstract

Purpose: This study aimed to identify the clinical and non-clinical characteristics that may affect early death in patients with metastatic colorectal carcinoma (mCRC) and to develop accurate prognostic prediction models for mCRC.

Method: Medical records of 35,639 patients with mCRC diagnosed from 2010 to 2019 were obtained from the SEER database. All patients were randomly divided into a training cohort and a validation cohort in a 7:3 ratio. X-tile software was used to identify the optimal cutoff points for age and tumor size. Univariate and multivariate logistic regression models were used to determine the independent predictors of overall early death and cancer-specific early death in mCRC, and predictive and dynamic nomograms were constructed from these predictors. In addition, machine learning (ML) models were built using logistic regression, random forest, CatBoost, LightGBM, and XGBoost. Receiver operating characteristic (ROC) curves and calibration plots were used to evaluate model accuracy, and decision curve analysis (DCA) was used to assess the clinical benefit of the ML models.
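The abstract does not state which software performed the 7:3 split, the ROC analysis, or the DCA. As a hedged illustration only, the sketch below shows those three evaluation steps in plain Python: a seeded random 7:3 split, the ROC area under the curve (AUC) computed as the probability that a randomly chosen early-death case receives a higher predicted risk than a non-case, and the net benefit that DCA plots across threshold probabilities. The function names, the seed, and the rule that a score at or above the threshold counts as "high risk" are assumptions for illustration, not the authors' code.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=2022):
    """Randomly split records into training and validation cohorts (7:3 by default).

    The seed value is a hypothetical choice so the split is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney pairwise statistic.

    labels: 1 for early death, 0 otherwise; scores: predicted risks.
    A tie between a positive and a negative case counts as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def net_benefit(labels, scores, threshold):
    """Net benefit at threshold probability p_t, as plotted in a decision curve.

    NB = TP/N - (FP/N) * p_t / (1 - p_t); assumption: score >= p_t means 'treat as high risk'.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.3, 0.6, 0.6, 0.1]
    print(roc_auc(labels, scores))            # 5.5 / 6, about 0.917
    print(net_benefit(labels, scores, 0.5))   # 0.4 - 0.2 = 0.2
```

The pairwise AUC loop is quadratic and is meant only to make the definition explicit; for a cohort the size of this one, a rank-based or library implementation would be the practical choice. A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.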

Results: The optimal cutoff points were 58 and 77 years for age and 45 and 76 for tumor size. A total of 15 independent risk factors (age, marital status, race, tumor localization, histologic type, grade, N-stage, tumor size, surgery, radiation, chemotherapy, bone metastasis, brain metastasis, liver metastasis, and lung metastasis) were significantly associated with both overall early death and cancer-specific early death in patients with mCRC, and nomograms were constructed from these factors. Among the ML models, the random forest model predicted outcomes most accurately, followed by the logistic regression, CatBoost, XGBoost, and LightGBM models. The random forest model also provided greater clinical benefit than the other models and could be used to support clinical decisions regarding both overall and cancer-specific early death in mCRC.
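The two cutoffs reported for each variable turn age and tumor size into three-level categorical predictors. A minimal sketch of that grouping follows; the helper names are hypothetical, and the boundary rule (a value equal to a cutoff falls into the lower group) is an assumption, since the abstract does not say which side of each cutoff the boundary value belongs to. The abstract also does not state the unit for tumor size, so the sketch leaves it unspecified.

```python
def group_by_cutoffs(value, low, high):
    """Assign a value to one of three groups using two X-tile-style cutoffs.

    Assumption: value <= low -> group 0, low < value <= high -> group 1,
    value > high -> group 2.
    """
    if low >= high:
        raise ValueError("low cutoff must be below high cutoff")
    if value <= low:
        return 0
    if value <= high:
        return 1
    return 2


AGE_CUTOFFS = (58, 77)         # years, as reported in the abstract
TUMOR_SIZE_CUTOFFS = (45, 76)  # unit not stated in the abstract


def age_group(age):
    """Hypothetical helper: three-level age category."""
    return group_by_cutoffs(age, *AGE_CUTOFFS)


def tumor_size_group(size):
    """Hypothetical helper: three-level tumor size category."""
    return group_by_cutoffs(size, *TUMOR_SIZE_CUTOFFS)


if __name__ == "__main__":
    print(age_group(65))         # 1 (between 58 and 77 years)
    print(tumor_size_group(80))  # 2 (above 76)
```

In a logistic regression or nomogram, each group would then typically enter as an indicator variable relative to a reference group.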

Conclusion: ML algorithms combined with nomograms may play an important role in identifying patients with mCRC at risk of early death and could help clinicians make clinical decisions and plan follow-up strategies.

Keywords: SEER; dynamic nomogram; early death; metastatic colorectal cancer; novel machine learning.

Publication types

  • Randomized Controlled Trial

MeSH terms

  • Aged
  • Algorithms
  • Colorectal Neoplasms*
  • Humans
  • Machine Learning
  • Middle Aged
  • Nomograms*
  • Random Forest