Comparison of the Predictive Performance of Medical Coding Diagnosis Classification Systems

Dimitrios Zikos, Nailya DeLellis

Research output: Contribution to journal › Article › peer-review


Health analytics frequently involves predicting outcomes of care. A foundational predictor of clinical outcomes is the medical diagnosis (Dx). The most widely used expression of medical Dx is the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). Because ICD-10-CM includes more than 70,000 codes, training models on it is computationally expensive and slow. Lower-dimensionality alternatives include the Clinical Classifications Software (CCS) and Medicare Severity Diagnosis-Related Groups (MS-DRGs). This study compared the predictive power of these alternatives against ICD-10-CM for two outcomes of hospital care: inpatient mortality and length of stay (LOS). For each Dx system, Naïve Bayes (NB) and Random Forests models were built to predict inpatient mortality, and multiple linear regression models were built to predict the continuous LOS variable. The MS-DRGs performed best for both outcomes, even outperforming ICD-10-CM. Surprisingly, the admitting ICD-10-CM codes performed no worse than the primary ICD-10-CM Dxs. Although CCS has a much lower dimensionality than ICD-10-CM, its performance was only slightly lower, and the refined version of CCS only slightly outperformed the original CCS. Random Forests outperformed NB by a large margin for MS-DRGs and ICD-10-CM. These results can help clarify the trade-off involved in using lower-dimensionality diagnosis representations in clinical outcome studies.
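The comparison described in the abstract can be illustrated with a small, stdlib-only Python sketch. It is not the authors' code and reproduces none of their results: the data are synthetic, the codes `dx_000` to `dx_059` and the 60-to-6 mapping `TO_GROUP` are hypothetical stand-ins for ICD-10-CM and a lower-dimensional system such as CCS, each encounter is assumed to carry a single diagnosis code, and Random Forests are omitted for brevity. For each encoding it fits a categorical Naïve Bayes model for mortality (scored by ROC AUC) and a one-hot linear regression for LOS (scored by R²).

```python
import random
from collections import Counter, defaultdict

# HYPOTHETICAL fine-grained codes (stand-ins for ICD-10-CM) and a HYPOTHETICAL
# many-to-one grouping (stand-in for a lower-dimensional system such as CCS).
FINE_CODES = [f"dx_{i:03d}" for i in range(60)]
TO_GROUP = {code: f"grp_{i // 10}" for i, code in enumerate(FINE_CODES)}  # 60 codes -> 6 groups


def simulate(n, seed=0):
    """Synthetic encounters as (fine code, died 0/1, LOS in days). Not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        code = rng.choice(FINE_CODES)
        g = int(TO_GROUP[code][4:])        # group index 0..5
        risk = 0.02 + 0.04 * g             # assumed: mortality risk rises with group index
        died = 1 if rng.random() < risk else 0
        los = 2.0 + 1.5 * g + rng.gauss(0, 1.0)
        rows.append((code, died, los))
    return rows


def fit_nb(xs, ys, alpha=1.0):
    """Categorical naive Bayes with Laplace smoothing; returns a function x -> P(y=1 | x)."""
    prior = Counter(ys)
    counts = {0: Counter(), 1: Counter()}
    for x, y in zip(xs, ys):
        counts[y][x] += 1
    k = len(set(xs))                       # number of observed categories

    def predict(x):
        joint = {}
        for c in (0, 1):
            p_x_given_c = (counts[c][x] + alpha) / (prior[c] + alpha * k)
            joint[c] = prior[c] / len(ys) * p_x_given_c
        return joint[1] / (joint[0] + joint[1])

    return predict


def auc(scores, labels):
    """ROC AUC: probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def fit_ols_onehot(xs, ys):
    """OLS on a full set of one-hot indicators for one categorical predictor.
    The least-squares fit is each category's mean; unseen categories get the overall mean."""
    sums, ns = defaultdict(float), Counter()
    for x, y in zip(xs, ys):
        sums[x] += y
        ns[x] += 1
    overall = sum(ys) / len(ys)
    means = {x: sums[x] / ns[x] for x in ns}
    return lambda x: means.get(x, overall)


def r_squared(preds, ys):
    """Coefficient of determination on held-out data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for p, y in zip(preds, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def evaluate(train, test, encode):
    """Fit both models on one diagnosis encoding and score them on the test split."""
    xs_tr = [encode(c) for c, _, _ in train]
    xs_te = [encode(c) for c, _, _ in test]
    nb = fit_nb(xs_tr, [d for _, d, _ in train])
    ols = fit_ols_onehot(xs_tr, [l for _, _, l in train])
    return {
        "n_features": len(set(xs_tr)),
        "auc": auc([nb(x) for x in xs_te], [d for _, d, _ in test]),
        "r2": r_squared([ols(x) for x in xs_te], [l for _, _, l in test]),
    }


if __name__ == "__main__":
    data = simulate(6000, seed=42)
    train, test = data[:4000], data[4000:]
    print("fine  ", evaluate(train, test, lambda c: c))
    print("coarse", evaluate(train, test, lambda c: TO_GROUP[c]))
```

With a single categorical predictor, least squares on a full set of category indicators reduces to predicting each category's mean LOS, which is why `fit_ols_onehot` needs no matrix algebra. If an encounter carried several diagnosis codes, a real design matrix and solver (and a library Random Forest) would be needed instead.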

Original language: English
Article number: 122
Issue number: 6
State: Published, Dec 2022


  • clinical classification software (CCS)
  • diagnosis-related groups (MS-DRG)
  • length of stay
  • mortality
  • naïve Bayes
  • predictive modeling
  • random forests


