Evaluation of classification methods for the prediction of hospital length of stay using Medicare claims data

Konstantinos Tsiakas, Dimitrios Zikos

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


In this paper, we investigate the performance of a series of classification methods for predicting hospital Length of Stay (LOS), based on two temporally sequential clinical scenarios. We used a 2012 Medicare Provider Analysis and Review (MedPar) dataset, which contains records of Medicare beneficiaries who used inpatient hospital services. Our subset included 300,000 randomly selected cases. During preprocessing, we added new features and linked our data with external datasets using common key identifiers. In the first scenario, our goal was to predict the LOS using a subset of information that is readily available to the clinician upon patient admission, while the second scenario assumes that additional data are available (information on the patient's diagnosis and clinical procedures). For our experiments we used three different classifiers (Naïve Bayes, AdaBoost, and C4.5 decision tree) with two different LOS cut-off points (4-day and 12-day hospital stay). The overall performance of our classifiers ranged from fair to very good. However, the true positive rate, that is, the correct classification of long hospital stays, was low, with the exception of Naïve Bayes, which demonstrated significantly better performance in the second scenario. Our results indicate that Naïve Bayes may be used for the prediction of in-hospital LOS. Our analysis also indicates that MedPar data combined with other data resources have the potential to provide a good basis for robust predictive analytics in hospitals.
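The gap between overall performance and true positive rate described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the LOS values and predictions are made-up toy numbers, the helper names are hypothetical, and treating a stay as "long" when it strictly exceeds the cut-off is an assumption, since the abstract does not state how the boundary is handled.

```python
def binarize_los(los_days, cutoff):
    """Label each stay 1 (long) if it exceeds `cutoff` days, else 0 (short).

    Assumption: "long" means strictly greater than the cut-off.
    """
    return [1 if days > cutoff else 0 for days in los_days]


def accuracy(y_true, y_pred):
    """Fraction of all stays classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def true_positive_rate(y_true, y_pred):
    """Fraction of long stays (label 1) that were predicted as long."""
    positives = sum(y_true)
    if positives == 0:
        return float("nan")  # undefined when there are no long stays
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


# Toy data (illustrative only): ten stays, two of them longer than 4 days.
los_days = [1, 2, 3, 3, 2, 5, 1, 14, 2, 3]
y_true = binarize_los(los_days, cutoff=4)

# A hypothetical classifier that flags only the 14-day stay as long.
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

print(accuracy(y_true, y_pred))            # 0.9
print(true_positive_rate(y_true, y_pred))  # 0.5
```

Because short stays dominate the toy sample, a classifier can score well overall while still missing half of the long stays, which is why the true positive rate is reported separately.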

Original language: English
Title of host publication: Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments
State: Published - 2014
