A comparative study of machine learning models and dynamic ensemble selection approaches to identify chronic diseases

Μιχαλάκογλου, Αθανάσιος

Comparative study of machine learning models and dynamic ensemble selection approaches for the identification of chronic diseases

Master’s thesis

Author

Μιχαλάκογλου, Αθανάσιος

Date

2024-10-24

Advisor

Soguero-Ruiz, Cristina
Chushig-Muzo, David


Keywords

Comparative study ; Chronic diseases ; Diabetes ; Cardiovascular diseases ; Melanoma ; Dynamic ensemble selection ; Machine learning ; Logistic regression ; K-Nearest Neighbor ; Random forest ; SVC ; KNORA-U ; KNORA-E ; META-DES ; KNOP ; DES-P ; DES-KNN

Abstract

Chronic diseases (CDs) are the leading cause of death globally. They significantly increase the risk of acute clinical complications and impose a substantial economic burden on healthcare systems. The most prevalent CDs include respiratory diseases, cancer, cardiovascular diseases (CVDs), and diabetes. Machine learning (ML) models have demonstrated remarkable success in various clinical tasks, such as predicting hospitalizations, assessing patient mortality risk, and identifying disease risk factors. In recent years, ensemble methods have shown excellent results across multiple domains. In particular, dynamic ensemble selection (DES) approaches, which automatically select the ML models that form the ensemble for each prediction, have proven highly effective in predictive applications. However, few studies have explored the use of DES for predictive tasks in clinical settings. The objective of this Master’s thesis is to evaluate and compare the predictive performance of various DES approaches in identifying different chronic diseases. To this end, several public datasets of patients diagnosed with CDs, including CVDs, diabetes, and skin cancer, were analyzed. Multiple ML models, namely Random Forest, Logistic Regression, K-Nearest Neighbor, and Support Vector Machines, were employed as base learners in the DES framework. The results showed that KNOP, one of the DES techniques evaluated, outperformed both traditional classifiers and the other DES methods on most datasets. It excelled particularly in precision and recall, making it the most reliable model for minimizing false-positive and false-negative predictions. Traditional classifiers, such as Random Forest and SVC, also showed competitive performance, especially in terms of specificity and accuracy, proving to be robust options in more general settings.
The findings suggest that DES techniques, particularly KNOP, offer a balanced and highly effective approach for predictive tasks in chronic disease identification.
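To illustrate what dynamic selection means in practice, the following is a minimal sketch of a KNORA-U style vote (one of the DES methods listed in the keywords), written in plain Python. It is not the thesis implementation: the function name `knora_u_predict`, the three toy rule-based classifiers, the six-point validation set, and the fallback to a plain majority vote are all assumptions made for illustration only.

```python
import math
from collections import Counter


def knora_u_predict(x, classifiers, dsel_X, dsel_y, k=3):
    """Label x with a KNORA-U style vote (illustrative sketch only).

    classifiers: callables mapping a feature tuple to a class label.
    dsel_X, dsel_y: held-out samples and labels used to estimate competence.
    """
    # Region of competence: the k held-out samples closest to x (Euclidean).
    order = sorted(range(len(dsel_X)), key=lambda i: math.dist(x, dsel_X[i]))
    neighbors = order[:k]

    votes = Counter()
    for clf in classifiers:
        # A classifier earns one vote per neighbor it labels correctly.
        weight = sum(1 for i in neighbors if clf(dsel_X[i]) == dsel_y[i])
        if weight > 0:
            votes[clf(x)] += weight

    if not votes:
        # Assumed fallback: no classifier is competent here, so use the
        # unweighted majority vote of the whole pool.
        votes.update(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical pool of base learners (simple rules, not trained models).
pool = [
    lambda p: int(p[0] > 0.5),  # thresholds the first feature
    lambda p: int(p[1] > 0.5),  # thresholds the second feature
    lambda p: 1,                # always predicts the positive class
]

# Hypothetical validation set: the label follows the first feature.
dsel_X = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.7),
          (0.9, 0.1), (0.8, 0.2), (0.85, 0.3)]
dsel_y = [0, 0, 0, 1, 1, 1]

query = (0.12, 0.85)
static_vote = Counter(clf(query) for clf in pool).most_common(1)[0][0]
dynamic_vote = knora_u_predict(query, pool, dsel_X, dsel_y, k=3)
print(static_vote, dynamic_vote)
```

In this toy setup, a static majority vote over the pool labels the query as positive, because two of the three rules say so. The dynamic vote discards those two rules, since neither labels any of the query's three nearest validation samples correctly, and follows the only locally competent rule instead. Other DES methods differ mainly in how competence is measured and in how many classifiers are retained.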