A comparative study of machine learning models and dynamic ensemble selection approaches to identify chronic diseases
Comparative study of machine learning models and dynamic ensemble selection approaches for the recognition of chronic diseases
Master's thesis
Author
Μιχαλάκογλου, Αθανάσιος
Date
2024-10-24
Keywords
Comparative study ; Chronic diseases ; Diabetes ; Cardiovascular diseases ; Melanoma ; Dynamic ensemble selection ; Machine learning ; Logistic regression ; K-Nearest Neighbor ; Random forest ; SVC ; KNORA-U ; KNORA-E ; META-DES ; KNOP ; DES-P ; DES-KNN
Abstract
Chronic diseases (CDs) are the leading cause of death globally, significantly increasing the risk of acute clinical complications and imposing a substantial economic burden on healthcare systems. The most prevalent CDs include respiratory diseases, cancer, cardiovascular diseases (CVDs), and diabetes.
Machine learning (ML) models have demonstrated remarkable success in various clinical tasks, such as predicting hospitalizations, assessing patient mortality risk, and identifying disease risk factors. In recent years, ensemble methods have shown excellent results across multiple domains. In particular, dynamic ensemble selection (DES) approaches, which select, for each new sample, the most competent classifier or subset of classifiers from a pool of trained ML models, have proven highly effective in predictive applications. However, few studies have explored the use of DES for predictive tasks in clinical settings. The objective of this Master's thesis is to evaluate and compare the predictive performance of various DES approaches in identifying different chronic diseases. To this end, several public datasets of patients diagnosed with CDs, including CVDs, diabetes, and skin cancer, were analyzed. Multiple ML models, such as Random Forest, Logistic Regression, K-Nearest Neighbor, and Support Vector Machines, were employed as base learners in the DES framework.
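To make the DES idea concrete, below is a minimal sketch of KNORA-U, one of the methods listed in the keywords. It uses a pool of the same four kinds of base learners and is written from scratch with scikit-learn; it is not the thesis's own code. The synthetic dataset, the 50/25/25 split into training, DSEL (dynamic selection) and test sets, the neighborhood size `k=7`, and the fallback to a plain majority vote when no classifier is competent are all illustrative assumptions. The helper `knora_u_predict` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.svm import SVC


def knora_u_predict(pool, X_dsel, y_dsel, X_test, k=7):
    """KNORA-Union (sketch): each base classifier gets one vote per
    neighbor in the region of competence that it classifies correctly."""
    # Region of competence: the k nearest DSEL samples of each test point.
    nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
    _, neigh_idx = nn.kneighbors(X_test)

    # correct[i, j] is True when classifier j labels DSEL sample i correctly.
    dsel_preds = np.column_stack([clf.predict(X_dsel) for clf in pool])
    correct = dsel_preds == y_dsel[:, None]

    # Base-classifier predictions on the test samples, shape (n_test, n_clf).
    test_preds = np.column_stack([clf.predict(X_test) for clf in pool])
    classes = np.unique(y_dsel)

    y_pred = np.empty(len(X_test), dtype=y_dsel.dtype)
    for t in range(len(X_test)):
        # Vote weight of each classifier = number of neighbors it got right.
        weights = correct[neigh_idx[t]].sum(axis=0)
        if weights.sum() == 0:
            # Assumed fallback: no competent classifier, so use all equally.
            weights = np.ones(len(pool))
        scores = [weights[test_preds[t] == c].sum() for c in classes]
        y_pred[t] = classes[int(np.argmax(scores))]
    return y_pred


# Synthetic stand-in for a clinical dataset (illustrative only).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)
X_dsel, X_test, y_dsel, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest)

# Pool of base learners, trained on the training split only.
pool = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(n_neighbors=5),
    SVC(),
]
for clf in pool:
    clf.fit(X_train, y_train)

y_pred = knora_u_predict(pool, X_dsel, y_dsel, X_test, k=7)
print("KNORA-U accuracy:", np.mean(y_pred == y_test))
```

The DSEL split is kept separate from the training split so that competence is estimated on data the base learners have not seen. Library implementations such as DESlib's `deslib.des` module expose methods like `KNORAU`, `KNORAE` and `KNOP` as classes; the abstract does not state which implementation the thesis used.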
The results showed that KNOP (K-Nearest Output Profiles, one of the DES techniques evaluated) consistently outperformed both the traditional classifiers and the other DES methods on most datasets. It performed especially well in precision and recall, which makes it the most reliable model for minimizing false-positive and false-negative predictions. Traditional classifiers such as Random Forest and SVC also performed competitively, especially in terms of specificity and accuracy, proving to be robust choices in more general settings. The findings suggest that DES techniques, particularly KNOP, offer a balanced and highly effective approach to predictive tasks in chronic disease identification.
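The link between these metrics and the two error types can be shown with a small sketch that computes them from binary labels. Precision is TP/(TP+FP), so it drops as false positives grow. Recall is TP/(TP+FN), so it drops as false negatives grow. Specificity is TN/(TN+FP), and accuracy is (TP+TN)/N. The helper `binary_metrics` and the toy labels are hypothetical and are not results from the thesis.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, specificity and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        # Precision: penalized by false positives.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall (sensitivity): penalized by false negatives.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: share of actual negatives correctly identified.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # Accuracy: share of all predictions that are correct.
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example: one false negative (index 3) and one false positive (index 4).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
# precision 0.75, recall 0.75, specificity 5/6 (about 0.833), accuracy 0.8
```

A model can score high on accuracy and specificity while still missing positive cases, which is why the abstract treats precision and recall separately from them.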