Prediction of the retention time of natural product metabolites using transfer learning strategies

Κατσάρα, Βασιλική

Πρόβλεψη του χρόνου κατακράτησης μεταβολιτών φυσικών προϊόντων με χρήση στρατηγικών μεταφοράς μάθησης

Diploma thesis

Author

Κατσάρα, Βασιλική

Date

2024-10-11

Supervisor

Matsoukas, Minos-Timotheos

Diploma thesis (2.524 MB)

Keywords

Metabolomics; Retention time; Molecular fingerprints; Molecular descriptors; Transfer learning; Machine learning

Abstract

Retention time (RT) prediction in chromatography can play an important role in numerous analytical applications, including drug discovery and environmental monitoring. This study aims to improve RT prediction accuracy with deep learning, focusing on transfer learning: models trained on synthetic compounds measured by high-pressure liquid chromatography-mass spectrometry (HPLC-MS) are adapted to predict RTs for natural products analyzed with different chromatographic methods. We used the extensive METLIN Small Molecule Retention Time (SMRT) dataset, comprising over 80,000 synthetic compounds, to train a deep neural network (DNN). This model was then fine-tuned on smaller datasets of natural products from the RepoRT database using a two-stage transfer learning approach. In the first stage, the DNN’s upper layers were frozen to retain knowledge about high-level features while training on the new data. In the second stage, all layers were unfrozen and trained further with a reduced learning rate, so that both general and dataset-specific patterns were captured. Hyperparameters were optimized with Optuna using 5-fold nested cross-validation to obtain robust performance estimates. The evaluation metrics were the mean absolute error (MAE), the median absolute error (MedAE), and the mean absolute percentage error (MAPE). Transfer learning was then compared with DNNs trained from scratch on the RepoRT data; it outperformed them according to MAE and MedAE, although not according to MAPE. After the outliers were removed, transfer learning performed better on all three metrics. Future work will need to refine this strategy to improve its performance, either by testing it further on the same datasets or by incorporating additional data.
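
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the retention-time values are hypothetical and are not taken from the thesis. It assumes every measured RT is nonzero, since MAPE divides by the true value.

```python
from statistics import mean, median


def absolute_errors(y_true, y_pred):
    """Per-compound absolute error |true RT - predicted RT|."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the RTs."""
    return mean(absolute_errors(y_true, y_pred))


def medae(y_true, y_pred):
    """Median absolute error; less sensitive to a few large errors than MAE."""
    return median(absolute_errors(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes nonzero true RTs)."""
    return 100.0 * mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


# Hypothetical retention times in minutes (illustrative values only).
rt_true = [0.5, 4.0, 10.0, 20.0]
rt_pred = [1.0, 4.5, 9.0, 18.0]

print(f"MAE:   {mae(rt_true, rt_pred):.3f} min")
print(f"MedAE: {medae(rt_true, rt_pred):.3f} min")
print(f"MAPE:  {mape(rt_true, rt_pred):.3f} %")
```

In this toy example the early-eluting compound (true RT 0.5 min) has one of the smallest absolute errors but a 100% relative error, so it dominates MAPE while barely affecting MAE or MedAE. This is one generic way in which absolute and percentage metrics can rank models differently; it is not a claim about the cause of the discrepancy reported in the thesis.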

Number of pages

School

School of Engineering

Academic Department

Department of Biomedical Engineering

Language

English

Description

The research for this thesis was conducted at CEU San Pablo University in Madrid, Spain.

URI

https://polynoe.lib.uniwa.gr/xmlui/handle/11400/7837
http://dx.doi.org/10.26265/polynoe-7669

Collection

Diploma theses

Except where otherwise noted, this item is distributed under the following license:
Attribution-NonCommercial-ShareAlike 4.0 International