Assessment and comparison of existing methods and datasets for sentiment analysis of Greek texts

Φραγκής, Νικόλαος

dc.contributor.advisor	Tselenti, Panagiota
dc.contributor.author	Φραγκής, Νικόλαος
dc.date.accessioned	2022-07-07T12:17:21Z
dc.date.available	2022-07-07T12:17:21Z
dc.date.issued	2022-06-30
dc.identifier.uri	https://polynoe.lib.uniwa.gr/xmlui/handle/11400/2411
dc.identifier.uri	http://dx.doi.org/10.26265/polynoe-2251
dc.description.abstract	Sentiment analysis is a well-known field of Natural Language Processing concerned with text classification. A vast number of papers, especially for the English language, present state-of-the-art results on many different datasets using a variety of classification models. The aim of this work is to compare machine learning models on different datasets in both Greek and English. To this end, we used the well-known IMDb dataset from Stanford University, which is frequently used to evaluate new text classification models, and an equivalent new dataset that we created in Greek from the Athinorama website. Our experiments used the following models: Logistic Regression, Support Vector Machine, Naïve Bayes, Decision Trees, XGBoost, Convolutional Neural Network, Long Short-Term Memory, Gated Recurrent Units, and Bidirectional Encoder Representations from Transformers (BERT). The first five models were combined with TF-IDF vectorization, while the rest were combined with word embeddings. The results show that the best classifier for sentiment analysis in both English and Greek is the pretrained BERT model. The difference in language does not seem to have a significant impact on the results, whereas the quality, size, and level of pre-processing of the data appear to play a significant role in classification. We undertook this work because of the lack of research for the Greek language; our contribution is the Athinorama Light dataset, which could play a significant role in future work on Greek text classification.	el
dc.format.extent	67	el
dc.language.iso	en	el
dc.publisher	University of West Attica	el
dc.publisher	Université de Limoges	el
dc.rights	Attribution - NonCommercial - ShareAlike 4.0 International	*
dc.subject	Machine learning	el
dc.subject	Sentiment analysis	el
dc.title	Assessment and comparison of existing methods and datasets for sentiment analysis of Greek texts	el
dc.title.alternative	Αποτίμηση και σύγκριση υφιστάμενων εργαλείων, μεθόδων και συνόλων δεδομένων για τη συναισθηματική ανάλυση ελληνικών κειμένων	el
dc.type	Master's thesis	el
dc.contributor.committee	Μαστοροκώστας, Πάρις
dc.contributor.committee	Kesidis, Anastasios
dc.contributor.faculty	School of Engineering	el
dc.contributor.department	Department of Informatics and Computer Engineering	el
dc.contributor.department	Department of Surveying and Geoinformatics Engineering	el
dc.contributor.master	Artificial Intelligence and Visual Computing	el

Files in this item

Name: Διπλωματική Νικόλαος Φραγκής ...
Size: 2.455Mb
Type: PDF

This item appears in the following collections

Master's theses - Artificial Intelligence and Visual Computing
Master's theses, MSc programme Artificial Intelligence and Visual Computing