Data mining of the Modern Greek language and its ontological structuring on the Protégé platform

Κορωναίος, Ιωάννης

dc.contributor.advisor	Papakitsos, Evangelos
dc.contributor.author	Κορωναίος, Ιωάννης
dc.date.accessioned	2024-11-01T08:12:42Z
dc.date.available	2024-11-01T08:12:42Z
dc.date.issued	2024-10-25
dc.identifier.uri	https://polynoe.lib.uniwa.gr/xmlui/handle/11400/7945
dc.identifier.uri	http://dx.doi.org/10.26265/polynoe-7777
dc.description.abstract	This Diploma Thesis examines the feasibility of extracting linguistic data of the Modern Greek language from a structured online source and loading them into an ontology on the Protégé platform. The source used was the "Dictionary of Standard Modern Greek" by Manolis Triantafyllidis, which is available online in electronic form. Our application aims to collect all the entries (lemmas) contained in this dictionary into a CSV (comma-separated values) file and then to load them from this file into an ontology compatible with the Protégé platform. Within the ontology, the entries must be organized into classes according to the Part of Speech to which they belong. In addition, the individual characteristics of each entry must be distinguishable. The application is implemented as three scripts, to make the volume of data easier to manage. They are as follows: 1. Extraction of entries into a .csv file. The entries must be extracted in such a way that the Part of Speech to which they belong is identifiable, while their various individual characteristics, such as synonyms or antonyms, etymology, pronunciation, etc., are recorded separately. 2. Creation of the ontology. Through code, we must create an ontology in the OWL format, which Protégé can read. Our ontology will include a main class, "Μέρη_του_Λόγου/Parts_of_Speech", followed by subclasses at various levels, representing the Parts of Speech as given in the Grammar of Modern Greek, as well as the relationships between them. In addition, classes will be created to represent the individual characteristics of each entry, as listed above. These classes will be part of the Data Properties and Annotation Properties of the ontology. 3. Loading from the .csv file into the ontology. The third script of our application will read the organized entries and their characteristics from the CSV and load them into the ontology of the previous step, ensuring that they are correctly assigned to the existing classes. The separation of the linguistic terms and the classification of their individual properties will be based on the data recorded in the CSV. It should be noted that this work forms part of the support for the doctoral dissertation of Mrs. Nikoletta Samaridi. The scripts developed are 1.LexikoOrganosi.py, 2.Dimioyrgia_Ontologias.py and 3.Prosthiki_Individuals, which are listed in the appendix of the thesis.	el
dc.format.extent	53	el
dc.language.iso	el	el
dc.publisher	University of West Attica	el
dc.rights	Attribution-NonCommercial-ShareAlike 4.0 International	*
dc.rights.uri	https://creativecommons.org/licenses/by-nc-sa/4.0/deed.el	*
dc.subject	Data mining	el
dc.subject	Python	el
dc.subject	Ontology	el
dc.subject	OWL	el
dc.subject	Protégé	el
dc.subject	Modern Greek language	el
dc.subject	Language technologies	el
dc.subject	Semantic Web	el
dc.title	Data mining of the Modern Greek language and its ontological structuring on the Protégé platform	el
dc.title.alternative	Data retrieval for the Modern Greek language and its ontological structuring in the Protégé platform	el
dc.type	Diploma thesis	el
dc.contributor.committee	Drosos, Christos
dc.contributor.committee	Laskaris, Nikolaos
dc.contributor.faculty	School of Engineering	el
dc.contributor.department	Department of Industrial Design and Production Engineering	el
dc.description.abstracttranslated	The present Diploma Thesis examines the possibility of extracting linguistic data of the Modern Greek Language from a structured online source and loading them into an ontology on the Protégé platform. The source used was the "Dictionary of Standard Modern Greek" by Manolis Triantafyllidis, which is available in electronic form on the internet. Our application aims to collect all the entries contained in the above dictionary into a CSV (comma-separated values) file, and then load them from this file into an ontology that is compatible with the Protégé platform. Within the ontology, the entries must be organized into classes, depending on the Part of Speech to which they belong. Additionally, each entry's individual characteristics must be clearly distinguishable. To keep the steps clear during both implementation and presentation, and to make the large volume of data easier to manage, the application consists of three scripts: 1. Extracting entries into a .csv file. The extraction of the entries must be done in such a way that the Part of Speech to which they belong is clear, while at the same time various individual characteristics, such as synonyms or antonyms, etymology, pronunciation, etc., are recorded separately. 2. Creation of the Ontology. Through code, we need to create an ontology in the OWL format, which can be read by Protégé. Our ontology will include a main class, PartsOfSpeech, followed by subclasses at various levels, representing the Parts of Speech as described in the Grammar of Modern Greek, as well as the relationships between them. In addition, classes will be created to represent the individual characteristics of each entry, as mentioned above. These classes will be part of the Data Properties and Annotation Properties of the ontology. 3. Loading from the .csv file into the Ontology. The third script of our application will read the organized entries and their characteristics from the CSV and load them into the ontology created in the previous step, ensuring their proper categorization into the existing classes. The division of linguistic terms and the classification of their individual properties will be done based on the information stored in the CSV. It should be noted that this work is part of the support for the doctoral dissertation of Mrs. Nikoletta Samaridi. The scripts that were developed are: 1. LexikoOrganosi.py, 2. Dimioyrgia_Ontologias.py, and 3. Prosthiki_Individuals, which are provided in the appendix of this thesis.	el
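
As an illustration of steps 2 and 3 of the abstract above, the following is a minimal Python sketch, not the thesis's actual scripts. It uses only the standard library to read lemmas from a CSV and emit OWL as RDF/XML; the thesis may well rely on a dedicated OWL library instead. The namespace `http://example.org/greek-lexicon#`, the column names `lemma`, `part_of_speech` and `etymology`, the sample rows, and the helper names `q` and `build_ontology` are all assumptions made for the example. Step 1 (extraction from the online dictionary) is left out, since this record does not describe the structure of the source pages.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
LEX = "http://example.org/greek-lexicon#"  # hypothetical ontology namespace

# Hypothetical CSV layout: column names and rows are assumptions,
# not the schema used in the thesis.
SAMPLE_CSV = """lemma,part_of_speech,etymology
γάτα,Noun,from Italian gatta
τρέχω,Verb,from Ancient Greek τρέχω
"""


def q(namespace, local_name):
    """Return an ElementTree qualified name of the form {namespace}local_name."""
    return f"{{{namespace}}}{local_name}"


def build_ontology(csv_text):
    """Build an RDF/XML tree with classes, one data property and individuals."""
    for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL), ("lex", LEX)):
        ET.register_namespace(prefix, uri)

    root = ET.Element(q(RDF, "RDF"))
    ET.SubElement(root, q(OWL, "Ontology"), {q(RDF, "about"): LEX.rstrip("#")})

    # Step 2: the root class and one data property for a lemma characteristic.
    ET.SubElement(root, q(OWL, "Class"), {q(RDF, "about"): LEX + "Parts_of_Speech"})
    ET.SubElement(root, q(OWL, "DatatypeProperty"), {q(RDF, "about"): LEX + "etymology"})

    # Step 3: read the CSV and attach each lemma to its part-of-speech class.
    declared = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        pos = row["part_of_speech"].strip().replace(" ", "_")
        if pos not in declared:
            # Declare each part of speech once, as a subclass of the root class.
            declared.add(pos)
            cls = ET.SubElement(root, q(OWL, "Class"), {q(RDF, "about"): LEX + pos})
            ET.SubElement(cls, q(RDFS, "subClassOf"),
                          {q(RDF, "resource"): LEX + "Parts_of_Speech"})
        lemma = row["lemma"].strip()
        ind = ET.SubElement(root, q(OWL, "NamedIndividual"),
                            {q(RDF, "about"): LEX + lemma.replace(" ", "_")})
        ET.SubElement(ind, q(RDF, "type"), {q(RDF, "resource"): LEX + pos})
        ET.SubElement(ind, q(RDFS, "label")).text = lemma
        ET.SubElement(ind, q(LEX, "etymology")).text = row["etymology"].strip()
    return root
```

The resulting tree could be saved with `ET.ElementTree(build_ontology(SAMPLE_CSV)).write("lexicon.owl", encoding="utf-8", xml_declaration=True)`, where `lexicon.owl` is a placeholder file name. Keeping the lemma in an `rdfs:label` as well as in the IRI is a design choice of this sketch, so that the original spelling survives even if the IRI has to be normalized.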

Files in this item

Name: Koronaios_71446794.pdf
Size: 2.039Mb
Format: PDF
Description: Diploma Thesis

Name: Koronaios_71446794(2).pdf
Size: 79.16Kb
Format: PDF
Description: The Author's Declaration on its own, ...

This item appears in the following Collection(s)

Diploma theses
Diploma theses of the Department of Industrial Design and Production Engineering

Except where otherwise noted, this item's license is described as
Attribution-NonCommercial-ShareAlike 4.0 International