Predicting the top-level ontological concepts of domain entities using word embeddings, informal definitions, and deep learning

AGL Junior, JL Carbonera, D Schimidt… - Expert Systems with …, 2022 - Elsevier
Abstract
Ontology development is a challenging task that encompasses many time-consuming activities. One of these activities is the classification of domain entities (concepts and instances) according to top-level concepts. This activity is usually performed manually by an ontology engineer. However, as the set of entities grows, associating each entity with the proper top-level ontological concept becomes challenging and requires a high level of expertise in both the target domain and ontology engineering. This paper proposes a deep learning approach that automatically classifies domain entities into top-level concepts using their informal definitions and the word embeddings of the terms that represent them. From these inputs, we feed a deep neural network consisting of two modules: a feed-forward neural network and a bi-directional recurrent neural network with long short-term memory units. Our architecture combines the outputs of these modules in a dense layer and provides the probability of each candidate class. To validate our proposal, we developed a dataset based on the OntoWordNet ontology, which classifies WordNet synsets into concepts specified by the DOLCE-lite-plus top-level ontology. Our experiments show that our proposal outperforms the baseline approaches by 6% in F-score. In addition, our proposal is less affected by polysemy in the terms that represent the domain entities than the compared approaches. Consequently, our proposal can consider more instances during its training than the baseline methods.
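The two-module architecture described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: all layer sizes, the embedding dimension, and the number of candidate classes are assumed values, and the class name `TopLevelClassifier` is invented for the example.

```python
import torch
import torch.nn as nn

class TopLevelClassifier(nn.Module):
    """Sketch of the two-branch design from the abstract:
    a feed-forward branch over the term's word embedding and a
    bi-directional LSTM branch over the definition's token
    embeddings, concatenated into a dense classification layer.
    All dimensions below are illustrative assumptions."""

    def __init__(self, emb_dim=300, hidden=128, n_classes=10):
        super().__init__()
        # Feed-forward module over the term embedding
        self.ffn = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
        # Bi-directional recurrent module with LSTM units over the definition
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Dense layer combining both module outputs into class scores
        self.out = nn.Linear(hidden + 2 * hidden, n_classes)

    def forward(self, term_emb, defn_embs):
        # term_emb: (batch, emb_dim); defn_embs: (batch, seq_len, emb_dim)
        h_term = self.ffn(term_emb)
        _, (h_n, _) = self.bilstm(defn_embs)
        # Concatenate the final forward and backward hidden states
        h_defn = torch.cat([h_n[0], h_n[1]], dim=1)
        logits = self.out(torch.cat([h_term, h_defn], dim=1))
        return logits  # class probabilities via softmax in the loss

model = TopLevelClassifier()
term = torch.randn(4, 300)        # batch of 4 term embeddings
defn = torch.randn(4, 12, 300)    # 12-token definitions
print(model(term, defn).shape)    # one score per candidate class
```

In this sketch the softmax is left to the loss function (e.g. `nn.CrossEntropyLoss`), which is the usual idiom in PyTorch; the paper's exact training setup may differ.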