Abstract:
Information Extraction is the process of automatically obtaining knowledge from plain text. Because of the ambiguity of written natural
language, Information Extraction is a difficult task. Ontology-based Information Extraction (OBIE) reduces this complexity by including
contextual information in the form of a domain ontology. The ontology guides the extraction process by supplying concepts
and relationships about the domain. However, OBIE systems have not been widely adopted because of difficulties in deployment
and maintenance. The Ontology-based Components for Information Extraction (OBCIE) architecture has been proposed as a
way to encourage the adoption of OBIE by promoting reusability through modularity. In this paper, we propose two orthogonal extensions
to OBCIE that allow the construction of hybrid OBIE systems with higher extraction accuracy and new functionality. The first
extension utilizes OBCIE modularity to integrate different types of implementation into one extraction system, producing a more accurate
extraction. For each concept or relationship in the ontology, we can select the best implementation for extraction, or we can
combine the implementations under an ensemble learning scheme. The second extension is a novel ontology-based error detection
mechanism. Following a heuristic approach, we can identify sentences that are logically inconsistent with the domain ontology. Because
the implementation strategy for the extraction of a concept is independent of the functionality of the extraction, we can design a hybrid
OBIE system with concepts utilizing different implementation strategies for extracting correct or incorrect sentences. Our evaluation
shows that, in the implementation extension, our proposed method is more accurate in terms of correctness and completeness of the
extraction. Moreover, our error detection method can identify incorrect statements with high accuracy.
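
The per-concept choice described in the first extension can be illustrated with a minimal sketch. It is not the OBCIE implementation: the names Extractor, f1_score, select_best and ensemble_vote are hypothetical, each extractor is assumed to be a function that returns True when a sentence expresses the concept, selection is assumed to use F1 on labeled sentences, and a simple majority vote stands in for the ensemble learning step, whose actual form is not specified here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: an extractor answers whether a sentence
# expresses one ontology concept or relationship.
Extractor = Callable[[str], bool]
Labeled = List[Tuple[str, bool]]  # (sentence, gold label) pairs


def f1_score(extractor: Extractor, labeled: Labeled) -> float:
    """F1 of one extractor on labeled sentences (assumed selection metric)."""
    tp = fp = fn = 0
    for sentence, gold in labeled:
        pred = extractor(sentence)
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold and not pred:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(extractors: Dict[str, Extractor], labeled: Labeled) -> str:
    """Return the name of the implementation with the highest F1 for a concept."""
    return max(extractors, key=lambda name: f1_score(extractors[name], labeled))


def ensemble_vote(extractors: List[Extractor], sentence: str) -> bool:
    """Strict majority vote, an assumed stand-in for the ensemble step."""
    votes = sum(1 for extract in extractors if extract(sentence))
    return votes * 2 > len(extractors)
```

In such a design, select_best would be run once per concept or relationship in the ontology, so different concepts may end up with different implementation strategies within the same system.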
Citation:
Gutierrez, F., Dou, D., Fickas, S., Wimalasuriya, D., & Zong, H. (2016). A hybrid ontology-based information extraction system. Journal of Information Science, 42(6), 798–820. https://doi.org/10.1177/0165551515610989