Fine tuning named entity extraction models for the fantasy domain
Date
2023-12-09
Authors
A. Sivaganeshan, N. De Silva
Publisher
IEEE
Abstract
Named Entity Recognition (NER) is a sequence
classification Natural Language Processing task where entities
are identified in the text and classified into predefined categories.
It acts as a foundation for most information extraction systems.
Dungeons and Dragons (D&D) is an open-ended tabletop fantasy
game with its own diverse lore. D&D entities are domain-specific
and are thus unrecognizable by even state-of-the-art off-the-shelf
NER systems, as these systems are trained on
general data for predefined categories such as person (PERS),
location (LOC), organization (ORG), and miscellaneous (MISC).
For meaningful extraction of information from fantasy text, the
entities need to be classified into domain-specific entity categories,
and the models need to be fine-tuned on a domain-relevant corpus.
This work uses available lore of monsters in the D&D domain to
fine-tune Trankit, which is a prolific NER framework that uses
a pre-trained model for NER. Upon this training, the system
acquires the ability to extract monster names from relevant
domain documents under a novel NER tag. This work compares
the accuracy of monster name identification against the
zero-shot Trankit model and two FLAIR models. The fine-tuned
Trankit model achieves an 87.86% F1 score, surpassing all the
other considered models.
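The abstract reports an F1 score but does not describe the evaluation protocol. As a rough illustration only, the sketch below computes an entity-level, exact-match F1 over BIO-tagged sentences, which is one common way such scores are obtained. The tag name `MONSTER`, the function names, and the toy data are hypothetical placeholders and are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact span match."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)   # spans found with correct boundaries and type
        fp += len(p - g)   # predicted spans with no gold match
        fn += len(g - p)   # gold spans the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: "MONSTER" is a hypothetical tag name used for illustration.
gold = [["O", "B-MONSTER", "I-MONSTER", "O"], ["B-MONSTER", "O"]]
pred = [["O", "B-MONSTER", "I-MONSTER", "O"], ["O", "B-MONSTER"]]
print(entity_f1(gold, pred))
```

In the toy data, one of the two gold monster names is matched exactly and one predicted span is spurious, so precision and recall are both one half. Token-level or partial-match scoring would give different numbers, so the protocol matters when comparing models.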
Citation
A. Sivaganeshan and N. De Silva, "Fine Tuning Named Entity Extraction Models for the Fantasy Domain," 2023 Moratuwa Engineering Research Conference (MERCon), Moratuwa, Sri Lanka, 2023, pp. 346-351, doi: 10.1109/MERCon60487.2023.10355501.