Data sets and benchmark for interactive collaborative storytelling text generation [abstract]

dc.contributor.authorDe Silva, N
dc.date.accessioned2025-07-23T06:42:19Z
dc.date.issued2022
dc.descriptionThe following papers were published based on the results of this research project. [1] A. Peiris and N. de Silva, SHADE: Semantic Hypernym Annotator for Domain-Specific Entities-Dungeons and Dragons Domain Use Case, in 2023 IEEE 17th International Conference on Industrial and Information Systems (ICIIS), IEEE, 2023, pp. 1–6. doi: 10.1109/ICIIS58898.2023.10253606 https://ieeexplore.ieee.org/document/10253606
dc.description.abstractTabletop Roleplaying Games (TTRPGs) are a billion-dollar industry dominated by Dungeons and Dragons (D&D) since its inception in 1974. Consequently, many attempts have been made, from both academic and industry perspectives, to automate certain aspects of the game. These range from dynamic map generation to research on discourse analysis. But one aspect of TTRPGs has so far been hard to automate: the task of adventure generation. It has long been considered a fruit of creativity and human touch that a computer cannot replicate. However, with the advent of deep learning techniques, these long-held beliefs can be called into question. Natural language text generation has become not only possible, but almost indistinguishable from human output. The objective of the project supported by this short-term grant is to create a data set in the D&D domain on which deep learning models can train. As such, this short-term grant provided the stipend of two Technical Assistants for a duration of 6 months to collect, clean, process, annotate, and curate the required data. The initial raw data was collected from publicly available sources using web crawling. Next, a crowdsourcing interface was built for the purpose of data annotation. The two Technical Assistants, who had the necessary domain knowledge, then used this interface to annotate thousands of data points. The data was initially collected in an SQL database and is currently being used as input for the research of three part-time MSc students. As the data is processed and derivations are created in these projects, the derivative data and models are uploaded to international public repositories. One interesting observation from the data collection process is the near-even split in the usage of the three forms of annotation provided: link list, noun phrase list, and manual text input.
This observation is interesting because the choice between these methods was not presented independently. They were offered in the order listed, with later options existing only to capture cases not covered by the earlier ones. Therefore, this observation leads to two conclusions: 1) the annotators employed did a comprehensive job, because it would have been easier to default to the first drop-down option and exert less effort, which does not seem to have been the case; 2) the inability of the first list (and even the second) to adequately cover the annotations justifies our use of human labour as opposed to automatic annotation, as it shows that the raw data lacked the information needed to model an automatic annotation system.
dc.description.sponsorshipSenate Research Committee
dc.identifier.accnoSRC207
dc.identifier.srgnoSRC/ST/2022/01
dc.identifier.urihttps://dl.lib.uom.lk/handle/123/23918
dc.language.isoen
dc.subjectSENATE RESEARCH COMMITTEE – Research Report
dc.subjectDEEP LEARNING
dc.subjectDATA SETS
dc.subjectSTORYTELLING
dc.subjectTEXT GENERATION
dc.titleData sets and benchmark for interactive collaborative storytelling text generation [abstract]
dc.typeSRC-Report

Files

Original bundle

Name:
SRC207 - Nisansa de Silva SRCST202201.pdf
Size:
971.82 KB
Format:
Adobe Portable Document Format
Description:
SRC Report

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission