Data sets and benchmark for interactive collaborative storytelling text generation [abstract]
| dc.contributor.author | De Silva, N | |
| dc.date.accessioned | 2025-07-23T06:42:19Z | |
| dc.date.issued | 2022 | |
| dc.description | The following paper was published based on the results of this research project. [1] A. Peiris and N. de Silva, "SHADE: Semantic Hypernym Annotator for Domain-Specific Entities - Dungeons and Dragons Domain Use Case," in 2023 IEEE 17th International Conference on Industrial and Information Systems (ICIIS), IEEE, 2023, pp. 1-6. doi: 10.1109/ICIIS58898.2023.10253606 https://ieeexplore.ieee.org/document/10253606 | |
| dc.description.abstract | Tabletop Roleplaying Games (TTRPGs) are a billion-dollar industry dominated by Dungeons and Dragons (D&D) since its inception in 1974. Consequently, many attempts have been made, from both academic and industry perspectives, to automate certain aspects of the game, ranging from dynamic map generation to research on discourse analysis. However, one aspect of TTRPGs has so far proven hard to automate: the task of adventure generation. It has long been considered a fruit of creativity and the human touch that a computer cannot replicate. With the advent of deep learning techniques, these long-held beliefs can be called into question; natural language text generation has become not only possible but almost indistinguishable from human output. The objective of the project supported by this short-term grant was to create a data set in the D&D domain on which deep learning models can train. To that end, the grant provided stipends for two Technical Assistants for a period of six months to collect, clean, process, annotate, and curate the required data. The initial raw data was collected from publicly available sources using web crawling. Next, a crowdsourcing interface was built for data annotation. The two Technical Assistants, who possess the necessary domain knowledge, then used this interface to annotate thousands of data points. The data was initially collected into an SQL database and is currently being used as input for the research of three part-time MSc students. As the data is processed and derivations are created in these projects, the derivative data and models are uploaded to international public repositories. One interesting observation from the data collection process is the near-even split in usage of the three forms of annotation provided: link list, noun phrase list, and manual text input. 
This observation is interesting because the choice between these methods was not presented independently. They were offered in the order listed, with each later option existing only to capture cases not covered by the earlier ones. This observation therefore supports two conclusions: 1) the annotators employed did a comprehensive job, since it would have been easier to default to the first drop-down option and exert less effort, which does not appear to have been the case; and 2) the inability of the first list (and even the second) to adequately cover the annotations justifies our use of human labour over automatic annotation, as it shows the raw data lacked the information needed to model an automatic annotation system | |
| dc.description.sponsorship | Senate Research Committee | |
| dc.identifier.accno | SRC207 | |
| dc.identifier.srgno | SRC/ST/2022/01 | |
| dc.identifier.uri | https://dl.lib.uom.lk/handle/123/23918 | |
| dc.language.iso | en | |
| dc.subject | SENATE RESEARCH COMMITTEE – Research Report | |
| dc.subject | DEEP LEARNING | |
| dc.subject | DATA SETS | |
| dc.subject | STORYTELLING | |
| dc.subject | TEXT GENERATION | |
| dc.title | Data sets and benchmark for interactive collaborative storytelling text generation [abstract] | |
| dc.type | SRC-Report |
Files
Original bundle
- Name: SRC207 - Nisansa de Silva SRCST202201.pdf
- Size: 971.82 KB
- Format: Adobe Portable Document Format
- Description: SRC Report
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission
- Description: