Multilingual Persuasion Detection: Video Games as an Invaluable Data
Source for NLP
- URL: http://arxiv.org/abs/2207.04453v1
- Date: Sun, 10 Jul 2022 12:38:02 GMT
- Title: Multilingual Persuasion Detection: Video Games as an Invaluable Data
Source for NLP
- Authors: Teemu Pöyhönen, Mika Hämäläinen, Khalid Alnajjar
- Abstract summary: We show the viability of this data in building a persuasion detection system using a natural language processing model called BERT.
We believe that video games have a lot of unused potential as a datasource for a variety of NLP tasks.
- Score: 0.6123324869194194
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Role-playing games (RPGs) have a considerable amount of text in video game
dialogues. Quite often this text is semi-annotated by the game developers. In
this paper, we extract a multilingual dataset of persuasive dialogue from
several RPGs. We show the viability of this data in building a persuasion
detection system using a natural language processing (NLP) model called BERT.
We believe that video games have a lot of unused potential as a datasource for
a variety of NLP tasks. The code and data described in this paper are available
on Zenodo.
Related papers
- GENEVA: GENErating and Visualizing branching narratives using LLMs [15.43734266732214]
GENEVA, a prototype tool, generates a rich narrative graph with branching and reconverging storylines.
GENEVA has the potential to assist in game development, simulations, and other applications with game-like properties.
arXiv Detail & Related papers (2023-11-15T18:55:45Z)
- Deepfake audio as a data augmentation technique for training automatic speech to text transcription models [55.2480439325792]
We propose a framework that approaches data augmentation based on deepfake audio.
A dataset produced by Indians (in English) was selected, ensuring the presence of a single accent.
arXiv Detail & Related papers (2023-09-22T11:33:03Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information [75.201485544517]
We present FIREBALL, a large dataset containing nearly 25,000 unique sessions from real D&D gameplay on Discord with true game state info.
We demonstrate that FIREBALL can improve natural language generation (NLG) by using Avrae state information.
arXiv Detail & Related papers (2023-05-02T15:36:10Z)
- Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining [65.30528567491984]
This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language.
The use of text-only data allows the development of TTS systems for low-resource languages.
Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.
arXiv Detail & Related papers (2023-01-30T00:53:50Z)
- Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog [1.9014535120129343]
We present a method for extracting a multilingual sentiment annotated dialog data set from Fallout New Vegas.
The game has been translated into English, Spanish, German, French and Italian.
We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set.
arXiv Detail & Related papers (2022-12-05T11:09:05Z)
- A Snapshot into the Possibility of Video Game Machine Translation [0.0]
This article introduces some of the challenges of video game translation, some of the existing literature, as well as the systems and data sets used in this experiment.
One such finding highlights the model's ability to learn typical rules and patterns of video game translations from English into French.
arXiv Detail & Related papers (2022-09-19T08:16:59Z)
- Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation [133.7313847857935]
Our study highlights how NLP methods can be adapted to thousands more languages that are under-served by current technology.
For 19 under-represented languages across 3 tasks, our methods lead to consistent improvements of up to 5 and 15 points with and without extra monolingual text respectively.
arXiv Detail & Related papers (2022-03-17T16:48:22Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- Data and Representation for Turkish Natural Language Inference [6.135815931215188]
We offer a positive response for natural language inference (NLI) in Turkish.
We translated two large English NLI datasets into Turkish and had a team of experts validate their translation quality and fidelity to the original labels.
We find that in-language embeddings are essential and that morphological parsing can be avoided where the training set is large.
arXiv Detail & Related papers (2020-04-30T17:12:52Z)
- The Gutenberg Dialogue Dataset [1.90365714903665]
Current publicly available open-domain dialogue datasets offer a trade-off between quality and size.
We build a high-quality dataset of 14.8M utterances in English, and smaller datasets in German, Dutch, Spanish, Portuguese, Italian, and Hungarian.
arXiv Detail & Related papers (2020-04-27T12:52:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.