Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog
- URL: http://arxiv.org/abs/2212.02168v1
- Date: Mon, 5 Dec 2022 11:09:05 GMT
- Title: Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog
- Authors: Mika Hämäläinen, Khalid Alnajjar and Thierry Poibeau
- Abstract summary: We present a method for extracting a multilingual sentiment annotated dialog data set from Fallout New Vegas.
The game has been translated into English, Spanish, German, French and Italian.
We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method for extracting a multilingual sentiment-annotated
dialog data set from Fallout New Vegas. The game developers have preannotated
every line of dialog in the game with one of 8 different sentiments: anger,
disgust, fear, happy, neutral, pained, sad and surprised. The game has been
translated into English, Spanish, German, French and Italian. We conduct
experiments on multilingual, multilabel sentiment analysis on the extracted
data set using multilingual BERT, XLM-RoBERTa and language-specific BERT
models. In our experiments, multilingual BERT outperformed XLM-RoBERTa for most
of the languages; language-specific models were in turn slightly better than
multilingual BERT for most of the languages. The best overall accuracy was 54%,
achieved by multilingual BERT on the Spanish data. The extracted data set
presents a challenging task for sentiment analysis. We have released the data,
including the training and testing splits, openly on Zenodo. The data set has
been shuffled for copyright reasons.
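In the multilabel setup described in the abstract, a dialog line is evaluated against the full set of eight sentiment tags at once. A minimal sketch of the multi-hot label encoding and exact-match accuracy such a classifier would be scored with (the label names come from the abstract; the helper functions are illustrative and not the authors' code):

```python
# The 8 sentiment labels named in the abstract, mapped to vector positions.
SENTIMENTS = ["anger", "disgust", "fear", "happy",
              "neutral", "pained", "sad", "surprised"]
INDEX = {s: i for i, s in enumerate(SENTIMENTS)}

def to_multi_hot(labels):
    """Map a set of sentiment tags to a 0/1 vector of length 8."""
    vec = [0] * len(SENTIMENTS)
    for label in labels:
        vec[INDEX[label]] = 1
    return vec

def subset_accuracy(gold, pred):
    """Exact-match accuracy: a prediction counts only if every bit agrees."""
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)

gold = [to_multi_hot({"anger"}), to_multi_hot({"happy", "surprised"})]
pred = [to_multi_hot({"anger"}), to_multi_hot({"happy"})]
print(subset_accuracy(gold, pred))  # 0.5
```

Exact-match scoring is strict, which is consistent with the modest 54% best accuracy the paper reports.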
Related papers
- MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing [48.216386761482525]
We present MultiSpider, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese).
Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages.
We also propose a simple augmentation framework, SAVe (Augmentation-with-Verification), which boosts overall performance by about 1.8% and closes 29.5% of the performance gap across languages.
arXiv Detail & Related papers (2022-12-27T13:58:30Z) - Multilingual Persuasion Detection: Video Games as an Invaluable Data Source for NLP [0.6123324869194194]
We show the viability of this data in building a persuasion detection system using a natural language processing model called BERT.
We believe that video games have a lot of unused potential as a data source for a variety of NLP tasks.
arXiv Detail & Related papers (2022-07-10T12:38:02Z) - What makes multilingual BERT multilingual? [60.9051207862378]
In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability.
We compare the cross-lingual ability of non-contextualized and contextualized representation models trained on the same data.
We found that data size and context window size are crucial factors for transferability.
arXiv Detail & Related papers (2020-10-20T05:41:56Z) - Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
arXiv Detail & Related papers (2020-10-20T05:41:35Z) - It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
arXiv Detail & Related papers (2020-10-16T09:49:32Z) - CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT.
Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.
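The code-switching augmentation described above can be sketched as a simple token-substitution step. The toy lexicons here are invented for illustration (the CoSDA-ML paper uses real bilingual dictionaries), so this is a sketch of the general idea, not the authors' implementation:

```python
import random

# Toy bilingual lexicons, invented for illustration only.
LEXICONS = {
    "de": {"hello": "hallo", "world": "welt"},
    "es": {"hello": "hola", "world": "mundo"},
}

def code_switch(tokens, ratio=0.5, rng=random):
    """Replace a fraction of tokens with a translation drawn from a
    randomly chosen target-language lexicon; tokens with no known
    translation are kept unchanged."""
    out = []
    for tok in tokens:
        langs = [l for l, lex in LEXICONS.items() if tok in lex]
        if langs and rng.random() < ratio:
            out.append(LEXICONS[rng.choice(langs)][tok])
        else:
            out.append(tok)
    return out

print(code_switch(["hello", "world"], ratio=1.0, rng=random.Random(0)))
```

Fine-tuning mBERT on sentences mixed this way encourages it to align representations across languages without needing parallel sentence pairs.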
arXiv Detail & Related papers (2020-06-11T13:15:59Z) - Identifying Necessary Elements for BERT's Multilinguality [4.822598110892846]
Multilingual BERT (mBERT) yields high-quality multilingual representations and enables effective zero-shot transfer.
We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual.
arXiv Detail & Related papers (2020-05-01T14:27:14Z) - The Gutenberg Dialogue Dataset [1.90365714903665]
Current publicly available open-domain dialogue datasets offer a trade-off between quality and size.
We build a high-quality dataset of 14.8M utterances in English, and smaller datasets in German, Dutch, Spanish, Portuguese, Italian, and Hungarian.
arXiv Detail & Related papers (2020-04-27T12:52:20Z) - A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.