Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog
- URL: http://arxiv.org/abs/2212.02168v1
- Date: Mon, 5 Dec 2022 11:09:05 GMT
- Title: Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog
- Authors: Mika Hämäläinen, Khalid Alnajjar and Thierry Poibeau
- Abstract summary: We present a method for extracting a multilingual sentiment annotated dialog data set from Fallout New Vegas.
The game has been translated into English, Spanish, German, French and Italian.
We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method for extracting a multilingual sentiment-annotated
dialog data set from Fallout New Vegas. The game developers have preannotated
every line of dialog in the game with one of 8 different sentiments: anger,
disgust, fear, happy, neutral, pained, sad and surprised. The game has been
translated into English, Spanish, German, French and Italian. We conduct
experiments on multilingual, multilabel sentiment analysis on the extracted
data set using multilingual BERT, XLM-RoBERTa and language-specific BERT
models. In our experiments, multilingual BERT outperformed XLM-RoBERTa for most
of the languages; language-specific models were in turn slightly better than
multilingual BERT for most of the languages. The best overall accuracy was 54%,
achieved by multilingual BERT on the Spanish data. The extracted data set
presents a challenging task for sentiment analysis. We have released the data,
including the training and testing splits, openly on Zenodo. The data set has
been shuffled for copyright reasons.
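In the multilabel setup described in the abstract, a dialog line is evaluated against the full set of eight sentiment tags at once. A minimal sketch of the multi-hot label encoding and exact-match accuracy such a classifier would be scored with (the label names come from the abstract; the helper functions are illustrative and not the authors' code):

```python
# The 8 sentiment labels named in the abstract, mapped to vector positions.
SENTIMENTS = ["anger", "disgust", "fear", "happy",
              "neutral", "pained", "sad", "surprised"]
INDEX = {s: i for i, s in enumerate(SENTIMENTS)}

def to_multi_hot(labels):
    """Map a set of sentiment tags to a 0/1 vector of length 8."""
    vec = [0] * len(SENTIMENTS)
    for label in labels:
        vec[INDEX[label]] = 1
    return vec

def subset_accuracy(gold, pred):
    """Exact-match accuracy: a prediction counts only if every bit agrees."""
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)

gold = [to_multi_hot({"anger"}), to_multi_hot({"happy", "surprised"})]
pred = [to_multi_hot({"anger"}), to_multi_hot({"happy"})]
print(subset_accuracy(gold, pred))  # 0.5
```

Exact-match scoring is strict, which is consistent with the modest 54% best accuracy the paper reports.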
Related papers
- MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing [48.216386761482525]
We present MultiSpider, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese).
Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages.
We also propose a simple augmentation framework, SAVe (Augmentation-with-Verification), which boosts overall performance by about 1.8% and closes 29.5% of the performance gap across languages.
arXiv Detail & Related papers (2022-12-27T13:58:30Z) - Multilingual Persuasion Detection: Video Games as an Invaluable Data Source for NLP [0.6123324869194194]
We show the viability of this data in building a persuasion detection system using a natural language processing model called BERT.
We believe that video games have a lot of unused potential as a data source for a variety of NLP tasks.
arXiv Detail & Related papers (2022-07-10T12:38:02Z) - What makes multilingual BERT multilingual? [60.9051207862378]
In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability.
We compare the cross-lingual ability of non-contextualized and contextualized representation models trained on the same data.
We found that data size and context window size are crucial factors for transferability.
arXiv Detail & Related papers (2020-10-20T05:41:56Z) - Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
arXiv Detail & Related papers (2020-10-20T05:41:35Z) - It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
arXiv Detail & Related papers (2020-10-16T09:49:32Z) - CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT.
Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.
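The code-switching augmentation described above can be sketched as a simple token-substitution step. The toy lexicons here are invented for illustration (the CoSDA-ML paper uses real bilingual dictionaries), so this is a sketch of the general idea, not the authors' implementation:

```python
import random

# Toy bilingual lexicons, invented for illustration only.
LEXICONS = {
    "de": {"hello": "hallo", "world": "welt"},
    "es": {"hello": "hola", "world": "mundo"},
}

def code_switch(tokens, ratio=0.5, rng=random):
    """Replace a fraction of tokens with a translation drawn from a
    randomly chosen target-language lexicon; tokens with no known
    translation are kept unchanged."""
    out = []
    for tok in tokens:
        langs = [l for l, lex in LEXICONS.items() if tok in lex]
        if langs and rng.random() < ratio:
            out.append(LEXICONS[rng.choice(langs)][tok])
        else:
            out.append(tok)
    return out

print(code_switch(["hello", "world"], ratio=1.0, rng=random.Random(0)))
```

Fine-tuning mBERT on sentences mixed this way encourages it to align representations across languages without needing parallel sentence pairs.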
arXiv Detail & Related papers (2020-06-11T13:15:59Z) - Identifying Necessary Elements for BERT's Multilinguality [4.822598110892846]
Multilingual BERT (mBERT) yields high-quality multilingual representations and enables effective zero-shot transfer.
We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual.
arXiv Detail & Related papers (2020-05-01T14:27:14Z) - The Gutenberg Dialogue Dataset [1.90365714903665]
Current publicly available open-domain dialogue datasets offer a trade-off between quality and size.
We build a high-quality dataset of 14.8M utterances in English, and smaller datasets in German, Dutch, Spanish, Portuguese, Italian, and Hungarian.
arXiv Detail & Related papers (2020-04-27T12:52:20Z) - A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.