X-Stance: A Multilingual Multi-Target Dataset for Stance Detection
- URL: http://arxiv.org/abs/2003.08385v2
- Date: Wed, 10 Jun 2020 15:05:11 GMT
- Title: X-Stance: A Multilingual Multi-Target Dataset for Stance Detection
- Authors: Jannis Vamvas and Rico Sennrich
- Abstract summary: We extract a large-scale stance detection dataset from comments written by candidates of elections in Switzerland.
The dataset consists of German, French and Italian text, allowing for a cross-lingual evaluation of stance detection.
- Score: 42.46681912294797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We extract a large-scale stance detection dataset from comments written by
candidates of elections in Switzerland. The dataset consists of German, French
and Italian text, allowing for a cross-lingual evaluation of stance detection.
It contains 67 000 comments on more than 150 political issues (targets). Unlike
stance detection models that have specific target issues, we use the dataset to
train a single model on all the issues. To make learning across targets
possible, we prepend to each instance a natural question that represents the
target (e.g. "Do you support X?"). Baseline results from multilingual BERT show
that zero-shot cross-lingual and cross-target transfer of stance detection is
moderately successful with this approach.
Related papers
- Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training [38.19963761398705]
This paper provides a detailed analysis on Cross-Lingual Multi-Transferability (many-to-many transfer learning) for the recent IE corpora.
We first determine the correlation between single-language performance and a wide range of linguistic-based distances.
Next, we investigate the more general zero-shot multi-lingual transfer settings where multiple languages are involved in the training and evaluation processes.
arXiv Detail & Related papers (2024-11-13T17:13:25Z) - DeMuX: Data-efficient Multilingual Learning [57.37123046817781]
DEMUX is a framework that prescribes exact data-points to label from vast amounts of unlabelled multilingual data.
Our end-to-end framework is language-agnostic, accounts for model representations, and supports multilingual target configurations.
arXiv Detail & Related papers (2023-11-10T20:09:08Z) - How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have [58.23138483086277]
In this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection.
Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain.
Our experiments show that using already existing datasets and only a few-shots of the target task the performance of models improve both monolingually and across languages.
arXiv Detail & Related papers (2023-05-23T14:04:12Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+
Language Pairs [27.574815708395203]
CrossSum is a large-scale cross-lingual summarization dataset comprising 1.68 million article-summary samples in 1,500+ language pairs.
We create CrossSum by aligning parallel articles written in different languages via cross-lingual retrieval from a multilingual abstractive summarization dataset.
We propose a multistage data sampling algorithm to effectively train a cross-lingual summarization model capable of summarizing an article in any target language.
arXiv Detail & Related papers (2021-12-16T11:40:36Z) - Towards Zero-shot Cross-lingual Image Retrieval and Tagging [1.4425878137951236]
We present a zero-shot approach for learning multi-modal representations using cross-lingual pre-training on the text side.
We introduce a new 1K multi-lingual MSCOCO2014 caption test dataset (XTD10) in 7 languages that we collected using a crowdsourcing platform.
We also demonstrate how a cross-lingual model can be used for downstream tasks like multi-lingual image tagging in a zero shot manner.
arXiv Detail & Related papers (2021-09-15T23:39:15Z) - xGQA: Cross-Lingual Visual Question Answering [100.35229218735938]
xGQA is a new multilingual evaluation benchmark for the visual question answering task.
We extend the established English GQA dataset to 7 typologically diverse languages.
We propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual.
arXiv Detail & Related papers (2021-09-13T15:58:21Z) - Few-Shot Cross-Lingual Stance Detection with Sentiment-Based
Pre-Training [32.800766653254634]
We present the most comprehensive study of cross-lingual stance detection to date.
We use 15 diverse datasets in 12 languages from 6 language families.
For our experiments, we build on pattern-exploiting training, proposing the addition of a novel label encoder.
arXiv Detail & Related papers (2021-09-13T15:20:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.