Bridging the Domain Gap for Stance Detection for the Zulu language
- URL: http://arxiv.org/abs/2205.03153v1
- Date: Fri, 6 May 2022 11:44:35 GMT
- Title: Bridging the Domain Gap for Stance Detection for the Zulu language
- Authors: Gcinizwe Dlamini, Imad Eddine Ibrahim Bekkouch, Adil Khan, and Leon Derczynski
- Abstract summary: Existing AI-based approaches for fighting misinformation in the literature suggest automatic stance detection as an integral first step to success.
We propose a black-box non-intrusive method that utilizes techniques from Domain Adaptation to reduce the domain gap.
This allows us to rapidly achieve similar results for stance detection for the Zulu language, the target language in this work, as are found for English.
- Score: 6.509758931804479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Misinformation has become a major concern in recent years given its
spread across our information sources. In the past few years, many NLP tasks have
been introduced in this area, with some systems reaching good results on
English language datasets. Existing AI-based approaches for fighting
misinformation in the literature suggest automatic stance detection as an integral
first step to success. Our paper aims to use the progress made for
English to transfer that knowledge to other languages, which is a
non-trivial task due to the domain gap between English and the target
languages. We propose a black-box non-intrusive method that utilizes techniques
from Domain Adaptation to reduce the domain gap, without requiring any human
expertise in the target language, by leveraging low-quality data in both a
supervised and unsupervised manner. This allows us to rapidly achieve similar
results for stance detection for the Zulu language, the target language in this
work, as are found for English. We also provide a stance detection dataset in
the Zulu language. Our experimental results show that by leveraging English
datasets and machine translation we can improve performance on both English
data and other languages.
Related papers
- A multilingual dataset for offensive language and hate speech detection for Hausa, Yoruba and Igbo languages [0.0]
This study addresses the challenge by developing and introducing novel datasets for offensive language detection in three major Nigerian languages: Hausa, Yoruba, and Igbo.
We collected data from Twitter and manually annotated it to create datasets for each of the three languages, using native speakers.
We used pre-trained language models to evaluate their efficacy in detecting offensive language in our datasets. The best-performing model achieved an accuracy of 90%.
arXiv Detail & Related papers (2024-06-04T09:58:29Z)
- Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z)
- Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation [7.242609314791262]
This paper introduces a novel approach to zero-shot cross-lingual stance detection, Multilingual Translation-Augmented BERT (MTAB).
Our technique employs translation augmentation to improve zero-shot performance and pairs it with adversarial learning to further boost model efficacy.
We demonstrate the effectiveness of our proposed approach, showcasing improved results in comparison to a strong baseline model as well as ablated versions of our model.
arXiv Detail & Related papers (2024-04-22T16:56:43Z)
- A Persian Benchmark for Joint Intent Detection and Slot Filling [3.633817600744528]
Natural Language Understanding (NLU) is important in today's technology as it enables machines to comprehend and process human language.
This paper highlights the significance of advancing the field of NLU for low-resource languages.
We create a Persian benchmark for joint intent detection and slot filling based on the ATIS dataset.
arXiv Detail & Related papers (2023-03-01T10:57:21Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer [2.7213511121305465]
We propose a one-step mixed training method that trains on both source and target data.
We use one model to handle all target languages simultaneously to avoid excessively language-specific models.
Our proposed method achieves state-of-the-art performance on all tasks and outperforms target-adapting by a large margin.
arXiv Detail & Related papers (2022-04-29T04:05:02Z)
- Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi [2.4737119633827174]
MOLD is the first dataset of its kind compiled for Marathi, opening a new domain for research in low-resource Indo-Aryan languages.
We present results from several machine learning experiments on this dataset, including zero-shot and other transfer learning experiments on state-of-the-art cross-lingual transformers.
arXiv Detail & Related papers (2021-09-08T11:29:44Z)
- Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking [66.76141128555099]
We propose a novel cross-lingual biomedical entity linking task (XL-BEL).
We first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task.
We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones.
arXiv Detail & Related papers (2021-05-30T00:50:00Z)
- Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network [48.1535353007371]
Cross-domain Speech Emotion Recognition (SER) is still a challenging task due to the distribution shift between source and target domains.
We propose a Domain Adversarial Neural Network (DANN) based approach to mitigate this distribution shift problem for cross-lingual SER.
arXiv Detail & Related papers (2020-12-21T08:21:11Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)