AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance
Detection for Fact Checking
- URL: http://arxiv.org/abs/2104.13559v1
- Date: Wed, 28 Apr 2021 03:38:24 GMT
- Title: AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance
Detection for Fact Checking
- Authors: Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed
and Preslav Nakov
- Abstract summary: We present our new Arabic Stance Detection dataset (AraStance) of 910 claims from a diverse set of sources.
AraStance covers false and true claims from multiple domains (e.g., politics, sports, health) and several Arab countries.
Our best model achieves an accuracy of 85% and a macro F1 score of 78%, which leaves room for improvement.
- Score: 19.962693437515753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the continuing spread of misinformation and disinformation online, it is
of increasing importance to develop combating mechanisms at scale in the form
of automated systems that support multiple languages. One task of interest is
claim veracity prediction, which can be addressed using stance detection with
respect to relevant documents retrieved online. To this end, we present our new
Arabic Stance Detection dataset (AraStance) of 910 claims from a diverse set of
sources comprising three fact-checking websites and one news website. AraStance
covers false and true claims from multiple domains (e.g., politics, sports,
health) and several Arab countries, and it is well-balanced between related and
unrelated documents with respect to the claims. We benchmark AraStance, along
with two other stance detection datasets, using a number of BERT-based models.
Our best model achieves an accuracy of 85% and a macro F1 score of 78%, which
leaves room for improvement and reflects the challenging nature of AraStance
and the task of stance detection in general.
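The gap between the reported accuracy (85%) and macro F1 (78%) is typical when some stance classes are harder or rarer than others: accuracy counts every example equally, while macro F1 averages per-class F1 scores, so a weak class lowers it sharply. The following is a minimal sketch of both metrics on toy data. The label names (agree, disagree, discuss, unrelated) and the toy predictions are assumptions made for illustration; they are not taken from the dataset or from the authors' evaluation code.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: one rare class ("disagree") is always missed.
gold = ["unrelated"] * 6 + ["agree", "agree", "disagree", "discuss"]
pred = ["unrelated"] * 6 + ["agree", "agree", "agree", "discuss"]

print(f"accuracy = {accuracy(gold, pred):.2f}")  # 0.90
print(f"macro F1 = {macro_f1(gold, pred):.2f}")  # 0.70
```

In this toy case a single missed minority class leaves accuracy at 0.90 but pulls macro F1 down to 0.70, which is the same kind of gap seen in the reported numbers.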
Related papers
- A Challenge Dataset and Effective Models for Conversational Stance Detection [26.208989232347058]
We introduce a new multi-turn conversation stance detection dataset (called MT-CSD).
We propose a global-local attention network (GLAN) to address both long and short-range dependencies inherent in conversational data.
Our dataset serves as a valuable resource to catalyze advancements in cross-domain stance detection.
arXiv Detail & Related papers (2024-03-17T08:51:01Z)
- Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research [7.242609314791262]
We present state-of-the-art multilingual claim detection research, categorized by three key factors of the problem: verifiability, priority, and similarity.
We present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
arXiv Detail & Related papers (2024-01-22T14:17:03Z)
- Automated stance detection in complex topics and small languages: the
challenging case of immigration in polarizing news media [0.0]
This paper explores the applicability of large language models for automated stance detection in a challenging scenario.
It involves a morphologically complex, lower-resource language, and a socio-culturally complex topic, immigration.
If the approach works in this case, it can be expected to perform as well or better in less demanding scenarios.
arXiv Detail & Related papers (2023-05-22T13:56:35Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Utilizing Background Knowledge for Robust Reasoning over Traffic
Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets given the abundant commonsense knowledge.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual
Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with cross-lingual retrieval.
We train the retriever with our proposed Cross-lingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking [55.75590135151682]
CHEF is the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims.
The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet.
arXiv Detail & Related papers (2022-06-06T09:11:03Z)
- Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And
Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Few-Shot Cross-Lingual Stance Detection with Sentiment-Based
Pre-Training [32.800766653254634]
We present the most comprehensive study of cross-lingual stance detection to date.
We use 15 diverse datasets in 12 languages from 6 language families.
For our experiments, we build on pattern-exploiting training, proposing the addition of a novel label encoder.
arXiv Detail & Related papers (2021-09-13T15:20:06Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)