AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance
Detection for Fact Checking
- URL: http://arxiv.org/abs/2104.13559v1
- Date: Wed, 28 Apr 2021 03:38:24 GMT
- Title: AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance
Detection for Fact Checking
- Authors: Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed
and Preslav Nakov
- Abstract summary: We present our new Arabic Stance Detection dataset (AraStance) of 910 claims from a diverse set of sources.
AraStance covers false and true claims from multiple domains (e.g., politics, sports, health) and several Arab countries.
Our best model achieves an accuracy of 85% and a macro F1 score of 78%, which leaves room for improvement.
- Score: 19.962693437515753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the continuing spread of misinformation and disinformation online, it is
of increasing importance to develop combating mechanisms at scale in the form
of automated systems that support multiple languages. One task of interest is
claim veracity prediction, which can be addressed using stance detection with
respect to relevant documents retrieved online. To this end, we present our new
Arabic Stance Detection dataset (AraStance) of 910 claims from a diverse set of
sources comprising three fact-checking websites and one news website. AraStance
covers false and true claims from multiple domains (e.g., politics, sports,
health) and several Arab countries, and it is well balanced between related and
unrelated documents with respect to the claims. We benchmark AraStance, along
with two other stance detection datasets, using a number of BERT-based models.
Our best model achieves an accuracy of 85% and a macro F1 score of 78%, which
leaves room for improvement and reflects the challenging nature of AraStance
and the task of stance detection in general.
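The reported numbers (85% accuracy, 78% macro F1) can be computed from gold and predicted stance labels with a short evaluation routine. Below is a minimal sketch, assuming AraStance's four-way label set (agree, disagree, discuss, unrelated); the gold/predicted lists are toy values for illustration, not from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples where the predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Macro F1: unweighted mean of per-class F1 scores, so rare
    classes (e.g. 'discuss') count as much as frequent ones."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

labels = ["agree", "disagree", "discuss", "unrelated"]
gold = ["agree", "unrelated", "discuss", "agree", "disagree"]
pred = ["agree", "unrelated", "agree", "agree", "unrelated"]
print(accuracy(gold, pred), round(macro_f1(gold, pred, labels), 3))
```

Macro averaging is the natural choice here because stance datasets are typically skewed toward the "unrelated" class, and plain accuracy would reward a model that ignores the minority stance labels.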
Related papers
- IAI Group at CheckThat! 2024: Transformer Models and Data Augmentation for Checkworthy Claim Detection [1.3686993145787067]
This paper describes IAI group's participation for automated check-worthiness estimation for claims.
The task involves the automated detection of check-worthy claims in English, Dutch, and Arabic political debates and Twitter data.
We utilize various pre-trained generative decoder and encoder transformer models, employing methods such as few-shot chain-of-thought reasoning.
arXiv Detail & Related papers (2024-08-02T08:59:09Z)
- Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research [7.242609314791262]
We present state-of-the-art multilingual claim detection research categorized by three key factors of the problem: verifiability, priority, and similarity.
We present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
arXiv Detail & Related papers (2024-01-22T14:17:03Z)
- Breaking Language Barriers with MMTweets: Advancing Cross-Lingual Debunked Narrative Retrieval for Fact-Checking [5.880794128275313]
Cross-lingual debunked narrative retrieval is an understudied problem.
This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by (i) creating the Multilingual Misinformation Tweets (MMTweets) dataset.
MMTweets features cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts.
We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models.
arXiv Detail & Related papers (2023-08-10T16:33:17Z)
- Automated stance detection in complex topics and small languages: the challenging case of immigration in polarizing news media [0.0]
This paper explores the applicability of large language models for automated stance detection in a challenging scenario.
It involves a morphologically complex, lower-resource language, and a socio-culturally complex topic, immigration.
If the approach works in this case, it can be expected to perform as well or better in less demanding scenarios.
arXiv Detail & Related papers (2023-05-22T13:56:35Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Utilizing Background Knowledge for Robust Reasoning over Traffic Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets given the abundant commonsense knowledge.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with cross-lingual retrieval.
We train the retriever with our proposed Cross-lingual Inverse Cloze Task (XICT).
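An inverse cloze task builds retriever training pairs without labeled data: one sentence is held out of a passage and used as the pseudo-query, and the remaining sentences form the positive context the retriever must match (in XICT, as we understand it, the query side is additionally in a different language). Below is a minimal sketch of the monolingual pair construction; the function and variable names are illustrative, not from the paper's code.

```python
import random

def ict_pair(sentences, rng=random.Random(42)):
    """Build one Inverse Cloze Task training pair.

    One sentence is held out as the pseudo-query; the remaining
    sentences form the positive context the retriever must match.
    """
    i = rng.randrange(len(sentences))
    query = sentences[i]
    context = sentences[:i] + sentences[i + 1:]
    return query, context

passage = [
    "The claim circulated widely on social media.",
    "Fact-checkers traced it to a satirical site.",
    "No health agency has issued such a warning.",
]
query, context = ict_pair(passage)
assert query not in context and len(context) == len(passage) - 1
```

Training a dual encoder to score each query high against its own context and low against other passages in the batch yields a retriever without any human relevance judgments.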
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking [55.75590135151682]
CHEF is the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims.
The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet.
arXiv Detail & Related papers (2022-06-06T09:11:03Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity and overstability causing samples with high accuracies.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training [32.800766653254634]
We present the most comprehensive study of cross-lingual stance detection to date.
We use 15 diverse datasets in 12 languages from 6 language families.
For our experiments, we build on pattern-exploiting training, proposing the addition of a novel label encoder.
arXiv Detail & Related papers (2021-09-13T15:20:06Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.