Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research
- URL: http://arxiv.org/abs/2401.11969v3
- Date: Mon, 18 Mar 2024 16:49:59 GMT
- Title: Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research
- Authors: Rrubaa Panchendrarajan, Arkaitz Zubiaga
- Abstract summary: We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem, verifiability, priority, and similarity.
We present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
- Score: 7.242609314791262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated fact-checking has drawn considerable attention over the past few decades due to the increase in the diffusion of misinformation on online platforms. This is often carried out as a sequence of tasks comprising (i) the detection of sentences circulating in online platforms which constitute claims needing verification, followed by (ii) the verification process of those claims. This survey focuses on the former, by discussing existing efforts towards detecting claims needing fact-checking, with a particular focus on multilingual data and methods. This is a challenging and fertile direction where existing methods are yet far from matching human performance due to the profoundly challenging nature of the issue. Especially, the dissemination of information across multiple social platforms, articulated in multiple languages and modalities demands more generalized solutions for combating misinformation. Focusing on multilingual misinformation, we present a comprehensive survey of existing multilingual claim detection research. We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem, verifiability, priority, and similarity. Further, we present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
Related papers
- Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey [2.5459710368096586]
This survey provides a comprehensive overview of the current research on low-resource language misinformation detection.
We review the existing datasets, methodologies, and tools used in these domains, identifying key challenges related to: data resources, model development, cultural and linguistic context, real-world applications, and research efforts.
Our findings underscore the need for robust, inclusive systems capable of addressing misinformation across diverse linguistic and cultural contexts.
arXiv Detail & Related papers (2024-10-24T03:02:03Z)
- Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search [89.1772985740272]
In mixed-initiative conversational search systems, clarifying questions are used to help users who struggle to express their intentions in a single query.
We hypothesize that in scenarios where multimodal information is pertinent, the clarification process can be improved by using non-textual information.
We collect a dataset named Melon that contains over 4k multimodal clarifying questions, enriched with over 14k images.
Several analyses are conducted to understand the importance of multimodal contents during the query clarification phase.
arXiv Detail & Related papers (2024-02-12T16:04:01Z)
- Lost in Translation -- Multilingual Misinformation and its Evolution [52.07628580627591]
This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages.
We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times.
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
arXiv Detail & Related papers (2023-10-27T12:21:55Z)
- Breaking Language Barriers with MMTweets: Advancing Cross-Lingual Debunked Narrative Retrieval for Fact-Checking [5.880794128275313]
Cross-lingual debunked narrative retrieval is an understudied problem.
This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by: (i) creating Multilingual Misinformation Tweets (MMTweets)
MMTweets features cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts.
We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models.
arXiv Detail & Related papers (2023-08-10T16:33:17Z)
- MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection [65.46122357928041]
Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text.
Main questions include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages.
We introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages.
arXiv Detail & Related papers (2022-11-11T02:09:51Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)
- Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking [19.962693437515753]
We present our new Arabic Stance Detection dataset (AraStance) of 910 claims from a diverse set of sources.
AraStance covers false and true claims from multiple domains (e.g., politics, sports, health) and several Arab countries.
Our best model achieves an accuracy of 85% and a macro F1 score of 78%, which leaves room for improvement.
arXiv Detail & Related papers (2021-04-28T03:38:24Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.