X-FACT: A New Benchmark Dataset for Multilingual Fact Checking
- URL: http://arxiv.org/abs/2106.09248v1
- Date: Thu, 17 Jun 2021 05:09:54 GMT
- Title: X-FACT: A New Benchmark Dataset for Multilingual Fact Checking
- Authors: Ashim Gupta and Vivek Srikumar
- Abstract summary: We introduce X-FACT: the largest publicly available multilingual dataset for factual verification of naturally existing real-world claims.
The dataset contains short statements in 25 languages and is labeled for veracity by expert fact-checkers.
- Score: 21.2633064526968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce X-FACT: the largest publicly available
multilingual dataset for factual verification of naturally existing real-world
claims. The dataset contains short statements in 25 languages and is labeled
for veracity by expert fact-checkers. The dataset includes a multilingual
evaluation benchmark that measures both out-of-domain generalization and
zero-shot capabilities of multilingual models. Using state-of-the-art
multilingual transformer-based models, we develop several automated
fact-checking models that, along with textual claims, make use of additional
metadata and evidence from news stories retrieved using a search engine.
Empirically, our best model attains an F-score of around 40%, suggesting that
our dataset is a challenging benchmark for the evaluation of multilingual
fact-checking models.
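The modeling setup described in the abstract (a multilingual transformer that classifies a claim together with retrieved evidence) can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' released code: the encoder name (bert-base-multilingual-cased), the label count, and the claim/evidence sentence-pair pairing are assumptions; X-FACT's actual label scheme, metadata features, and evidence handling may differ.

```python
# Minimal sketch (assumed setup, not the X-FACT reference implementation):
# classify a claim's veracity from the claim text plus a retrieved evidence snippet.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # any multilingual encoder could be used
NUM_LABELS = 7  # assumption: placeholder for the dataset's veracity label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS
)

def predict_veracity(claim: str, evidence: str) -> int:
    """Encode the claim and evidence as a sentence pair and return a label id."""
    inputs = tokenizer(
        claim,
        evidence,            # e.g. a news snippet returned by a search engine
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

# Toy usage with a made-up claim/evidence pair:
print(predict_veracity(
    "The city hosted the 2016 games.",
    "News report: the 2016 games were held in a different city.",
))
```

In practice the classification head would first be fine-tuned on the X-FACT training split; the untrained head here only demonstrates the input format and inference path.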
Related papers
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) tasks or to isolated capability-specific tasks.
We present a pipeline for selecting suitable benchmarks from the large pool of existing ones, addressing the limited attention previous work has paid to the utility of these benchmarks.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- Multilingual and Multi-topical Benchmark of Fine-tuned Language models and Large Language Models for Check-Worthy Claim Detection [1.4779899760345434]
This study compares the performance of (1) fine-tuned language models and (2) large language models on the task of check-worthy claim detection.
We compiled a multilingual and multi-topical dataset comprising texts from a variety of sources and styles.
arXiv Detail & Related papers (2023-11-10T15:36:35Z)
- An Open Dataset and Model for Language Identification [84.15194457400253]
We present a language identification (LID) model that achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages.
We make both the model and the dataset available to the research community.
arXiv Detail & Related papers (2023-05-23T08:43:42Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
- mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z)
- Multi-lingual Evaluation of Code Generation Models [82.7357812992118]
We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X.
These datasets cover over 10 programming languages.
We are able to assess the performance of code generation models in a multi-lingual fashion.
arXiv Detail & Related papers (2022-10-26T17:17:06Z)
- A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification [16.684856745734944]
We present a multilingual bag-of-entities model that boosts the performance of zero-shot cross-lingual text classification.
It leverages the multilingual nature of Wikidata: entities that represent the same concept across languages share a unique identifier.
A model trained on entity features in a resource-rich language can thus be directly applied to other languages.
arXiv Detail & Related papers (2021-10-15T01:10:50Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
Building on the Word-in-Context dataset (WiC), which assesses the ability to correctly model distinct meanings of a word, we put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)