Overview of the CLEF-2019 CheckThat!: Automatic Identification and
Verification of Claims
- URL: http://arxiv.org/abs/2109.15118v1
- Date: Sat, 25 Sep 2021 16:08:09 GMT
- Title: Overview of the CLEF-2019 CheckThat!: Automatic Identification and
Verification of Claims
- Authors: Tamer Elsayed, Preslav Nakov, Alberto Barrón-Cedeño, Maram
Hasanain, Reem Suwaileh, Giovanni Da San Martino, Pepa Atanasova
- Abstract summary: CheckThat! lab featured two tasks in two different languages: English and Arabic.
The most successful approaches to Task 1 used various neural networks and logistic regression.
Learning-to-rank was used by the highest scoring runs for subtask A.
- Score: 26.96108180116284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an overview of the second edition of the CheckThat! Lab at CLEF
2019. The lab featured two tasks in two different languages: English and
Arabic. Task 1 (English) challenged the participating systems to predict which
claims in a political debate or speech should be prioritized for fact-checking.
Task 2 (Arabic) asked participants to (A) rank a given set of Web pages with respect to a
check-worthy claim based on their usefulness for fact-checking that claim, (B)
classify these same Web pages according to their degree of usefulness for
fact-checking the target claim, (C) identify useful passages from these pages,
and (D) use the useful pages to predict the claim's factuality. CheckThat!
provided a full evaluation framework, consisting of data in English (derived
from fact-checking sources) and Arabic (gathered and annotated from scratch)
and evaluation based on mean average precision (MAP) and normalized discounted
cumulative gain (nDCG) for ranking, and F1 for classification. A total of 47
teams registered to participate in this lab, and fourteen of them actually
submitted runs (compared to nine last year). The evaluation results show that
the most successful approaches to Task 1 used various neural networks and
logistic regression. As for Task 2, learning-to-rank was used by the highest
scoring runs for subtask A, while different classifiers were used in the other
subtasks. We release to the research community all datasets from the lab as
well as the evaluation scripts, which should enable further research in the
important tasks of check-worthiness estimation and automatic claim
verification.
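The abstract above notes that ranking quality was measured with mean average precision (MAP) and normalized discounted cumulative gain (nDCG), and classification with F1. The short Python sketch below illustrates how MAP and nDCG@k are commonly computed; it is only an illustration under standard definitions of these metrics, not the lab's released evaluation scripts, and the function names and toy data are assumptions.

```python
# Illustrative sketch of the ranking metrics named in the abstract (MAP, nDCG@k).
# This is NOT the official CheckThat! evaluation code; names and toy data
# are assumptions made purely for illustration.
import math
from typing import Sequence

def average_precision(relevance: Sequence[int]) -> float:
    """AP for one ranked list; relevance[i] is 1 if the item at rank i+1 is relevant."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """MAP over several queries (e.g. one ranked page list per claim)."""
    return sum(average_precision(r) for r in runs) / len(runs)

def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """nDCG@k for graded relevance labels (higher gain = more useful page)."""
    def dcg(gs: Sequence[float]) -> float:
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gs[:k], start=1))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

if __name__ == "__main__":
    # Toy example: two claims, each with a ranked list of retrieved pages.
    print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 1]]))
    print(ndcg_at_k([3, 2, 0, 1], k=4))
```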
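The abstract also reports that the strongest Task 1 (check-worthiness) systems used neural networks and logistic regression. As a minimal, hypothetical sketch of the logistic-regression flavour of such a system (not any participant's actual implementation), one could rank debate sentences by a predicted check-worthiness probability; the training sentences and labels below are invented placeholders, not CheckThat! data.

```python
# Minimal, hypothetical check-worthiness scorer in the spirit of the
# logistic-regression approaches mentioned above; training data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "Unemployment fell to 3.9 percent last quarter.",  # factual, checkable
    "I believe our country deserves better.",          # opinion
    "Crime has doubled in the last two years.",        # factual, checkable
    "Thank you all for being here tonight.",           # not checkable
]
train_labels = [1, 0, 1, 0]  # 1 = worth fact-checking, 0 = not

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_sentences, train_labels)

# Rank new debate sentences by predicted check-worthiness probability.
debate = [
    "We created five million new jobs.",
    "Let me tell you a story about my childhood.",
]
scores = model.predict_proba(debate)[:, 1]
for sentence, score in sorted(zip(debate, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {sentence}")
```

Ranking sentences by this probability is exactly what MAP then evaluates against the gold check-worthy labels.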
Related papers
- Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting [8.000723123087473]
Class-agnostic counting (CAC) is a recent task in computer vision that aims to estimate the number of instances of arbitrary object classes never seen during model training.
We introduce the Prompt-Aware Counting benchmark, which comprises two targeted tests, each accompanied by appropriate evaluation metrics.
arXiv Detail & Related papers (2024-09-24T10:35:42Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is preferred by human annotators over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Fraunhofer SIT at CheckThat! 2023: Tackling Classification Uncertainty Using Model Souping on the Example of Check-Worthiness Classification [0.0]
This paper describes the second-placed approach developed by the Fraunhofer SIT team in the CLEF-2023 CheckThat! lab Task 1B for English.
Given a text snippet from a political debate, the aim of this task is to determine whether it should be assessed for check-worthiness.
arXiv Detail & Related papers (2023-07-03T09:27:46Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation [0.0]
This paper addresses the SemEval-2022 Task 3 PreTENS: Presupposed Taxonomies evaluating Neural Network Semantics.
The goal of the task is to identify if a sentence is deemed acceptable or not, depending on the taxonomic relationship that holds between a noun pair contained in the sentence.
We propose an effective way to enhance the robustness and the generalizability of language models for better classification.
arXiv Detail & Related papers (2022-10-07T07:41:28Z)
- Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media [26.60148306714383]
We present an overview of the third edition of the CheckThat! Lab at CLEF 2020.
The lab featured five tasks in two different languages: English and Arabic.
We describe the setup of the tasks, the evaluation results, and a summary of the approaches used by the participants.
arXiv Detail & Related papers (2020-07-15T21:19:32Z)
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
- L2R2: Leveraging Ranking for Abductive Reasoning [65.40375542988416]
The abductive natural language inference task (αNLI) is proposed to evaluate the abductive reasoning ability of a learning system.
A novel L2R2 approach is proposed under the learning-to-rank framework.
Experiments on the ART dataset reach the state-of-the-art in the public leaderboard.
arXiv Detail & Related papers (2020-05-22T15:01:23Z)
- Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
- Overview of the TREC 2019 Fair Ranking Track [65.15263872493799]
The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers.
This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process.
arXiv Detail & Related papers (2020-03-25T21:34:58Z)
- CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media [28.070608555714752]
CheckThat! proposes four complementary tasks and a related task from previous lab editions.
The evaluation is carried out using mean average precision or precision at rank k for ranking tasks, and F1 for classification tasks.
arXiv Detail & Related papers (2020-01-21T06:47:11Z)