Overview of CheckThat! 2020: Automatic Identification and Verification
of Claims in Social Media
- URL: http://arxiv.org/abs/2007.07997v1
- Date: Wed, 15 Jul 2020 21:19:32 GMT
- Title: Overview of CheckThat! 2020: Automatic Identification and Verification
of Claims in Social Media
- Authors: Alberto Barron-Cedeno, Tamer Elsayed, Preslav Nakov, Giovanni Da San
Martino, Maram Hasanain, Reem Suwaileh, Fatima Haouari, Nikolay Babulkov,
Bayan Hamdan, Alex Nikolov, Shaden Shaar, and Zien Sheikh Ali
- Abstract summary: We present an overview of the third edition of the CheckThat! Lab at CLEF 2020.
The lab featured five tasks in two different languages: English and Arabic.
We describe the setup of the tasks and the evaluation results, and we summarize the approaches used by the participants.
- Score: 26.60148306714383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an overview of the third edition of the CheckThat! Lab at CLEF
2020. The lab featured five tasks in two different languages: English and
Arabic. The first four tasks compose the full pipeline of claim verification in
social media: Task 1 on check-worthiness estimation, Task 2 on retrieving
previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on
claim verification. The lab is completed with Task 5 on check-worthiness
estimation in political debates and speeches. A total of 67 teams registered to
participate in the lab (up from 47 at CLEF 2019), and 23 of them actually
submitted runs (compared to 14 at CLEF 2019). Most teams used deep neural
networks based on BERT, LSTMs, or CNNs, and achieved sizable improvements over
the baselines on all tasks. Here we describe the setup of the tasks and the
evaluation results, summarize the approaches used by the participants, and
discuss some lessons learned. Last but not least, we release to the research
community all datasets from the lab as well as the evaluation scripts, which
should enable further research in the important tasks of check-worthiness
estimation and automatic claim verification.
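Most teams approached the check-worthiness tasks with BERT-based classifiers. As a rough illustration of that family of systems, the sketch below scores and ranks tweets by estimated check-worthiness. It is a minimal sketch assuming a HuggingFace `transformers` setup; the checkpoint name, example tweets, and (untrained) classification head are illustrative placeholders, not any team's actual submission.

```python
# Minimal sketch of a BERT-style check-worthiness ranker (illustrative only;
# the checkpoint, examples, and untrained head are placeholders, not lab code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumption: any BERT-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def check_worthiness_score(text: str) -> float:
    """P(check-worthy) for one tweet; meaningful only after the model has
    been fine-tuned on the Task 1 training data."""
    inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Task 1 asks for a ranking of tweets by check-worthiness:
tweets = [
    "The new vaccine was 95% effective in the trial.",  # hypothetical examples
    "Good morning everyone, have a great day!",
]
for tweet in sorted(tweets, key=check_worthiness_score, reverse=True):
    print(f"{check_worthiness_score(tweet):.3f}  {tweet}")
```

LSTM- and CNN-based systems follow the same score-then-rank pattern with a different encoder.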
Related papers
- MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups.
It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics.
With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z) - FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning [43.82613670331329]
This study investigates the application of open-source language models to identify check-worthy statements from political transcriptions.
We propose a two-step data pruning approach to automatically identify high-quality training data instances for effective learning.
Our team ranked first in the check-worthiness estimation task in the English language.
arXiv Detail & Related papers (2024-06-26T12:31:31Z) - SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z) - SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z) - Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z) - FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations upon which there are 10 and 7 tasks annotated, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
arXiv Detail & Related papers (2022-05-12T17:59:00Z) - Overview of the CLEF-2019 CheckThat!: Automatic Identification and
Verification of Claims [26.96108180116284]
The CheckThat! lab featured two tasks in two different languages: English and Arabic.
The most successful approaches to Task 1 used various neural networks and logistic regression.
Learning-to-rank was used by the highest-scoring runs for subtask A.
arXiv Detail & Related papers (2021-09-25T16:08:09Z) - Overview of the CLEF--2021 CheckThat! Lab on Detecting Check-Worthy
Claims, Previously Fact-Checked Claims, and Fake News [21.574997165145486]
We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF).
The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish.
arXiv Detail & Related papers (2021-09-23T06:10:36Z) - ASVspoof 2021: accelerating progress in spoofed and deepfake speech
detection [70.45884214674057]
ASVspoof 2021 is the fourth edition in the series of biennial challenges that aim to promote the study of spoofing.
This paper describes all three tasks, the new databases for each of them, the evaluation metrics, four challenge baselines, the evaluation platform and a summary of challenge results.
arXiv Detail & Related papers (2021-09-01T16:17:31Z) - Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in
a Cross-context Setting [3.5132824436572685]
The lab consists of document, sentence, and token level information classification and extraction tasks.
The training and development data were collected from India, while the test data was collected from India and China.
We observed that neural networks yield the best results and that performance drops significantly for the majority of submissions in the cross-country setting.
arXiv Detail & Related papers (2020-08-01T21:39:54Z) - CheckThat! at CLEF 2020: Enabling the Automatic Identification and
Verification of Claims in Social Media [28.070608555714752]
CheckThat! proposes four complementary tasks and a related task from previous lab editions.
The evaluation is carried out using mean average precision or precision at rank k for the ranking tasks, and F1 for the classification tasks; a minimal sketch of these ranking metrics follows this list.
arXiv Detail & Related papers (2020-01-21T06:47:11Z)