Is it indeed bigger better? The comprehensive study of claim detection
LMs applied for disinformation tackling
- URL: http://arxiv.org/abs/2311.06121v1
- Date: Fri, 10 Nov 2023 15:36:35 GMT
- Title: Is it indeed bigger better? The comprehensive study of claim detection
LMs applied for disinformation tackling
- Authors: Martin Hyben, Sebastian Kula, Ivan Srba, Robert Moro, Jakub Simko
- Abstract summary: This study compares the performance of fine-tuned models and extremely large language models on the task of check-worthy claim detection.
We composed a multilingual and multi-topical dataset comprising texts of various sources and styles.
- Score: 1.5856555660089906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study compares the performance of (1) fine-tuned models and (2)
extremely large language models on the task of check-worthy claim detection.
For the purpose of the comparison we composed a multilingual and multi-topical
dataset comprising texts of various sources and styles. Building on this, we
performed a benchmark analysis to determine the most general multilingual and
multi-topical claim detector.
We chose three state-of-the-art models in the check-worthy claim detection
task and fine-tuned them. Furthermore, we selected three state-of-the-art
extremely large language models without any fine-tuning. We made modifications
to the models to adapt them for multilingual settings and, through extensive
experimentation and evaluation, assessed the performance of all the models
in terms of accuracy, recall, and F1-score in in-domain and cross-domain
scenarios. Our results demonstrate that despite the technological progress in
the area of natural language processing, the models fine-tuned for the task of
check-worthy claim detection still outperform the zero-shot approaches in
cross-domain settings.
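The abstract names accuracy, recall, and F1-score as the evaluation metrics. A minimal sketch of how these could be computed for a binary check-worthy / not check-worthy labelling follows. The function name `binary_metrics` and the sample labels are hypothetical and only for illustration; this is not the authors' evaluation code, and the paper's own averaging or per-language setup may differ.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, and F1 for the positive class (label 1 = check-worthy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = len(y_true)
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Hypothetical gold labels and model predictions (1 = check-worthy claim).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
scores = binary_metrics(gold, pred)
print(scores)
```

The same function could be applied separately to in-domain and cross-domain test sets to compare a fine-tuned model against a zero-shot one, under the assumption that both produce binary labels.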
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Do We Need Language-Specific Fact-Checking Models? The Case of Chinese [17.55466402274949]
This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese.
We first demonstrate the limitations of translation-based methods and multilingual large language models, highlighting the need for language-specific systems.
We propose a Chinese fact-checking system that can better retrieve evidence from a document by incorporating context information.
arXiv Detail & Related papers (2024-01-27T20:26:03Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that using phonetic and graphic information reasonably is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the shortcomings of models.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- An Application of Pseudo-Log-Likelihoods to Natural Language Scoring [5.382454613390483]
A language model with relatively few parameters and training steps can outperform it on a recent large data set.
We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks.
We argue that robustness of the smaller model ought to be understood in terms of compositionality.
arXiv Detail & Related papers (2022-01-23T22:00:54Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.