Detecting Misinformation with LLM-Predicted Credibility Signals and Weak
Supervision
- URL: http://arxiv.org/abs/2309.07601v1
- Date: Thu, 14 Sep 2023 11:06:51 GMT
- Title: Detecting Misinformation with LLM-Predicted Credibility Signals and Weak
Supervision
- Authors: João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina
Scarton
- Abstract summary: This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each signal.
We then aggregate these potentially noisy labels using weak supervision to predict content veracity.
We demonstrate that our approach, which combines zero-shot credibility signal labeling and weak supervision, outperforms state-of-the-art classifiers on two misinformation datasets.
- Score: 5.348343219992815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Credibility signals represent a wide range of heuristics that are typically
used by journalists and fact-checkers to assess the veracity of online content.
Automating the task of credibility signal extraction, however, is very
challenging as it requires high-accuracy signal-specific extractors to be
trained, while there are currently no sufficiently large datasets annotated
with all credibility signals. This paper investigates whether large language
models (LLMs) can be prompted effectively with a set of 18 credibility signals
to produce weak labels for each signal. We then aggregate these potentially
noisy labels using weak supervision in order to predict content veracity. We
demonstrate that our approach, which combines zero-shot LLM credibility signal
labeling and weak supervision, outperforms state-of-the-art classifiers on two
misinformation datasets without using any ground-truth labels for training. We
also analyse the contribution of the individual credibility signals towards
predicting content veracity, which provides new valuable insights into their
role in misinformation detection.
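The pipeline the abstract describes (zero-shot LLM labels per credibility signal, then weak-supervision aggregation into a veracity prediction) can be sketched minimally. The snippet below is an illustrative abstain-aware majority vote, not the authors' implementation (weak supervision typically uses a learned label model that weights signals by estimated accuracy); the signal names are hypothetical examples.

```python
# Minimal sketch: aggregate per-signal weak labels into a veracity prediction.
# Each of the (here, 3 of the paper's 18) credibility signals emits a weak
# label, or abstains when the LLM is unsure.

ABSTAIN, CREDIBLE, NOT_CREDIBLE = -1, 0, 1

def aggregate(weak_labels):
    """Majority vote over weak labels, ignoring abstentions."""
    votes = [label for label in weak_labels if label != ABSTAIN]
    if not votes:
        return ABSTAIN  # no signal fired for this article
    return max(set(votes), key=votes.count)

# Hypothetical zero-shot LLM outputs for one article.
signals = {
    "clickbait_title": NOT_CREDIBLE,
    "cites_sources": CREDIBLE,
    "emotional_language": NOT_CREDIBLE,
}
print(aggregate(list(signals.values())))  # 1 (predicted not credible)
```

A learned label model would replace the uniform vote with per-signal accuracy weights estimated from label agreement, which is what lets the approach work without any ground-truth training labels.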
Related papers
- Learning High-Quality Latent Representations for Anomaly Detection and Signal Integrity Enhancement in High-Speed Signals [3.0017241250121387]
This paper addresses the dual challenge of improving anomaly detection and signal integrity in high-speed dynamic random access memory signals. We propose a joint training framework that integrates an autoencoder with a classifier to learn more distinctive latent representations. We introduce a signal integrity enhancement algorithm that improves signal integrity by an average of 11.3%.
arXiv Detail & Related papers (2025-06-23T04:48:22Z) - SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models [0.16385815610837165]
SelfCheckAgent is a novel framework integrating three different agents.
These agents provide a robust multi-dimensional approach to hallucination detection.
The framework also incorporates a triangulation strategy, which reinforces the strengths of the SelfCheckAgent.
arXiv Detail & Related papers (2025-02-03T20:42:32Z) - Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
arXiv Detail & Related papers (2024-03-15T17:33:49Z) - Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs [6.589041710104928]
Graph neural network-based network intrusion detection systems have recently demonstrated state-of-the-art performance on benchmark datasets.
These methods suffer from a reliance on target encoding for data pre-processing, limiting widespread adoption due to the associated need for annotated labels.
We propose a solution involving in-context pre-training and the utilization of dense representations for categorical features to jointly overcome the label-dependency limitation.
arXiv Detail & Related papers (2024-02-29T09:40:07Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - Fusing Pseudo Labels with Weak Supervision for Dynamic Traffic Scenarios [0.0]
We introduce a weakly-supervised label unification pipeline that amalgamates pseudo labels from object detection models trained on heterogeneous datasets.
Our pipeline creates a unified label space by merging labels from disparate datasets, rectifying bias and enhancing generalization.
We retrain a single object detection model on the merged label space, yielding a resilient model proficient in dynamic traffic scenarios.
arXiv Detail & Related papers (2023-08-30T11:33:07Z) - Data AUDIT: Identifying Attribute Utility- and Detectability-Induced
Bias in Task Models [8.420252576694583]
We present a first technique for the rigorous, quantitative screening of medical image datasets.
Our method decomposes the risks associated with dataset attributes in terms of their detectability and utility.
Using our method, we show our screening method reliably identifies nearly imperceptible bias-inducing artifacts.
arXiv Detail & Related papers (2023-04-06T16:50:15Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Generative Modeling Helps Weak Supervision (and Vice Versa) [87.62271390571837]
We propose a model fusing weak supervision and generative adversarial networks.
It captures discrete variables in the data alongside the weak supervision derived label estimate.
It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels.
arXiv Detail & Related papers (2022-03-22T20:24:21Z) - Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z) - ReLearn: A Robust Machine Learning Framework in Presence of Missing Data
for Multimodal Stress Detection from Physiological Signals [5.042598205771715]
We propose ReLearn, a robust machine learning framework for stress detection from biomarkers extracted from multimodal physiological signals.
ReLearn effectively copes with missing data and outliers both at training and inference phases.
Our experiments show that the proposed framework obtains a cross-validation accuracy of 86.8% even if more than 50% of samples within the features are missing.
arXiv Detail & Related papers (2021-04-29T11:53:01Z) - Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.