Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation
- URL: http://arxiv.org/abs/2304.10249v1
- Date: Wed, 19 Apr 2023 02:53:59 GMT
- Title: Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation
- Authors: Hao Chen, Peng Zheng, Xin Wang, Shu Hu, Bin Zhu, Jinrong Hu, Xi Wu,
Siwei Lyu
- Abstract summary: We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance on non-matched image-text pair detection when the training data is insufficient.
- Score: 50.46219766161111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing use of social media websites in recent decades, the number
of news articles spreading online has risen rapidly, resulting in an unprecedented scale
of potentially fraudulent information. Although plenty of studies have
applied supervised machine learning approaches to detect such content, the
lack of gold-standard training data has hindered their development. Analysing a
single data format, either a fake text description or a fake image, is the
mainstream direction of current research. However, misinformation in
real-world scenarios commonly takes the form of a text-image pair, where the news
article or news title is given as text content and is usually followed by a
related image. Given its strong ability to learn features without labelled
data, contrastive learning, as a self-learning approach, has emerged and
achieved success in computer vision. In this paper, our goal is to explore
contrastive learning in the domain of misinformation identification. We
developed a self-learning model and carried out comprehensive experiments
on a public data set named COSMOS. Compared to the baseline classifier, our
model shows superior performance on non-matched image-text pair detection
(approximately 10%) when the training data is insufficient. In addition, we
observed the stability of contrastive learning and suggest that its use
offers large reductions in the amount of training data, whilst maintaining
comparable classification results.
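The abstract does not state which loss or encoders the model uses, so the following is only a minimal sketch of a generic text-image contrastive objective (a symmetric InfoNCE loss in the style of CLIP), not the authors' implementation. The function names, the temperature value of 0.07, and the use of cosine distance as a mismatch score are all assumptions made for illustration; the embeddings are assumed to come from some image encoder and text encoder that are not shown.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to form a matched
    pair; every other row in the batch acts as a negative. The temperature
    is a hypothetical default, not a value taken from the paper.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(img))
    loss_image_to_text = -log_softmax(logits)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image)


def mismatch_score(image_emb, text_emb):
    """Cosine distance per pair; larger values suggest a non-matched pair."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return 1.0 - np.sum(img * txt, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))        # stand-in for encoder outputs
    shuffled = np.roll(emb, 1, axis=0)    # deliberately misaligned pairs
    print("loss, matched pairs: ", symmetric_contrastive_loss(emb, emb))
    print("loss, shuffled pairs:", symmetric_contrastive_loss(emb, shuffled))
```

In this kind of setup, no labels are needed during training because the pairing itself provides the supervision; a threshold on the mismatch score (chosen on held-out data) could then flag image-text pairs that do not belong together.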
Related papers
- Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models [0.65268245109828]
We introduce the notion of contextual diversity for active learning (CDAL).
We propose a data repair algorithm to curate contextually fair data to reduce model bias.
We are working on developing an image retrieval system for wildlife camera trap images and a reliable warning system for poor-quality rural roads.
arXiv Detail & Related papers (2024-11-04T09:43:33Z) - Sexism Detection on a Data Diet [14.899608305188002]
We show how we can leverage influence scores to estimate the importance of a data point while training a model.
We evaluate the performance of models trained on data pruned with different pruning strategies on three out-of-domain datasets.
arXiv Detail & Related papers (2024-06-07T12:39:54Z) - Premonition: Using Generative Models to Preempt Future Data Changes in
Continual Learning [63.850451635362425]
Continual learning requires a model to adapt to ongoing changes in the data distribution.
We show that the combination of a large language model and an image generation model can similarly provide useful premonitions.
We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem.
arXiv Detail & Related papers (2024-03-12T06:29:54Z) - Capturing Pertinent Symbolic Features for Enhanced Content-Based
Misinformation Detection [0.0]
The detection of misleading content presents a significant hurdle due to its extreme linguistic and domain variability.
This paper analyzes the linguistic attributes that characterize this phenomenon and how representative of such features some of the most popular misinformation datasets are.
We demonstrate that the appropriate use of pertinent symbolic knowledge in combination with neural language models is helpful in detecting misleading content.
arXiv Detail & Related papers (2024-01-29T16:42:34Z) - Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z) - Data Quality in Imitation Learning [15.939363481618738]
In offline learning for robotics, we simply lack internet scale data, and so high quality datasets are a necessity.
This is especially true in imitation learning (IL), a sample efficient paradigm for robot learning using expert demonstrations.
In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift.
arXiv Detail & Related papers (2023-06-04T18:48:32Z) - Semi-Supervised Image Captioning by Adversarially Propagating Labeled
Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from paired data and to progressively associate unpaired data.
We report extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by a comprehensive analysis on the scarcely-paired dataset.
arXiv Detail & Related papers (2023-01-26T15:25:43Z) - Generative Negative Text Replay for Continual Vision-Language
Pretraining [95.2784858069843]
Vision-language pre-training has attracted increasing attention recently.
Massive data are usually collected in a streaming fashion.
We propose a multi-modal knowledge distillation between images and texts to align the instance-wise prediction between old and new models.
arXiv Detail & Related papers (2022-10-31T13:42:21Z) - Continual Contrastive Self-supervised Learning for Image Classification [10.070132585425938]
Self-supervised learning methods show tremendous potential for visual representation without any labeled data at scale.
To improve the visual representation of self-supervised learning, larger and more varied data is needed.
In this paper, we make the first attempt to implement the continual contrastive self-supervised learning by proposing a rehearsal method.
arXiv Detail & Related papers (2021-07-05T03:53:42Z) - Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
arXiv Detail & Related papers (2021-05-03T17:59:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.