A Multi-Modal Method for Satire Detection using Textual and Visual Cues
- URL: http://arxiv.org/abs/2010.06671v1
- Date: Tue, 13 Oct 2020 20:08:29 GMT
- Title: A Multi-Modal Method for Satire Detection using Textual and Visual Cues
- Authors: Lily Li, Or Levi, Pedram Hosseini, David A. Broniatowski
- Abstract summary: Satire is a form of humorous critique, but it is sometimes misinterpreted by readers as legitimate news.
We observe that the images used in satirical news articles often contain absurd or ridiculous content.
We propose a multi-modal approach based on the state-of-the-art visiolinguistic model ViLBERT.
- Score: 5.147194328754225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Satire is a form of humorous critique, but it is sometimes misinterpreted by
readers as legitimate news, which can lead to harmful consequences. We observe
that the images used in satirical news articles often contain absurd or
ridiculous content and that image manipulation is used to create fictional
scenarios. While previous work has studied text-based methods, in this work we
propose a multi-modal approach based on the state-of-the-art visiolinguistic model
ViLBERT. To this end, we create a new dataset consisting of images and
headlines of regular and satirical news for the task of satire detection. We
fine-tune ViLBERT on the dataset and train a convolutional neural network that
uses an image forensics technique. Evaluation on the dataset shows that our
proposed multi-modal approach outperforms image-only, text-only, and simple
fusion baselines.
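As a rough illustration of the fusion described above (a sketch, not the authors' code), the snippet below concatenates a pooled joint embedding from a ViLBERT-style encoder with features from a small CNN over an image-forensics map. The abstract does not name the forensics technique here, so an Error Level Analysis (ELA) style input is assumed; all module names and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ForensicsCNN(nn.Module):
    """Small CNN over a 3-channel forensics map (assumed here to be an ELA image)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, ela_img):            # (B, 3, H, W)
        h = self.conv(ela_img).flatten(1)  # (B, 64)
        return self.proj(h)                # (B, feat_dim)

class FusionClassifier(nn.Module):
    """Late fusion: ViLBERT-style joint features + forensics features -> 2 classes."""
    def __init__(self, vl_dim=1024, feat_dim=128):
        super().__init__()
        self.forensics = ForensicsCNN(feat_dim)
        self.head = nn.Linear(vl_dim + feat_dim, 2)  # satire vs. regular

    def forward(self, vl_features, ela_img):
        # vl_features: pooled output of a fine-tuned visiolinguistic encoder (assumed)
        f = self.forensics(ela_img)
        return self.head(torch.cat([vl_features, f], dim=1))
```

Late fusion by concatenation keeps the two evidence streams, visiolinguistic incongruity and image-manipulation artifacts, separate until the final classifier, which matches the abstract's finding that the combined model beats image-only, text-only, and simple fusion baselines.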
Related papers
- YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models [21.290282716770157]
Three tasks are proposed: Satirical Image Detection (deciding whether an image is satirical), Understanding (generating the reason the image is satirical), and Completion (given one half of an image, selecting the other half from two options so that the complete image is satirical).
We release a dataset of 119 real, satirical photographs for further research.
arXiv Detail & Related papers (2024-09-20T15:45:29Z)
- Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images.
We propose a face-naming module for learning better name embeddings.
We use CLIP to retrieve sentences that are semantically close to the image (a sketch of this retrieval step follows this entry).
arXiv Detail & Related papers (2023-08-16T12:39:39Z)
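A minimal sketch of such a CLIP retrieval step, assuming the Hugging Face checkpoint openai/clip-vit-base-patch32; the image path and sentence list are placeholders, and this is an illustration rather than the paper's implementation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def top_sentences(image_path, sentences, k=3):
    """Return the k article sentences most similar to the news image."""
    inputs = processor(text=sentences, images=Image.open(image_path),
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    sims = out.logits_per_image[0]  # similarity of the image to each sentence
    best = sims.topk(min(k, len(sentences))).indices
    return [sentences[int(i)] for i in best]
```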
- Advancing Visual Grounding with Scene Knowledge: Benchmark and Method [74.72663425217522]
Visual grounding (VG) aims to establish fine-grained alignment between vision and language.
Most existing VG datasets are constructed using simple description texts.
We propose a novel benchmark of Scene Knowledge-guided Visual Grounding.
arXiv Detail & Related papers (2023-07-21T13:06:02Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- NewsStories: Illustrating articles with visual summaries [49.924916589209374]
We introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos.
We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images.
We introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset (one plausible instantiation is sketched after this entry).
arXiv Detail & Related papers (2022-07-26T17:34:11Z)
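One plausible reading of such an "intuitive" image-set baseline (an assumption, not the authors' stated method): embed every image in an article's set with CLIP, mean-pool the embeddings, and rank candidate texts by cosine similarity to the pooled vector.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_texts(image_paths, candidate_texts):
    """Rank candidate article texts against a set of images, best match first."""
    img_in = processor(images=[Image.open(p) for p in image_paths],
                       return_tensors="pt")
    txt_in = processor(text=candidate_texts, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        img_emb = model.get_image_features(**img_in)   # one row per image
        txt_emb = model.get_text_features(**txt_in)    # one row per text
    pooled = img_emb.mean(dim=0, keepdim=True)         # single image-set vector
    pooled = pooled / pooled.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (pooled @ txt_emb.T).squeeze(0).argsort(descending=True)
```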
- Multimodal Neural Machine Translation with Search Engine Based Image Retrieval [4.662583832063716]
We propose an open-vocabulary image retrieval method to collect descriptive images for a bilingual parallel corpus.
Our proposed method achieves significant improvements over strong baselines.
arXiv Detail & Related papers (2022-07-26T08:42:06Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
A StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
A visual-linguistic similarity module learns text-image matching by mapping images and text into a common embedding space (a minimal sketch follows this entry).
Instance-level optimization preserves identity during manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
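A minimal sketch of a visual-linguistic similarity module of the kind described (dimensions and layer choices are assumptions, not TediGAN's code): both modalities are projected into a shared space and matches are scored by cosine similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLSimilarity(nn.Module):
    """Project image and text features into a common space; score by cosine."""
    def __init__(self, img_dim=512, txt_dim=768, joint_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)

    def forward(self, img_feat, txt_feat):
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return (v * t).sum(dim=-1)  # cosine similarity per image-text pair
```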
- Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The inputs of the task are multimodal including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z)
- Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic-feature-based approach to detecting satirical news tweets.
Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses.
We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism (a minimal sketch of the resulting decision rule follows this entry).
arXiv Detail & Related papers (2020-04-08T03:22:21Z)
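At inference time, the rough-set machinery reduces to a three-way decision over a pair of probabilistic thresholds. A minimal sketch, with placeholder threshold values standing in for the equilibrium-derived ones:

```python
def three_way_decision(p_satire, alpha=0.75, beta=0.35):
    """Three-way decision rule from rough set theory: thresholds (alpha, beta)
    split items into accept / reject / defer regions. In the paper the
    thresholds are derived by game equilibrium; the values here are placeholders."""
    if p_satire >= alpha:
        return "satire"        # positive region: accept
    if p_satire <= beta:
        return "legitimate"    # negative region: reject
    return "deferred"          # boundary region: withhold judgment

# Example: an ambiguous tweet falls into the boundary region.
print(three_way_decision(0.55))  # -> "deferred"
```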