Multimodal Analytics for Real-world News using Measures of Cross-modal
Entity Consistency
- URL: http://arxiv.org/abs/2003.10421v2
- Date: Fri, 23 Oct 2020 09:22:53 GMT
- Title: Multimodal Analytics for Real-world News using Measures of Cross-modal
Entity Consistency
- Authors: Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian
Idahl, Ralph Ewerth
- Abstract summary: Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention.
We introduce a novel task of cross-modal consistency verification in real-world news and present a multimodal approach to quantify the entity coherence between image and text.
- Score: 8.401772200450417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The World Wide Web has become a popular source for gathering information and
news. Multimodal information, e.g., enriching text with photos, is typically
used to convey the news more effectively or to attract attention. Photo content
can be merely decorative, can depict additional important information, or can
even contain misleading information. Therefore, automatic approaches to quantify
cross-modal consistency of entity representations can support human assessors
in evaluating the overall multimodal message, for instance, with regard to bias or
sentiment. In some cases, such measures could provide hints for detecting fake news,
which is an increasingly important topic in today's society. In this paper, we
introduce a novel task of cross-modal consistency verification in real-world
news and present a multimodal approach to quantify the entity coherence between
image and text. Named entity linking is applied to extract persons, locations,
and events from news texts. Several measures are suggested to calculate
cross-modal similarity for these entities using state-of-the-art approaches. In
contrast to previous work, our system automatically gathers example data from
the Web and is applicable to real-world news. Results on two novel datasets
that cover different languages, topics, and domains demonstrate the feasibility
of our approach. Datasets and code are publicly available to foster research
towards this new direction.
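
To make the approach more concrete, the following is a minimal sketch of one plausible cross-modal person consistency measure in Python. It is an illustration under assumptions, not the authors' released code: face embeddings are assumed to be precomputed by a face detection and recognition model, and the names person_consistency, news_face_embs, and ref_face_embs_per_person are hypothetical. The aggregation (best pairwise match per person, averaged over persons) is one natural reading of the per-entity similarity measures described in the abstract.

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two embedding vectors."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def person_consistency(news_face_embs, ref_face_embs_per_person):
        """Score how plausibly the persons named in the text appear in the image.

        news_face_embs: embeddings of faces detected in the news photo.
        ref_face_embs_per_person: dict mapping each linked person entity to
            embeddings of faces from example images gathered from the Web.
        Returns a score in [-1, 1]; higher means more consistent.
        """
        if not news_face_embs or not ref_face_embs_per_person:
            return 0.0  # nothing to verify against
        per_person = []
        for person, refs in ref_face_embs_per_person.items():
            # A person counts as depicted if any face in the news image
            # matches any reference face; take the best pairwise similarity.
            best = max((cosine(n, r) for n in news_face_embs for r in refs),
                       default=0.0)
            per_person.append(best)
        # Aggregate per-entity scores into a single document-level measure.
        return float(np.mean(per_person))

    # Toy usage with random 128-d vectors standing in for a face model's output.
    rng = np.random.default_rng(0)
    news_faces = [rng.normal(size=128) for _ in range(2)]
    references = {"Person A": [rng.normal(size=128) for _ in range(3)]}
    print(person_consistency(news_faces, references))

Analogous measures for locations and events would swap the face embeddings for geolocation or scene/event features, keeping the same best-match-then-average aggregation.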
Related papers
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
- Multi-modal Stance Detection: New Datasets and Model [56.97470987479277]
We study multi-modal stance detection for tweets consisting of texts and images.
We propose a simple yet effective Targeted Multi-modal Prompt Tuning framework (TMPT).
TMPT achieves state-of-the-art performance in multi-modal stance detection.
arXiv Detail & Related papers (2024-02-22T05:24:19Z)
- MSynFD: Multi-hop Syntax aware Fake News Detection [27.046529059563863]
Social media platforms have fueled the rapid dissemination of fake news, posing a threat to society.
Existing methods use multimodal data or contextual information to enhance the detection of fake news.
We propose a novel multi-hop syntax-aware fake news detection (MSynFD) method, which incorporates complementary syntactic information to deal with subtle twists in fake news.
arXiv Detail & Related papers (2024-02-18T05:40:33Z)
- Robust Domain Misinformation Detection via Multi-modal Feature Alignment [49.89164555394584]
We propose a robust domain and cross-modal approach for multi-modal misinformation detection.
It reduces the domain shift by aligning the joint distribution of textual and visual modalities.
We also propose a framework that simultaneously considers the application scenarios of domain generalization and domain adaptation.
arXiv Detail & Related papers (2023-11-24T07:06:16Z)
- Interpretable Detection of Out-of-Context Misinformation with Neural-Symbolic-Enhanced Large Multimodal Model [16.348950072491697]
Misinformation creators now increasingly tend to use out-of-context multimedia content to deceive the public and fake news detection systems.
This new type of misinformation increases the difficulty of not only detection but also clarification, because each individual modality is close enough to true information.
In this paper, we explore how to achieve interpretable cross-modal de-contextualization detection that simultaneously identifies mismatched pairs and cross-modal contradictions.
arXiv Detail & Related papers (2023-04-15T21:11:55Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Multimodal Fake News Detection with Adaptive Unimodal Representation Aggregation [28.564442206829625]
AURA is a multimodal fake news detection network with adaptive unimodal representation aggregation.
We perform coarse-level fake news detection and cross-modal consistency learning based on the unimodal and multimodal representations.
Experiments on Weibo and GossipCop show that AURA outperforms several state-of-the-art fake news detection schemes.
arXiv Detail & Related papers (2022-06-12T14:06:55Z)
- Applying Automatic Text Summarization for Fake News Detection [4.2177790395417745]
The distribution of fake news is not a new problem, but a rapidly growing one.
We present an approach to the problem that combines the power of transformer-based language models.
Our framework, CMTR-BERT, combines multiple text representations and enables the incorporation of contextual information.
arXiv Detail & Related papers (2022-04-04T21:00:55Z)
- Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities [5.4482836906033585]
Social media platforms are evolving from text-based forums into multi-modal environments.
Misinformation spreaders have recently targeted contextual connections between modalities, e.g., between text and image.
We analyze, categorize, and identify existing approaches, along with the challenges and shortcomings they face, in order to unearth new research opportunities in the field of multi-modal misinformation detection.
arXiv Detail & Related papers (2022-03-25T19:45:33Z)
- Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings [63.79979145520512]
We explore the joint effects of texts and images in predicting the keyphrases for a multimedia post.
We propose a novel Multi-Modality Multi-Head Attention (M3H-Att) to capture the intricate cross-media interactions.
Our model significantly outperforms the previous state of the art based on traditional attention networks.
arXiv Detail & Related papers (2020-11-03T08:44:18Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities (a minimal illustrative sketch follows this list).
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)
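
The crisis-events entry above mentions a cross-attention module that down-weights uninformative components from weak modalities; below is a minimal, hypothetical sketch of such a block in PyTorch, not that paper's actual architecture. Text tokens act as queries attending over image-region features, so unhelpful regions can receive low attention weight; the class name, dimensions, and residual design are illustrative assumptions.

    import torch
    import torch.nn as nn

    class TextToImageCrossAttention(nn.Module):
        """Text tokens (queries) attend over image regions (keys/values)."""

        def __init__(self, dim: int = 256, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, text_tokens, image_regions):
            # text_tokens:   (batch, n_text, dim)
            # image_regions: (batch, n_regions, dim)
            fused, weights = self.attn(text_tokens, image_regions, image_regions)
            # The residual connection preserves the textual signal when the
            # visual modality is weak or uninformative.
            return self.norm(text_tokens + fused), weights

    # Toy usage with random features standing in for text/image encoder outputs.
    block = TextToImageCrossAttention()
    text = torch.randn(2, 16, 256)   # 16 text tokens
    image = torch.randn(2, 49, 256)  # a 7x7 grid of image regions
    out, attn = block(text, image)
    print(out.shape, attn.shape)     # (2, 16, 256) and (2, 16, 49)

The returned attention weights make the fusion inspectable: image regions that contribute little to any text token can be identified and filtered out.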
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.