MEG: Multi-Evidence GNN for Multimodal Semantic Forensics
- URL: http://arxiv.org/abs/2011.11286v1
- Date: Mon, 23 Nov 2020 09:01:28 GMT
- Title: MEG: Multi-Evidence GNN for Multimodal Semantic Forensics
- Authors: Ekraam Sabir, Ayush Jaiswal, Wael AbdAlmageed, Prem Natarajan
- Abstract summary: Fake news often involves semantic manipulations across modalities such as image, text, and location.
Recent research has centered the problem around images, calling it image repurposing.
We introduce a novel graph neural network based model for multimodal semantic forensics.
- Score: 28.12652559292884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fake news often involves semantic manipulations across modalities such as
image, text, and location, and requires the development of multimodal semantic
forensics for its detection. Recent research has centered the problem around
images, calling it image repurposing -- where a digitally unmanipulated image
is semantically misrepresented by means of its accompanying multimodal metadata,
such as captions and location. The image and metadata together comprise a
multimedia package. The problem setup requires algorithms to perform multimodal
semantic forensics to authenticate a query multimedia package against a reference
dataset of potentially related packages that serve as evidence. Existing methods are
limited to using a single piece of evidence (one retrieved package), which forgoes
the potential performance improvement from using multiple pieces of evidence. In this
work, we introduce a novel graph neural network based model for multimodal semantic
forensics, which effectively utilizes multiple retrieved packages as evidence
and scales with the number of evidence packages. We compare the scalability and
performance of our model against existing methods. Experimental results show
that the proposed model outperforms existing state-of-the-art algorithms, with
an error reduction of up to 25%.
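No code accompanies this listing, so the following is only a minimal sketch of the multi-evidence idea: the query package and K retrieved evidence packages become graph nodes, and an attention-weighted message-passing round pools evidence into the query node before classification. The encoders, dimensions, and single aggregation round are illustrative assumptions, not the paper's exact architecture.
```python
# Minimal sketch (assumptions noted above): packages are pre-encoded into
# fixed-size vectors; one attention-weighted message-passing round pools
# evidence into the query node before binary classification.
import torch
import torch.nn as nn

class MultiEvidenceGNN(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.msg = nn.Linear(dim, dim)       # evidence node -> message
        self.attn = nn.Linear(2 * dim, 1)    # scores each query/evidence pair
        self.update = nn.Linear(2 * dim, dim)
        self.classify = nn.Linear(dim, 2)    # authentic vs. repurposed

    def forward(self, query: torch.Tensor, evidence: torch.Tensor):
        # query: (B, D); evidence: (B, K, D) for K retrieved packages.
        K = evidence.size(1)
        q = query.unsqueeze(1).expand(-1, K, -1)                  # (B, K, D)
        weights = torch.softmax(
            self.attn(torch.cat([q, evidence], dim=-1)), dim=1)  # (B, K, 1)
        # Weighted sum keeps cost linear in K, hence scalable in evidence count.
        agg = (weights * torch.tanh(self.msg(evidence))).sum(dim=1)  # (B, D)
        h = torch.tanh(self.update(torch.cat([query, agg], dim=-1)))
        return self.classify(h)  # (B, 2) logits

model = MultiEvidenceGNN()
logits = model(torch.randn(4, 256), torch.randn(4, 7, 256))  # 7 evidence packages
print(logits.shape)  # torch.Size([4, 2])
```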
Related papers
- Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection framework for detecting misinformation from video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models. A schematic sketch of the consistency idea follows this entry.
arXiv Detail & Related papers (2024-08-16T16:14:36Z)
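A schematic sketch of cross-modal entity consistency scoring, assuming entities from the text and the video have already been embedded into a shared space; the extractors and encoder are hypothetical stand-ins, not MultiMD's actual pipeline.
```python
# Sketch under the assumptions above: each text entity is matched against
# its best visual counterpart, and a low mean score flags inconsistency.
import numpy as np

def consistency_score(text_entities: np.ndarray, visual_entities: np.ndarray) -> float:
    """Mean over text entities of their best cosine match among visual entities."""
    t = text_entities / np.linalg.norm(text_entities, axis=1, keepdims=True)
    v = visual_entities / np.linalg.norm(visual_entities, axis=1, keepdims=True)
    sims = t @ v.T                          # (num_text, num_visual) cosine matrix
    return float(sims.max(axis=1).mean())   # low score -> likely misinformation

rng = np.random.default_rng(0)
score = consistency_score(rng.normal(size=(3, 64)), rng.normal(size=(5, 64)))
print(f"consistency: {score:.3f}")  # compare against a tuned threshold
```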
- Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z)
- Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims not only to detect the authenticity of multi-modal media but also to ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities. A toy sketch of the detect-and-ground setup follows this entry.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
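A toy sketch of the detect-and-ground formulation, not HAMMER itself: one head classifies the whole image-text pair as real or fake, and a second head marks which tokens or image patches were manipulated. The single-layer fusion encoder and dimensions are assumptions.
```python
# Sketch under the assumptions above: a stand-in fusion layer feeds a
# pair-level detection head and a per-token grounding head.
import torch
import torch.nn as nn

class DetectAndGround(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.fuse = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.detect = nn.Linear(dim, 2)   # real vs. fake (whole pair)
        self.ground = nn.Linear(dim, 2)   # per token: pristine vs. manipulated

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, T, D) = concatenated text-token and image-patch features.
        h = self.fuse(tokens)
        pair_logits = self.detect(h.mean(dim=1))  # (B, 2)
        token_logits = self.ground(h)             # (B, T, 2) grounding map
        return pair_logits, token_logits

pair, grounding = DetectAndGround()(torch.randn(2, 50, 256))
print(pair.shape, grounding.shape)  # torch.Size([2, 2]) torch.Size([2, 50, 2])
```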
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
We validate the approach with comprehensive experiments on various datasets. A rough sketch of the synthesis loop follows this entry.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
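A rough sketch of the synchronized synthesis loop under stated assumptions: an LLM drafts a scene prompt plus a matching dialogue, and a text-to-image model renders the image. `draft_scene_and_dialogue` is a hypothetical stand-in for the ChatGPT call, and the Stable Diffusion checkpoint is only an example (requires the diffusers library).
```python
# Sketch under the assumptions above; the checkpoint name is an example.
from diffusers import StableDiffusionPipeline

def draft_scene_and_dialogue(topic: str) -> tuple[str, list[dict]]:
    # HYPOTHETICAL stand-in for the ChatGPT call: in the paper, an LLM
    # writes both the scene description and a dialogue grounded in it.
    prompt = f"A photo of a {topic} on a kitchen table"
    dialogue = [
        {"role": "user", "content": "What is on the table?"},
        {"role": "assistant", "content": f"A {topic}."},
    ]
    return prompt, dialogue

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
prompt, dialogue = draft_scene_and_dialogue("red apple")
image = pipe(prompt).images[0]  # image rendered from the same prompt
image.save("sample.png")        # (image, dialogue) forms one training pair
```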
- MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection [81.59191603867586]
Sequential deepfake detection aims to identify forged facial regions with the correct sequence for recovery.
The recovery of forged images requires knowledge of the manipulation model to implement inverse transformations.
We propose the Multi-Collaboration and Multi-Supervision Network (MMNet), which handles various spatial scales and sequential permutations in forged face images. A toy sequence-prediction sketch follows this entry.
arXiv Detail & Related papers (2023-07-06T02:32:08Z)
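A toy sketch of sequential manipulation prediction, not MMNet: image features are decoded step by step into an ordered list of manipulation labels that could then drive inverse transformations for recovery. The encoder, label set, and fixed step count are all assumptions.
```python
# Sketch under the assumptions above: a recurrent decoder emits one
# manipulation label per step from a fixed image feature.
import torch
import torch.nn as nn

MANIPULATIONS = ["none", "eyes", "nose", "mouth", "hair"]  # hypothetical labels

class SequencePredictor(nn.Module):
    def __init__(self, dim: int = 128, steps: int = 4):
        super().__init__()
        self.steps = steps
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, len(MANIPULATIONS))

    def forward(self, img_feat: torch.Tensor):
        # img_feat: (B, D); fed as input at every decoding step.
        x = img_feat.unsqueeze(1).repeat(1, self.steps, 1)  # (B, steps, D)
        h, _ = self.rnn(x)
        return self.head(h)  # (B, steps, num_labels): one label per step

logits = SequencePredictor()(torch.randn(2, 128))
print(logits.argmax(-1))  # predicted manipulation sequence per image
```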
- Detecting and Grounding Multi-Modal Media Manipulation [32.34908534582532]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims not only to detect the authenticity of multi-modal media but also to ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-04-05T16:20:40Z)
- Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose MoRe, a novel Multi-modal Retrieval-based framework.
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge for the input text and image from a knowledge corpus, respectively. A schematic sketch of the retrieval-augmented input follows this entry.
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
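A schematic sketch of retrieval-augmented input construction in the spirit of MoRe; both retrievers below are hypothetical placeholders, and the [SEP]-style concatenation is an assumption rather than MoRe's actual interface.
```python
# Sketch under the assumptions above: knowledge retrieved for both the
# sentence and its image is appended before NER/RE tagging.
def retrieve_by_text(sentence: str, k: int = 2) -> list[str]:
    # Hypothetical placeholder: e.g. BM25 over a Wikipedia-style corpus.
    return ["Steve Jobs co-founded Apple Inc.",
            "Apple Inc. is headquartered in Cupertino."][:k]

def retrieve_by_image(image_path: str, k: int = 2) -> list[str]:
    # Hypothetical placeholder: e.g. nearest-neighbor image search
    # returning captions linked to similar images.
    return ["A man holding an iPhone on stage."][:k]

def build_input(sentence: str, image_path: str) -> str:
    knowledge = retrieve_by_text(sentence) + retrieve_by_image(image_path)
    # The tagger then labels `sentence` with this context attached.
    return sentence + " [SEP] " + " ".join(knowledge)

print(build_input("Jobs introduced the iPhone.", "keynote.jpg"))
```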
- Differentiable Meta Multigraph Search with Partial Message Propagation on Heterogeneous Information Networks [18.104982772430102]
We propose a novel method called Partial Message Meta Multigraph search (PMMM) to automatically optimize the neural architecture design on Heterogeneous Information Networks (HINs).
PMMM adopts an efficient differentiable framework to search for a meaningful meta multigraph, which can capture more flexible and complex semantic relations than a meta graph.
Our approach outperforms state-of-the-art heterogeneous GNNs, discovers meaningful meta multigraphs, and is significantly more stable.
arXiv Detail & Related papers (2022-11-27T07:35:42Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures.
We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities. A minimal contrastive-loss sketch follows this entry.
arXiv Detail & Related papers (2022-01-10T19:04:28Z)
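A minimal sketch of contrastive training for modality-agnostic representations, in the spirit of this retrieval paper: paired embeddings from two modalities are aligned with an InfoNCE loss so that (sub-)image search can run in one shared space. The encoders producing the embeddings are omitted and assumed.
```python
# Sketch under the assumptions above: matching cross-modality pairs sit on
# the diagonal of the similarity matrix and are pulled together.
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07):
    # z_a, z_b: (B, D) embeddings of the same content in two modalities.
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.T / tau             # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))    # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```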
This list is automatically generated from the titles and abstracts of the papers on this site.