MEG: Multi-Evidence GNN for Multimodal Semantic Forensics
- URL: http://arxiv.org/abs/2011.11286v1
- Date: Mon, 23 Nov 2020 09:01:28 GMT
- Title: MEG: Multi-Evidence GNN for Multimodal Semantic Forensics
- Authors: Ekraam Sabir, Ayush Jaiswal, Wael AbdAlmageed, Prem Natarajan
- Abstract summary: Fake news often involves semantic manipulations across modalities such as image, text, and location.
Recent research has centered the problem around images, calling it image repurposing.
We introduce a novel graph neural network based model for multimodal semantic forensics.
- Score: 28.12652559292884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fake news often involves semantic manipulations across modalities such as
image, text, and location, and requires the development of multimodal semantic
forensics for its detection. Recent research has centered the problem around
images, calling it image repurposing -- where a digitally unmanipulated image
is semantically misrepresented by means of its accompanying multimodal metadata
such as captions, location, etc. The image and metadata together comprise a
multimedia package. The problem setup requires algorithms to perform multimodal
semantic forensics to authenticate a query multimedia package using a reference
dataset of potentially related packages as evidence. Existing methods are
limited to using a single piece of evidence (one retrieved package), which
forgoes the potential performance improvement from using multiple pieces of
evidence. In this work, we introduce a novel graph neural network based model
for multimodal semantic forensics that effectively uses multiple retrieved
packages as evidence and scales with the number of evidence packages. We
compare the scalability and performance of our model against existing methods.
Experimental results show that the proposed model outperforms existing
state-of-the-art algorithms, with an error reduction of up to 25%.
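To make the setup concrete, below is a minimal PyTorch sketch of the kind of evidence aggregation the abstract describes: a query package embedding is refined by messages from a variable number of retrieved evidence embeddings before classification. The star-graph structure, mean pooling, GRU update, and all names and dimensions are illustrative assumptions, not MEG's actual architecture.

```python
import torch
import torch.nn as nn

class EvidenceGNN(nn.Module):
    """Hypothetical sketch: one round of message passing over a star graph
    whose center is the query package and whose leaves are retrieved
    evidence packages. Names and dimensions are illustrative, not MEG's."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.message = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.update = nn.GRUCell(dim, dim)
        self.classify = nn.Linear(dim, 2)  # pristine vs. repurposed

    def forward(self, query: torch.Tensor, evidence: torch.Tensor) -> torch.Tensor:
        # query: (dim,) joint image+metadata embedding of the query package
        # evidence: (k, dim) embeddings of k retrieved packages; k may vary
        q = query.unsqueeze(0).expand(evidence.size(0), -1)    # (k, dim)
        msgs = self.message(torch.cat([q, evidence], dim=-1))  # (k, dim)
        pooled = msgs.mean(dim=0, keepdim=True)                # permutation-invariant
        h = self.update(pooled, query.unsqueeze(0))            # fuse into query state
        return self.classify(h).squeeze(0)                     # (2,) logits

model = EvidenceGNN()
logits = model(torch.randn(256), torch.randn(5, 256))  # 5 evidence packages
```

Mean pooling over messages keeps the update permutation-invariant and independent of how many packages are retrieved, which is the property the abstract refers to as scaling with the number of evidence packages.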
Related papers
- Towards Text-Image Interleaved Retrieval [49.96332254241075]
We introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences.
We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries.
We propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularities.
arXiv Detail & Related papers (2025-02-18T12:00:47Z)
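As a rough illustration of compressing visual tokens at multiple granularities, the sketch below pools a visual token sequence to a nested hierarchy of coarser lengths. The pooling operator and the granularity levels are assumptions made for illustration, not details taken from the MME paper.

```python
import torch
import torch.nn.functional as F

def matryoshka_token_pooling(visual_tokens: torch.Tensor,
                             granularities=(256, 64, 16, 4)) -> list:
    """Hypothetical sketch: produce nested views of a visual token sequence
    by adaptive average pooling to progressively coarser lengths. The
    granularity levels and pooling choice are illustrative assumptions."""
    # visual_tokens: (num_tokens, dim), e.g. patch embeddings from a ViT
    x = visual_tokens.t().unsqueeze(0)        # (1, dim, num_tokens)
    views = []
    for g in granularities:
        pooled = F.adaptive_avg_pool1d(x, g)  # (1, dim, g)
        views.append(pooled.squeeze(0).t())   # (g, dim)
    return views

coarse = matryoshka_token_pooling(torch.randn(1024, 768))
print([tuple(v.shape) for v in coarse])  # [(256, 768), (64, 768), (16, 768), (4, 768)]
```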
- MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data.
We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
arXiv Detail & Related papers (2025-01-20T06:56:30Z)
- ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection [107.86009509291581]
We propose ForgerySleuth to perform comprehensive clue fusion and generate segmentation outputs indicating regions that are tampered with.
Our experiments demonstrate the effectiveness of ForgeryAnalysis and show that ForgerySleuth significantly outperforms existing methods in robustness, generalization, and explainability.
arXiv Detail & Related papers (2024-11-29T04:35:18Z)
- Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection (MultiMD) framework for detecting misinformation in video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z)
- Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset, containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z) - Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
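The detect-plus-ground formulation of DGM4 can be pictured as a two-task output head over fused cross-modal features, as in the hedged sketch below. The dimensions, the box parameterization, and the per-token tagging head are illustrative assumptions rather than HAMMER's actual design.

```python
import torch
import torch.nn as nn

class DetectAndGroundHead(nn.Module):
    """Illustrative two-task head for DGM4-style problems: classify whether
    a multimodal sample is manipulated, and ground the manipulation as an
    image bounding box plus per-text-token labels. All dimensions and the
    fused-feature inputs are assumptions; this is not HAMMER's architecture."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.detect = nn.Linear(dim, 2)     # real vs. manipulated
        self.box = nn.Linear(dim, 4)        # (cx, cy, w, h) of tampered region
        self.token_tag = nn.Linear(dim, 2)  # per-token: manipulated or not

    def forward(self, fused_cls: torch.Tensor, fused_tokens: torch.Tensor):
        # fused_cls: (batch, dim) pooled cross-modal feature
        # fused_tokens: (batch, seq, dim) per-text-token cross-modal features
        return (self.detect(fused_cls),         # (batch, 2) authenticity logits
                self.box(fused_cls).sigmoid(),  # (batch, 4) normalized box
                self.token_tag(fused_tokens))   # (batch, seq, 2) token logits

head = DetectAndGroundHead()
logits, box, token_logits = head(torch.randn(2, 512), torch.randn(2, 16, 512))
```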
arXiv Detail & Related papers (2023-09-25T15:05:46Z) - Detecting and Grounding Multi-Modal Media Manipulation [32.34908534582532]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-04-05T16:20:40Z)
- Differentiable Meta Multigraph Search with Partial Message Propagation on Heterogeneous Information Networks [18.104982772430102]
We propose a novel method called Partial Message Meta Multigraph search (PMMM) to automatically optimize the neural architecture design on Heterogeneous Information Networks (HINs).
PMMM adopts an efficient differentiable framework to search for a meaningful meta multigraph, which can capture more flexible and complex semantic relations than a meta graph.
Our approach outperforms state-of-the-art heterogeneous GNNs, finds meaningful meta multigraphs, and is significantly more stable.
arXiv Detail & Related papers (2022-11-27T07:35:42Z)
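Differentiable search over relations can be illustrated in a DARTS-like style: each candidate relation gets a learnable architecture weight, and message passing mixes relation-specific transforms by the softmax of those weights. The sketch below is a generic version of that idea; it does not implement PMMM's actual meta multigraph search or its partial message propagation.

```python
import torch
import torch.nn as nn

class SoftRelationMixer(nn.Module):
    """Illustrative DARTS-style layer: candidate relation-specific transforms
    are mixed by learnable architecture weights, so the choice of which
    relations to propagate over is optimized by gradient descent. A generic
    sketch, not PMMM's meta-multigraph search."""

    def __init__(self, num_relations: int, dim: int):
        super().__init__()
        self.transforms = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_relations)])
        self.alpha = nn.Parameter(torch.zeros(num_relations))  # architecture params

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (n, dim) node features; adj: (num_relations, n, n) adjacency per relation
        w = torch.softmax(self.alpha, dim=0)      # soft choice among relations
        out = torch.zeros_like(h)
        for r, f in enumerate(self.transforms):
            out = out + w[r] * adj[r] @ f(h)      # relation-specific messages
        return torch.relu(out)

layer = SoftRelationMixer(num_relations=3, dim=64)
h = layer(torch.randn(10, 64), torch.rand(3, 10, 10))
```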
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures.
We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities.
arXiv Detail & Related papers (2022-01-10T19:04:28Z)
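A common way to build such cross-modality representations is a symmetric contrastive (InfoNCE) objective that pulls corresponding images from the two modalities together in a shared embedding space, so retrieval reduces to nearest-neighbor search. The sketch below is a generic version of that idea, not necessarily the paper's exact training objective.

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(za: torch.Tensor, zb: torch.Tensor, tau: float = 0.07):
    """Illustrative symmetric InfoNCE loss for aligning two imaging modalities
    in a shared embedding space. A generic sketch under assumed batch-paired
    data, not the paper's exact objective."""
    za = F.normalize(za, dim=-1)        # (batch, dim) modality-A embeddings
    zb = F.normalize(zb, dim=-1)        # (batch, dim) modality-B embeddings
    logits = za @ zb.t() / tau          # pairwise cosine similarities
    targets = torch.arange(za.size(0))  # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = cross_modal_info_nce(torch.randn(8, 128), torch.randn(8, 128))
```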
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.