Reliable Multi-Modal Object Re-Identification via Modality-Aware Graph Reasoning
- URL: http://arxiv.org/abs/2504.14847v1
- Date: Mon, 21 Apr 2025 03:58:40 GMT
- Title: Reliable Multi-Modal Object Re-Identification via Modality-Aware Graph Reasoning
- Authors: Xixi Wan, Aihua Zheng, Zi Wang, Bo Jiang, Jin Tang, Jixin Ma
- Abstract summary: We propose to leverage a novel graph reasoning model, termed the Modality-aware Graph Reasoning Network (MGRNet). We first construct modality-aware graphs to enhance the extraction of fine-grained local details. We then employ a selective graph nodes swap operation to alleviate the adverse effects of low-quality local features.
- Score: 20.242422751083588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal data provides abundant and diverse object information, crucial for effective modal interactions in Re-Identification (ReID) tasks. However, existing approaches often overlook the quality variations in local features and fail to fully leverage the complementary information across modalities, particularly in the case of low-quality features. In this paper, we propose to address this issue by leveraging a novel graph reasoning model, termed the Modality-aware Graph Reasoning Network (MGRNet). Specifically, we first construct modality-aware graphs to enhance the extraction of fine-grained local details by effectively capturing and modeling the relationships between patches. Subsequently, the selective graph nodes swap operation is employed to alleviate the adverse effects of low-quality local features by considering both local and global information, enhancing the representation of discriminative information. Finally, the swapped modality-aware graphs are fed into the local-aware graph reasoning module, which propagates multi-modal information to yield a reliable feature representation. Another advantage of the proposed graph reasoning approach is its ability to reconstruct missing modal information by exploiting inherent structural relationships, thereby minimizing disparities between different modalities. Experimental results on four benchmarks (RGBNT201, Market1501-MM, RGBNT100, MSVR310) indicate that the proposed method achieves state-of-the-art performance in multi-modal object ReID. The code for our method will be available upon acceptance.
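The abstract describes a three-step pipeline: score local patch features, swap low-quality nodes with their counterparts from another modality, then propagate information over the graph. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation: the cosine-to-global quality score, the `swap_ratio` parameter, and the uniform message-passing step are all assumptions introduced here for illustration.

```python
import numpy as np

def selective_node_swap(feats_a, feats_b, global_a, global_b, swap_ratio=0.25):
    """Illustrative stand-in for the selective graph nodes swap: replace the
    lowest-quality patch nodes of modality A with the spatially aligned nodes
    of modality B. Quality is approximated by cosine similarity between each
    local patch feature and its modality's global feature."""
    def cosine(x, g):
        return (x @ g) / (np.linalg.norm(x, axis=1) * np.linalg.norm(g) + 1e-8)

    n = feats_a.shape[0]
    k = max(1, int(n * swap_ratio))
    scores_a = cosine(feats_a, global_a)      # per-node quality for modality A
    low_a = np.argsort(scores_a)[:k]          # the k least reliable nodes
    swapped = feats_a.copy()
    swapped[low_a] = feats_b[low_a]           # borrow aligned patches from B
    return swapped

def graph_propagate(feats, adj, steps=2):
    """Plain message passing (row-normalized adjacency times features),
    standing in for the paper's local-aware graph reasoning module."""
    a = adj / (adj.sum(axis=1, keepdims=True) + 1e-8)
    for _ in range(steps):
        feats = a @ feats
    return feats
```

In this toy version the swap is a hard replacement of whole node features; the paper's operation additionally conditions on both local and global information, and the reasoning module is learned rather than a fixed averaging step.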
Related papers
- OWLEYE: Zero-Shot Learner for Cross-Domain Graph Data Anomaly Detection [48.77471686671269]
OWLEYE is a novel framework that learns transferable patterns of normal behavior from multiple graphs. We show that OWLEYE achieves superior performance and generalizability compared to state-of-the-art baselines.
arXiv Detail & Related papers (2026-01-27T02:08:18Z) - LLHA-Net: A Hierarchical Attention Network for Two-View Correspondence Learning [33.76961965760301]
We propose a novel method called Layer-by-Layer Hierarchical Attention Network. It enhances the precision of feature point matching in computer vision by addressing the issue of outliers. Our method incorporates stage fusion, hierarchical extraction, and an attention mechanism to improve the network's representation capability.
arXiv Detail & Related papers (2025-12-31T04:25:53Z) - IGDMRec: Behavior Conditioned Item Graph Diffusion for Multimodal Recommendation [21.87097387902408]
Multimodal recommender systems (MRSs) are critical for various online platforms, offering users more accurate personalized recommendations by incorporating multimodal information. We propose Item Graph Diffusion for Multimodal Recommendation (IGDMRec), a novel method that leverages a diffusion model with classifier-free guidance to denoise the semantic item graph. Extensive experiments on four real-world datasets demonstrate the superiority of IGDMRec over competitive baselines.
arXiv Detail & Related papers (2025-12-23T02:13:01Z) - Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity [0.45835414225547183]
Graph Neural Networks (GNNs) have demonstrated remarkable success in node classification tasks over relational data. However, feature matrices are often highly sparse or contain sensitive information, leading to degraded performance and increased privacy risks. We propose a novel Multi-view Feature Propagation framework that enhances node classification under feature sparsity while promoting privacy preservation.
arXiv Detail & Related papers (2025-10-13T12:42:00Z) - Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation [19.01114538768217]
We propose REARM, a novel framework for refining multi-modal contrastive learning and homography relations. Our experiments on three real-world datasets demonstrate the superiority of REARM over various state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-19T11:35:48Z) - UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification [26.770271366177603]
We propose a robust approach named Uncertainty-Guided Graph model for multi-modal object ReID (UGG-ReID). UGG-ReID is designed to mitigate noise interference and facilitate effective multi-modal fusion. Experimental results show that the proposed method achieves excellent performance on all datasets.
arXiv Detail & Related papers (2025-07-07T03:41:08Z) - NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose NativE, a comprehensive framework to achieve multi-modal knowledge graph completion (MMKGC) in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z) - DGNN: Decoupled Graph Neural Networks with Structural Consistency between Attribute and Graph Embedding Representations [62.04558318166396]
Graph neural networks (GNNs) demonstrate a robust capability for representation learning on graphs with complex structures.
A novel GNNs framework, dubbed Decoupled Graph Neural Networks (DGNN), is introduced to obtain a more comprehensive embedding representation of nodes.
Experimental results conducted on several graph benchmark datasets verify DGNN's superiority in node classification task.
arXiv Detail & Related papers (2024-01-28T06:43:13Z) - Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification [61.49219876388174]
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance.
Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities.
We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z) - Multi-view Graph Convolutional Networks with Differentiable Node Selection [29.575611350389444]
We propose a framework dubbed Multi-view Graph Convolutional Network with Differentiable Node Selection (MGCN-DNS).
MGCN-DNS accepts multi-channel graph-structural data as inputs and aims to learn more robust graph fusion through a differentiable neural network.
The effectiveness of the proposed method is verified by rigorous comparisons with considerable state-of-the-art approaches.
arXiv Detail & Related papers (2022-12-09T21:48:36Z) - Augmenting Knowledge Transfer across Graphs [16.50013525404218]
We present TRANSNET, a generic learning framework for augmenting knowledge transfer across graphs.
In particular, we introduce a novel notion named trinity signal that can naturally formulate various graph signals at different granularity.
We show that TRANSNET outperforms all existing approaches on seven benchmark datasets by a significant margin.
arXiv Detail & Related papers (2022-12-09T08:46:02Z) - Towards Consistency and Complementarity: A Multiview Graph Information Bottleneck Approach [25.40829979251883]
How to model and integrate shared (i.e. consistency) and view-specific (i.e. complementarity) information is a key issue in multiview graph analysis.
We propose a novel Multiview Variational Graph Information Bottleneck (MVGIB) principle to maximize the agreement for common representations and the disagreement for view-specific representations.
arXiv Detail & Related papers (2022-10-11T13:51:34Z) - Graph Neural Networks for Multi-Robot Active Information Acquisition [15.900385823366117]
A team of mobile robots, communicating through an underlying graph, estimates a hidden state expressing a phenomenon of interest.
Existing approaches are either not scalable, unable to handle dynamic phenomena or not robust to changes in the communication graph.
We propose an Information-aware Graph Block Network (I-GBNet) that aggregates information over the graph representation and provides sequential-decision making in a distributed manner.
arXiv Detail & Related papers (2022-09-24T21:45:06Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification [38.96033760300123]
We propose a cross-modality transformer-based method (CMTR) for the visible-infrared person re-identification task.
We design the novel modality embeddings, which are fused with token embeddings to encode modalities' information.
Our proposed CMTR model's performance significantly surpasses existing outstanding CNN-based methods.
arXiv Detail & Related papers (2021-10-18T03:12:59Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Graph Representation Learning via Graphical Mutual Information Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.