Neighbor-Based Feature and Index Enhancement for Person Re-Identification
- URL: http://arxiv.org/abs/2504.11798v1
- Date: Wed, 16 Apr 2025 06:13:20 GMT
- Title: Neighbor-Based Feature and Index Enhancement for Person Re-Identification
- Authors: Chao Yuan, Tianyi Zhang, Guanglin Niu
- Abstract summary: Person re-identification (Re-ID) aims to match the same pedestrian in a large gallery with different cameras and views. Existing methods usually improve feature representation by improving model architecture. We propose a novel model DMON-ARO that leverages latent neighborhood information to enhance both feature representation and index performance.
- Score: 11.268034456071542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person re-identification (Re-ID) aims to match the same pedestrian in a large gallery with different cameras and views. Enhancing the robustness of the extracted feature representations is a main challenge in Re-ID. Existing methods usually improve feature representation by improving model architecture, but most methods ignore the potential contextual information, which limits the effectiveness of feature representation and retrieval performance. Neighborhood information, especially the potential information of multi-order neighborhoods, can effectively enrich feature expression and improve retrieval accuracy, but this has not been fully explored in existing research. Therefore, we propose a novel model DMON-ARO that leverages latent neighborhood information to enhance both feature representation and index performance. Our approach is built on two complementary modules: Dynamic Multi-Order Neighbor Modeling (DMON) and Asymmetric Relationship Optimization (ARO). The DMON module dynamically aggregates multi-order neighbor relationships, allowing it to capture richer contextual information and enhance feature representation through adaptive neighborhood modeling. Meanwhile, ARO refines the distance matrix by optimizing query-to-gallery relationships, improving the index accuracy. Extensive experiments on three benchmark datasets demonstrate that our approach achieves performance improvements over baseline models, which illustrates the effectiveness of our model. Specifically, our model demonstrates improvements in Rank-1 accuracy and mAP. Moreover, this method can also be directly extended to other re-identification tasks.
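To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of multi-order neighbor aggregation and of a query-to-gallery distance refinement. It is an illustration only, not the authors' DMON or ARO modules: the function names, the fixed decay weights (the paper describes dynamic, adaptive weighting), the k-nearest-neighbor graph, and the mixing factor `alpha` are all assumptions chosen for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def knn_adjacency(feats, k):
    """Row-normalized k-nearest-neighbor graph from cosine similarity.

    Assumes feats is already L2-normalized and k < number of rows.
    """
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)           # a sample is not its own neighbor
    idx = np.argsort(-sim, axis=1)[:, :k]    # k most similar rows per sample
    adj = np.zeros_like(sim)
    rows = np.arange(feats.shape[0])[:, None]
    adj[rows, idx] = 1.0 / k                 # each row sums to 1
    return adj


def multi_order_enhance(feats, k=3, orders=2, decay=0.5):
    """Add averaged 1st..m-th order neighbor features to each feature.

    Hypothetical stand-in for multi-order neighbor modeling: the weights
    here are a fixed geometric decay, not learned or adaptive.
    """
    feats = l2_normalize(feats)
    adj = knn_adjacency(feats, k)
    out = feats.copy()
    hop = feats
    for m in range(1, orders + 1):
        hop = adj @ hop                      # m-th order neighborhood average
        out += (decay ** m) * hop
    return l2_normalize(out)


def refine_query_gallery(query, gallery, k=3, alpha=0.3):
    """Smooth query-to-gallery cosine distances over gallery neighborhoods.

    Hypothetical stand-in for distance-matrix refinement: only the gallery
    side contributes neighbors, so the adjustment is asymmetric.
    """
    query, gallery = l2_normalize(query), l2_normalize(gallery)
    dist = 1.0 - query @ gallery.T           # shape (n_query, n_gallery)
    adj_g = knn_adjacency(gallery, k)
    # Entry (i, j): mean distance from query i to the neighbors of gallery j.
    neighbor_dist = dist @ adj_g.T
    return (1.0 - alpha) * dist + alpha * neighbor_dist
```

In this sketch, `multi_order_enhance(gallery_feats)` returns unit-norm features of the same shape, and `refine_query_gallery(query_feats, gallery_feats)` returns a distance matrix whose rows can be sorted to rank the gallery for each query.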
Related papers
- Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching [25.672699790866726]
Two-tower models are widely adopted in the industrial-scale matching stage across a broad range of application domains.
We propose a "cross-interaction decoupling architecture" within our matching paradigm.
arXiv Detail & Related papers (2025-02-28T03:40:37Z) - Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function. We investigate whether approaches such as self- throughput and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z) - YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary [12.39040757106137]
We introduce an innovative Retriever-Dictionary (RD) module to address this issue. This architecture enables YOLO-based models to efficiently retrieve features from a Dictionary that contains the insight of the dataset. Experiments show that using the RD significantly improves model performance, achieving more than a 3% increase in mean Average Precision for object detection.
arXiv Detail & Related papers (2024-10-20T09:38:58Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - VILLS -- Video-Image Learning to Learn Semantics for Person Re-Identification [51.89551385538251]
We propose VILLS (Video-Image Learning to Learn Semantics), a self-supervised method that jointly learns spatial and temporal features from images and videos.
VILLS first designs a local semantic extraction module that adaptively extracts semantically consistent and robust spatial features.
Then, VILLS designs a unified feature learning and adaptation module to represent image and video modalities in a consistent feature space.
arXiv Detail & Related papers (2023-11-27T19:30:30Z) - IRGen: Generative Modeling for Image Retrieval [82.62022344988993]
In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling.
We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units.
Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
arXiv Detail & Related papers (2023-03-17T17:07:36Z) - SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations [90.8752454643737]
Recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per grid cell object masks with fully-convolutional networks.
We observe SOLO generates similar masks for an object at nearby grid cells, and these neighboring predictions can complement each other as some may better segment certain object part.
Motivated by the observed gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information.
arXiv Detail & Related papers (2022-02-15T13:53:03Z) - MPI: Multi-receptive and Parallel Integration for Salient Object Detection [17.32228882721628]
The semantic representation of deep features is essential for image context understanding.
In this paper, a novel method called MPI is proposed for salient object detection.
The proposed method outperforms state-of-the-art methods under different evaluation metrics.
arXiv Detail & Related papers (2021-08-08T12:01:44Z) - Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z) - Hyperparameter Optimization with Differentiable Metafeatures [5.586191108738563]
We propose a cross-dataset surrogate model called Differentiable Metafeature-based Surrogate (DMFBS).
In contrast to existing models, DMFBS i) integrates a differentiable metafeature extractor and ii) is optimized using a novel multi-task loss.
We compare DMFBS against several recent models for HPO on three large meta-datasets and show that it consistently outperforms all of them with an average 10% improvement.
arXiv Detail & Related papers (2021-02-07T11:06:31Z) - Revealing the Invisible with Model and Data Shrinking for Composite-database Micro-expression Recognition [49.463864096615254]
We analyze the influence of learning complexity, including the input complexity and model complexity.
We propose a recurrent convolutional network (RCN) to explore the shallower-architecture and lower-resolution input data.
We develop three parameter-free modules to integrate with RCN without increasing any learnable parameters.
arXiv Detail & Related papers (2020-06-17T06:19:24Z) - Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method can well handle crowded, cluttered and occluded scenes.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.