Instance and Pair-Aware Dynamic Networks for Re-Identification
- URL: http://arxiv.org/abs/2103.05395v1
- Date: Tue, 9 Mar 2021 12:34:41 GMT
- Title: Instance and Pair-Aware Dynamic Networks for Re-Identification
- Authors: Bingliang Jiao and Xin Tan and Lu Yang and Yunlong Wang and Peng Wang
- Abstract summary: Re-identification (ReID) aims to identify the same instance across different cameras.
We propose a novel end-to-end trainable dynamic convolution framework named Instance and Pair-Aware Dynamic Networks.
On some datasets our algorithm outperforms state-of-the-art methods, while on others it achieves comparable performance.
- Score: 16.32740680438257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Re-identification (ReID) aims to identify the same instance across
different cameras. Existing ReID methods mostly utilize alignment-based or
attention-based strategies to generate effective feature representations.
However, most of these methods extract only general features from a single
input image, overlooking the relevance between the images being compared. To
fill this gap, we propose a novel end-to-end trainable dynamic convolution
framework named Instance and Pair-Aware Dynamic Networks in this paper. The
proposed model is composed of three main branches, among which a self-guided
dynamic branch is constructed to strengthen the instance-specific features of
every single image, and a mutual-guided dynamic branch is designed to generate
pair-aware features for each pair of images to be compared. Extensive
experiments are conducted to verify the effectiveness of the proposed
algorithm. We evaluate it on several mainstream person and vehicle ReID
datasets, including CUHK03, DukeMTMC-reID, Market-1501, VeRi-776, and
VehicleID. On some datasets our algorithm outperforms state-of-the-art
methods, while on others it achieves comparable performance.
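As a rough illustration of how such a framework could be wired up, the PyTorch sketch below generates per-sample 1x1 convolution kernels from a guidance image: the self-guided branch conditions on the image itself, while the mutual-guided branch conditions each image's features on its comparison partner. All module names, shapes, and the kernel-generation scheme are assumptions for illustration, not the authors' exact design.

```python
# Minimal sketch of instance- and pair-aware dynamic convolution.
# Names, sizes, and the 1x1 kernel-generation scheme are assumed for
# illustration; this is not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicBranch(nn.Module):
    """Generates per-sample 1x1 conv kernels from a guidance feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.channels = channels
        # Maps a pooled guidance vector to the C*C weights of a 1x1 conv.
        self.kernel_gen = nn.Linear(channels, channels * channels)

    def forward(self, x: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Summarize the guidance features into one vector per sample.
        ctx = F.adaptive_avg_pool2d(guide, 1).flatten(1)        # (B, C)
        weight = self.kernel_gen(ctx).view(b * c, c, 1, 1)      # per-sample kernels
        # Grouped conv applies each sample's own kernel to its own features.
        out = F.conv2d(x.reshape(1, b * c, h, w), weight, groups=b)
        return out.reshape(b, c, h, w)

class InstancePairAwareHead(nn.Module):
    """Self-guided branch conditions on the image itself; the mutual-guided
    branch conditions each image's features on its comparison partner."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.self_guided = DynamicBranch(channels)
        self.mutual_guided = DynamicBranch(channels)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # Instance-specific features: each image guides its own kernels.
        inst_a = self.self_guided(feat_a, feat_a)
        inst_b = self.self_guided(feat_b, feat_b)
        # Pair-aware features: kernels for one image come from the other.
        pair_a = self.mutual_guided(feat_a, feat_b)
        pair_b = self.mutual_guided(feat_b, feat_a)
        return inst_a + pair_a, inst_b + pair_b

# Usage on dummy backbone features for a pair of image batches:
head = InstancePairAwareHead(channels=256)
fa, fb = torch.randn(4, 256, 16, 8), torch.randn(4, 256, 16, 8)
out_a, out_b = head(fa, fb)
```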
Related papers
- Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks [9.388897214344572]
Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision.
Traditionally, parametric techniques have been employed for this task.
Recent advancements have seen a shift towards learning-based methods.
arXiv Detail & Related papers (2024-08-29T11:16:34Z)
- Ensemble Quadratic Assignment Network for Graph Matching [52.20001802006391]
Graph matching is a commonly used technique in computer vision and pattern recognition.
Recent data-driven approaches have improved the graph matching accuracy remarkably.
We propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.
arXiv Detail & Related papers (2024-03-11T06:34:05Z)
- Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking [0.5242869847419834]
We propose a Dynamic Visual Semantic Sub-Embeddings framework (DVSE) to reduce the information entropy.
To encourage the generated candidate embeddings to capture various semantic variations, we construct a mixed distribution.
We compare the performance with existing set-based methods using four image feature encoders and two text feature encoders on three benchmark datasets.
arXiv Detail & Related papers (2023-09-15T04:39:11Z)
- Learning Image Deraining Transformer Network with Dynamic Dual Self-Attention [46.11162082219387]
This paper proposes an effective image deraining Transformer with dynamic dual self-attention (DDSA).
Specifically, we select only the most useful similarity values based on an approximate top-k calculation to achieve sparse attention; a minimal sketch of this selection follows the entry.
In addition, we also develop a novel spatial-enhanced feed-forward network (SEFN) to obtain a more accurate representation for high-quality derained results.
arXiv Detail & Related papers (2023-08-15T13:59:47Z)
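The top-k selection described in the entry above can be illustrated with a short, self-contained sketch: keep only the k largest similarity values per query and mask the rest before the softmax. This is a generic rendering of the idea, not the DDSA module itself.

```python
# Illustrative top-k sparse attention: keep only the k largest similarity
# values per query and mask the remainder before the softmax. A generic
# sketch of the selection idea, not the paper's DDSA module.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k: int = 8):
    # q, k, v: (batch, num_tokens, dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N, N)
    # k-th largest similarity per query row; everything below it is masked.
    kth = scores.topk(top_k, dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 64, 32)
out = topk_sparse_attention(q, k, v, top_k=8)  # (2, 64, 32)
```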
- Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and the previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem; a generic sketch of this formulation follows the entry.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
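As a generic illustration of dense correspondence matching, the sketch below builds an all-pairs feature correlation between two images and takes a softmax-weighted average of target coordinates for each source pixel. The shapes and the global-matching recipe are assumptions for illustration, not the paper's implementation.

```python
# Generic dense matching: all-pairs correlation between two feature maps,
# softmax over the target image, expected match coordinates per source pixel.
# An assumed illustration, not the paper's code.
import torch

def dense_correspondence(f1, f2):
    # f1, f2: (C, H, W) dense features of the two images.
    c, h, w = f1.shape
    a = f1.flatten(1).t()                       # (H*W, C) source descriptors
    b = f2.flatten(1).t()                       # (H*W, C) target descriptors
    corr = (a @ b.t()) / c ** 0.5               # (H*W, H*W) correlation volume
    prob = corr.softmax(dim=-1)                 # matching distribution per pixel
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([xs, ys], -1).reshape(-1, 2).float()  # target grid
    match = prob @ coords                       # expected match coordinates
    return (match - coords).reshape(h, w, 2)    # dense flow field

flow = dense_correspondence(torch.randn(64, 32, 32), torch.randn(64, 32, 32))
```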
- Reuse your features: unifying retrieval and feature-metric alignment [3.845387441054033]
DRAN is the first network able to produce the features for the three steps of visual localization.
It achieves competitive performance in terms of robustness and accuracy under challenging conditions in public benchmarks.
arXiv Detail & Related papers (2022-04-13T10:42:00Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection [139.10628924049476]
Humans perform co-saliency detection by first summarizing the consensus knowledge in the whole group and then searching corresponding objects in each image.
Previous methods usually lack robustness, scalability, or stability for the first process and simply fuse consensus features with image features for the second process.
We propose a novel consensus-aware dynamic convolution model to explicitly and effectively perform the "summarize and search" process; a minimal sketch of this idea follows the entry.
arXiv Detail & Related papers (2021-10-01T12:06:42Z)
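The "summarize and search" idea from the entry above can be sketched generically: pool a consensus vector over the whole image group, then use it as a dynamic kernel to search each image for group-consistent objects. Function names and shapes here are illustrative assumptions, not the paper's model.

```python
# Minimal "summarize and search" sketch: pool a consensus vector over the
# group, treat it as a shared 1x1 dynamic kernel, and correlate each image's
# features with it. All names and shapes are assumed for illustration.
import torch
import torch.nn.functional as F

def consensus_search(group_feats: torch.Tensor) -> torch.Tensor:
    # group_feats: (N, C, H, W) features of the N images in the group.
    n, c, h, w = group_feats.shape
    # Summarize: average over all images and positions -> consensus vector.
    consensus = group_feats.mean(dim=(0, 2, 3))          # (C,)
    kernel = consensus.view(1, c, 1, 1)                  # shared 1x1 kernel
    # Search: correlate each image's features with the consensus kernel.
    return F.conv2d(group_feats, kernel)                 # (N, 1, H, W) maps

maps = consensus_search(torch.randn(5, 128, 28, 28))
```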
This list is automatically generated from the titles and abstracts of the papers on this site.