IAUnet: Global Context-Aware Feature Learning for Person
Re-Identification
- URL: http://arxiv.org/abs/2009.01035v1
- Date: Wed, 2 Sep 2020 13:07:10 GMT
- Title: IAUnet: Global Context-Aware Feature Learning for Person
Re-Identification
- Authors: Ruibing Hou and Bingpeng Ma and Hong Chang and Xinqian Gu and Shiguang
Shan and Xilin Chen
- Abstract summary: The IAU block enables the feature to incorporate global spatial, temporal, and channel context.
It is lightweight, end-to-end trainable, and can be easily plugged into existing CNNs to form IAUnet.
Experiments show that IAUnet performs favorably against state-of-the-art methods on both image and video reID tasks.
- Score: 106.50534744965955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person re-identification (reID) with CNN-based networks has
achieved favorable performance in recent years. However, most existing
CNN-based methods do not take full advantage of spatial-temporal context
modeling. In fact, the global spatial-temporal context can greatly clarify
local distractions to enhance the target feature representation. To
comprehensively leverage the spatial-temporal context information, in this
work we present a novel block, Interaction-Aggregation-Update (IAU), for
high-performance person reID. First, a Spatial-Temporal IAU (STIAU) module is
introduced. STIAU jointly incorporates two types of contextual interactions
into a CNN framework for target feature learning: the spatial interactions
learn to compute the contextual dependencies between different body parts of a
single frame, while the temporal interactions capture the contextual
dependencies between the same body parts across all frames. Furthermore, a
Channel IAU (CIAU) module is designed to model the semantic contextual
interactions between channel features to enhance the feature representation,
especially for small-scale visual cues and body parts. The IAU block therefore
enables the feature to incorporate global spatial, temporal, and channel
context. It is lightweight, end-to-end trainable, and can be easily plugged
into existing CNNs to form IAUnet. Experiments show that IAUnet performs
favorably against state-of-the-art methods on both image and video reID tasks
and achieves compelling results on a general object categorization task. The
source code is available at https://github.com/blue-blue272/ImgReID-IAnet.
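To make the abstract's description concrete, here is a minimal PyTorch sketch of an IAU-style block. It is not the authors' implementation (see the repository above for that): the class names, the (batch, frames, parts, channels) part-feature layout, the scaled dot-product affinities, and the squeeze-and-excitation-style channel gate standing in for CIAU are all illustrative assumptions.

```python
# Illustrative sketch only, not the authors' IAUnet implementation.
import torch
import torch.nn as nn


class STIAUSketch(nn.Module):
    """Spatial-temporal interaction-aggregation-update over part features.

    Input: part features of shape (B, T, P, C) -- batch, frames, parts, channels.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Linear(channels, channels)
        self.key = nn.Linear(channels, channels)
        self.update = nn.Linear(channels, channels)

    def _interact(self, x: torch.Tensor) -> torch.Tensor:
        # Interaction: softmax-normalized affinities over the second-to-last
        # axis; aggregation: pool context from the related entries.
        q, k = self.query(x), self.key(x)
        affinity = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return affinity @ x

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Spatial interactions: dependencies between body parts within each frame.
        spatial = self._interact(x)                   # (B, T, P, C)
        # Temporal interactions: the same body part across all frames.
        temporal = self._interact(x.transpose(1, 2))  # (B, P, T, C)
        temporal = temporal.transpose(1, 2)           # back to (B, T, P, C)
        # Update: fuse the aggregated context into the target feature (residual).
        return x + self.update(spatial + temporal)


class CIAUSketch(nn.Module):
    """Channel context as a squeeze-and-excitation-style gate -- a simplified
    stand-in for CIAU, which models richer channel-to-channel interactions."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, P, C); gate channels by context pooled over frames and parts.
        gate = self.fc(x.mean(dim=(1, 2)))            # (B, C)
        return x * gate[:, None, None, :]


if __name__ == "__main__":
    feats = torch.randn(2, 8, 6, 256)  # 2 clips, 8 frames, 6 parts, 256 channels
    out = CIAUSketch(256)(STIAUSketch(256)(feats))
    print(out.shape)                   # torch.Size([2, 8, 6, 256])
```

Read this way, "interaction" is the softmax-normalized affinity matrix, "aggregation" is the matrix product that pools context from related parts or frames, and "update" is the residual fusion back into the target feature, which is why such a block can be plugged into an existing CNN without changing its output shape.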
Related papers
- Keypoint-Augmented Self-Supervised Learning for Medical Image
Segmentation with Limited Annotation [21.203307064937142]
We present a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention.
In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial self-attention.
Our method further outperforms existing SSL methods by producing more robust self-attention.
arXiv Detail & Related papers (2023-10-02T22:31:30Z)
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z)
- Abstract Flow for Temporal Semantic Segmentation on the Permutohedral Lattice [27.37701107719647]
We extend a backbone LatticeNet to process temporal point cloud data.
We propose a new module called Abstract Flow which allows the network to match parts of the scene with similar abstract features.
We obtain state-of-the-art results on the SemanticKITTI dataset, which contains LiDAR scans from real urban environments.
arXiv Detail & Related papers (2022-03-29T12:14:31Z)
- SSAN: Separable Self-Attention Network for Video Representation Learning [11.542048296046524]
We propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially.
By adding the SSA module into a 2D CNN, we build an SSA network (SSAN) for video representation learning.
Our approach outperforms state-of-the-art methods on Something-Something and Kinetics-400 datasets.
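As a rough illustration of the sequential spatial-then-temporal factorization this entry describes, here is a minimal single-head PyTorch sketch; the shapes, class name, and residual formulation are assumptions, not the SSAN authors' code.

```python
# Illustrative sketch of separable self-attention: spatial attention within
# each frame, then temporal attention across frames at each spatial position.
import torch
import torch.nn as nn


class SeparableSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.qkv_spatial = nn.Linear(channels, 3 * channels)
        self.qkv_temporal = nn.Linear(channels, 3 * channels)

    @staticmethod
    def _attend(x: torch.Tensor, qkv_proj: nn.Linear) -> torch.Tensor:
        # Self-attention over the second-to-last axis of x, with a residual.
        q, k, v = qkv_proj(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return x + attn @ v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, C) -- frames T, spatial positions N (e.g. H*W), channels C.
        x = self._attend(x, self.qkv_spatial)  # attend over spatial positions
        x = self._attend(x.transpose(1, 2), self.qkv_temporal).transpose(1, 2)
        return x


if __name__ == "__main__":
    clip = torch.randn(2, 8, 49, 256)  # 2 clips, 8 frames, 7x7 feature map
    print(SeparableSelfAttention(256)(clip).shape)  # torch.Size([2, 8, 49, 256])
```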
arXiv Detail & Related papers (2021-05-27T10:02:04Z)
- Dense Interaction Learning for Video-based Person Re-identification [75.03200492219003]
We propose a hybrid framework, Dense Interaction Learning (DenseIL), to tackle video-based person re-ID difficulties.
DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder.
Our method consistently and significantly outperforms all state-of-the-art methods on multiple standard video-based re-ID datasets.
arXiv Detail & Related papers (2021-03-16T12:22:08Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- AXM-Net: Cross-Modal Context Sharing Attention Network for Person Re-ID [20.700750237972155]
Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems.
The key challenge is to align inter-modality representations according to the semantic information present for a person and to ignore background information.
We present AXM-Net, a novel CNN-based architecture designed for learning semantically aligned visual and textual representations.
arXiv Detail & Related papers (2021-01-19T16:06:39Z)
- Temporal Attribute-Appearance Learning Network for Video-based Person Re-Identification [94.03477970865772]
We propose a novel Temporal Attribute-Appearance Learning Network (TALNet) for video-based person re-identification.
TALNet exploits human attributes and appearance to learn comprehensive and effective pedestrian representations from videos.
arXiv Detail & Related papers (2020-09-09T09:28:07Z)
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet are proposed, which exploit the spatial and temporal long-range context interdependencies of such features and their spatial-temporal correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.