Concentrated Multi-Grained Multi-Attention Network for Video Based
Person Re-Identification
- URL: http://arxiv.org/abs/2009.13019v1
- Date: Mon, 28 Sep 2020 02:18:06 GMT
- Title: Concentrated Multi-Grained Multi-Attention Network for Video Based
Person Re-Identification
- Authors: Panwen Hu, Jiazhen Liu and Rui Huang
- Abstract summary: Occlusion is still a severe problem in the video-based Re-IDentification (Re-ID) task.
We propose a Concentrated Multi-grained Multi-Attention Network (CMMANet).
- Score: 5.761429719197307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Occlusion remains a severe problem in the video-based
Re-IDentification (Re-ID) task and has a large impact on matching accuracy.
A large number of existing methods have shown that attention mechanisms help
to alleviate occlusion. However, their attention mechanisms still cannot
distill sufficient discriminative information from the videos into the final
representations. The single-attention-module scheme employed by existing
methods cannot exploit multi-scale spatial cues, and the attention of a
single module is dispersed across the multiple salient parts of a person. In
this paper, we propose a Concentrated Multi-grained Multi-Attention Network
(CMMANet), in which two multi-attention modules are designed to extract
multi-grained information by processing multi-scale intermediate features.
Furthermore, the multiple attention submodules in each multi-attention module
can automatically discover multiple discriminative regions of the video
frames. To achieve this, we introduce a diversity loss that diversifies the
submodules in each multi-attention module, and a concentration loss that
integrates their attention responses so that each submodule focuses strongly
on a specific meaningful part. Experimental results show that the proposed
approach outperforms state-of-the-art methods by large margins on multiple
public datasets.
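As a concrete illustration of how the two losses could be realized, here is a
minimal PyTorch sketch, assuming K submodule attention maps of shape
(B, K, H, W): diversity is approximated by penalizing pairwise cosine
similarity between the maps, and concentration by the spatial variance of each
map around its center of mass. Both terms are illustrative stand-ins, since
the abstract does not give the paper's exact formulations.

```python
import torch
import torch.nn.functional as F

def diversity_loss(attn):
    """Penalize overlap between the K submodule attention maps.

    attn: (B, K, H, W) non-negative attention responses. The pairwise
    cosine-similarity penalty is an assumption, not the paper's formula.
    """
    B, K, H, W = attn.shape
    flat = F.normalize(attn.reshape(B, K, -1), dim=-1)   # unit-norm maps
    sim = torch.bmm(flat, flat.transpose(1, 2))          # (B, K, K) cosine similarity
    off_diag = sim - torch.eye(K, device=attn.device)    # drop self-similarity
    return off_diag.clamp(min=0).sum(dim=(1, 2)).mean() / (K * (K - 1))

def concentration_loss(attn):
    """Encourage each attention map to peak around a single location.

    Implemented as the spatial variance of each map about its center of
    mass (again an assumption standing in for the published loss).
    """
    B, K, H, W = attn.shape
    p = attn.reshape(B, K, -1)
    p = p / p.sum(dim=-1, keepdim=True).clamp(min=1e-8)  # per-map distribution
    ys = torch.arange(H, device=attn.device, dtype=attn.dtype)
    xs = torch.arange(W, device=attn.device, dtype=attn.dtype)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([gy.reshape(-1), gx.reshape(-1)], dim=-1)    # (HW, 2)
    mean = torch.einsum("bkn,nc->bkc", p, coords)                     # centers of mass
    sq_dist = ((coords[None, None] - mean[:, :, None]) ** 2).sum(-1)  # (B, K, HW)
    return (p * sq_dist).sum(-1).mean()
```

In training, such terms would be added to the usual Re-ID identification and
ranking losses with weighting coefficients.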
Related papers
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
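The blurb above describes multiscale fusion only at a high level; a minimal,
hypothetical PyTorch sketch of the general pattern (project stage features to
a common width, resize to the finest resolution, fuse with a convolution)
might look as follows. The channel counts are placeholders, not U3M's actual
design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Generic multiscale feature fusion (an illustration, not U3M itself)."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.fuse = nn.Conv2d(out_channels * len(in_channels), out_channels, 3, padding=1)

    def forward(self, feats):            # feats: per-stage maps, finest resolution first
        target = feats[0].shape[-2:]     # fuse everything at the finest resolution
        ups = [
            F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, feats)
        ]
        return self.fuse(torch.cat(ups, dim=1))   # (B, out_channels, H_0, W_0)
```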
arXiv Detail & Related papers (2024-05-24T08:58:48Z) - AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks for multispectral detection employ two separate feature-extraction branches for the multi-modal data.
The resulting computational cost has hindered the widespread deployment of multispectral pedestrian detection on embedded devices for autonomous systems.
We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
arXiv Detail & Related papers (2024-05-21T17:17:17Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete global localization and local refinement.
Motivated by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - Towards Generalized Multi-stage Clustering: Multi-view Self-distillation [10.368796552760571]
Existing multi-stage clustering methods independently learn salient features from multiple views and then perform the clustering task.
This paper proposes a novel multi-stage deep multi-view clustering (MVC) framework, in which multi-view self-distillation (DistilMVC) is introduced to distill the dark knowledge of label distributions.
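As a rough, hedged illustration of distilling the "dark knowledge" of a label
distribution, the following sketch matches a student's soft assignments to a
teacher's temperature-softened ones with a KL divergence; DistilMVC's actual
hierarchical objective is more involved.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, tau=2.0):
    """Generic knowledge-distillation loss (not DistilMVC's exact objective).

    A temperature tau > 1 softens both distributions so the small
    probabilities ('dark knowledge') contribute to the gradient.
    """
    teacher = F.softmax(teacher_logits / tau, dim=-1).detach()   # freeze the teacher
    log_student = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_student, teacher, reduction="batchmean") * tau ** 2
```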
arXiv Detail & Related papers (2023-10-29T03:35:34Z) - Video-based Cross-modal Auxiliary Network for Multimodal Sentiment
Analysis [16.930624128228658]
A Video-based Cross-modal Auxiliary Network (VCAN) is proposed, comprising an audio feature map module and a cross-modal selection module.
VCAN significantly outperforms state-of-the-art methods in the classification accuracy of multimodal sentiment analysis.
arXiv Detail & Related papers (2022-08-30T02:08:06Z) - Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes
for Medical Image Super-Resolution [56.622832383316215]
We propose a novel multi-head convolutional attention module to super-resolve CT and MRI scans.
Our attention module uses the convolution operation to perform joint spatial-channel attention on multiple input tensors.
We introduce multiple attention heads, each head having a distinct receptive field size corresponding to a particular reduction rate for the spatial attention.
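A minimal sketch of the idea of one attention head per receptive-field size,
assuming each head is a single convolution that produces a spatial gate; the
paper's joint spatial-channel design and reduction rates are not reproduced
here.

```python
import torch
import torch.nn as nn

class MultiKernelConvAttention(nn.Module):
    """Illustrative multi-head convolutional attention with distinct kernel sizes."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, 1, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):                                  # x: (B, C, H, W)
        gates = [torch.sigmoid(head(x)) for head in self.heads]
        gate = torch.stack(gates, dim=0).mean(dim=0)       # average the head gates
        return x * gate                                    # re-weighted features
```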
arXiv Detail & Related papers (2022-04-08T07:56:55Z) - Channel Exchanging Networks for Multimodal and Multitask Dense Image
Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN) which is self-adaptive, parameter-free, and more importantly, applicable for both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For the application of dense image prediction, the validity of CEN is tested by four different scenarios.
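A hedged sketch of channel exchanging, under the assumption (made explicit in
the CEN paper) that channel importance is read off batch-normalization scaling
factors: channels whose factor is near zero are replaced by the other
modality's channels. The published criterion and scheduling may differ in
detail.

```python
import torch

def exchange_channels(x_a, x_b, gamma_a, gamma_b, threshold=1e-2):
    """Swap uninformative channels between two modality subnetworks.

    x_a, x_b:          (B, C, H, W) feature maps of the two subnetworks.
    gamma_a, gamma_b:  (C,) BN scaling factors of the corresponding layers.
    """
    mask_a = (gamma_a.abs() < threshold).view(1, -1, 1, 1)  # weak channels in A
    mask_b = (gamma_b.abs() < threshold).view(1, -1, 1, 1)  # weak channels in B
    out_a = torch.where(mask_a, x_b, x_a)   # take modality B where A is weak
    out_b = torch.where(mask_b, x_a, x_b)   # and vice versa
    return out_a, out_b
```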
arXiv Detail & Related papers (2021-12-04T05:47:54Z) - Unsupervised Person Re-Identification with Multi-Label Learning Guided
Self-Paced Clustering [48.31017226618255]
Unsupervised person re-identification (Re-ID) has drawn increasing research attention recently.
In this paper, we address unsupervised person Re-ID with a conceptually novel yet simple framework, termed Multi-label Learning guided self-paced Clustering (MLC).
MLC mainly learns discriminative features with three crucial modules, namely a multi-scale network, a multi-label learning module, and a self-paced clustering module.
arXiv Detail & Related papers (2021-03-08T07:30:13Z) - Feature Boosting, Suppression, and Diversification for Fine-Grained
Visual Classification [0.0]
Learning feature representation from discriminative local regions plays a key role in fine-grained visual classification.
We introduce two lightweight modules that can be easily plugged into existing convolutional neural networks.
Our method achieves state-of-the-art performances on several benchmark fine-grained datasets.
arXiv Detail & Related papers (2021-03-04T01:49:53Z) - Fine-Grained Visual Classification via Simultaneously Learning of
Multi-regional Multi-grained Features [15.71408474557042]
Fine-grained visual classification is a challenging task that recognizes the sub-classes belonging to the same meta-class.
In this paper, we argue that mining multi-regional multi-grained features is precisely the key to this task.
Experimental results over four widely used fine-grained image classification datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-01-31T03:46:10Z) - Collaborative Attention Mechanism for Multi-View Action Recognition [75.33062629093054]
We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects the attention differences among multiple views and adaptively integrates frame-level information so that the views benefit each other.
Experiments on four action datasets illustrate the proposed CAM achieves better results for each view and also boosts multi-view performance.
arXiv Detail & Related papers (2020-09-14T17:33:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.