An Attention-Based Deep Learning Model for Multiple Pedestrian
Attributes Recognition
- URL: http://arxiv.org/abs/2004.01110v1
- Date: Thu, 2 Apr 2020 16:21:14 GMT
- Title: An Attention-Based Deep Learning Model for Multiple Pedestrian
Attributes Recognition
- Authors: Ehsan Yaghoubi, Diana Borza, João Neves, Aruna Kumar, Hugo Proença
- Abstract summary: This paper provides a novel solution to the problem of automatic characterization of pedestrians in surveillance footage.
We propose a multi-task deep model that uses an element-wise multiplication layer to extract more comprehensive feature representations.
Our experiments were performed on two well-known datasets (RAP and PETA) and point to the superiority of the proposed method with respect to the state-of-the-art.
- Score: 4.6898263272139795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The automatic characterization of pedestrians in surveillance footage is a
tough challenge, particularly when the data is extremely diverse, with cluttered
backgrounds and subjects captured from varying distances, under multiple
poses, and with partial occlusion. Having observed that the state-of-the-art
performance is still unsatisfactory, this paper provides a novel solution to
the problem, with two-fold contributions: 1) considering the strong semantic
correlation between the different full-body attributes, we propose a multi-task
deep model that uses an element-wise multiplication layer to extract more
comprehensive feature representations. In practice, this layer serves as a
filter to remove irrelevant background features, and is particularly important
to handle complex, cluttered data; and 2) we introduce a weighted-sum term to
the loss function that not only weighs the relative contribution of each task
(i.e., each kind of attribute) but also is crucial for performance improvement in
multiple-attribute inference settings. Our experiments were performed on two
well-known datasets (RAP and PETA) and point to the superiority of the
proposed method with respect to the state-of-the-art. The code is available at
https://github.com/Ehsan-Yaghoubi/MAN-PAR-.
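As a rough illustration of the two contributions, the sketch below (not the authors' implementation; the 1x1-conv sigmoid mask, the pooling choice, and the BCE criterion are assumptions made for this example) shows how an element-wise multiplication layer can filter background activations out of shared backbone features, and how a weighted sum of per-task losses can balance the different attribute groups:

    import torch
    import torch.nn as nn

    class MultiTaskAttentionHead(nn.Module):
        """Multi-task head: shared backbone features are modulated by an
        element-wise multiplication with a learned spatial mask (one per
        attribute group) before each task-specific classifier."""

        def __init__(self, in_channels, task_classes):
            super().__init__()
            # One 1x1 conv + sigmoid per task yields a spatial mask in [0, 1];
            # multiplying it element-wise with the shared feature map acts as
            # a filter that suppresses background activations.
            self.masks = nn.ModuleList(
                nn.Sequential(nn.Conv2d(in_channels, 1, kernel_size=1), nn.Sigmoid())
                for _ in task_classes
            )
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.classifiers = nn.ModuleList(
                nn.Linear(in_channels, n) for n in task_classes
            )

        def forward(self, feats):
            # feats: (B, C, H, W) feature map from any CNN backbone
            logits = []
            for mask_net, clf in zip(self.masks, self.classifiers):
                attended = feats * mask_net(feats)       # element-wise multiplication
                pooled = self.pool(attended).flatten(1)  # (B, C)
                logits.append(clf(pooled))               # per-task attribute logits
            return logits

    def weighted_multitask_loss(logits, targets, weights):
        """Weighted-sum loss: each task's BCE term is scaled by a per-task
        weight before summation, balancing the contribution of each kind of
        attribute (the weights here are illustrative hyper-parameters)."""
        bce = nn.BCEWithLogitsLoss()
        return sum(w * bce(l, t) for w, l, t in zip(weights, logits, targets))

For the reference backbone, the exact placement of the multiplication layer, and the task weights actually used, see the repository linked above.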
Related papers
- Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data [15.326571438985466]
Topological features obtained by topological data analysis (TDA) have been suggested as a potential solution.
There are two significant obstacles to using topological features in deep learning.
We propose to use two teacher networks, one trained on the raw time-series data, and another trained on persistence images generated by TDA methods.
A robust student model is distilled, which uses only the time-series data as an input, while implicitly preserving topological features.
arXiv Detail & Related papers (2024-07-07T10:08:34Z)
- Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization [51.206398602941405]
We propose to disentangle an original high-dimensional feature into multiple sub-features.
On top of the disentangled sub-features, we learn an auxiliary feature to enhance the sub-features.
Our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset.
arXiv Detail & Related papers (2023-09-13T10:53:12Z)
- Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe^2) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe^2 can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Feature Completion Transformer for Occluded Person Re-identification [25.159974510754992]
Occluded person re-identification (Re-ID) is a challenging problem because occluders destroy part of the pedestrian's semantic information.
We propose a Feature Completion Transformer (FCFormer) to implicitly complement the semantic information of occluded parts in the feature space.
FCFormer achieves superior performance and outperforms the state-of-the-art methods by significant margins on occluded datasets.
arXiv Detail & Related papers (2023-03-03T01:12:57Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims at performing segmentation in query images given only a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z)
- A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images [15.75291664088815]
A major issue concerning current deep neural architectures is known as catastrophic forgetting.
We propose a contrastive regularization, where any given input is compared with its augmented version.
We show the effectiveness of our solution on the Potsdam dataset, outperforming the incremental baseline in every test.
arXiv Detail & Related papers (2021-12-07T16:44:45Z)
- One for All: An End-to-End Compact Solution for Hand Gesture Recognition [8.321276216978637]
This paper proposes an end-to-end compact CNN framework: a fine-grained feature attentive network for hand gesture recognition (Fit-Hand).
The pipeline of the proposed architecture consists of two main units: FineFeat module and dilated convolutional (Conv) layer.
The effectiveness of Fit-Hand is evaluated by using subject dependent (SD) and subject independent (SI) validation setup over seven benchmark datasets.
arXiv Detail & Related papers (2021-05-15T05:10:47Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)