Deep Template Matching for Pedestrian Attribute Recognition with the
Auxiliary Supervision of Attribute-wise Keypoints
- URL: http://arxiv.org/abs/2011.06798v1
- Date: Fri, 13 Nov 2020 07:52:26 GMT
- Title: Deep Template Matching for Pedestrian Attribute Recognition with the
Auxiliary Supervision of Attribute-wise Keypoints
- Authors: Jiajun Zhang, Pengyuan Ren and Jianmin Li
- Abstract summary: Pedestrian Attribute Recognition (PAR) has attracted extensive attention due to its important role in video surveillance scenarios.
Recent works design complicated modules, e.g., attention mechanisms and body-part proposals, to localize the region corresponding to each attribute.
These works show that precisely localizing attribute-specific regions helps improve performance.
However, these part-information-based methods remain inaccurate while also increasing model complexity.
- Score: 33.35677385823819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pedestrian Attribute Recognition (PAR) has attracted extensive attention due to
its important role in video surveillance scenarios. In most cases, the
existence of a particular attribute is strongly related to a partial region of
the body. Recent works design complicated modules, e.g., attention mechanisms
and body-part proposals, to localize the region corresponding to each
attribute. These works further show that precisely localizing
attribute-specific regions helps improve performance. However, these
part-information-based methods are still not sufficiently accurate and
increase model complexity, which makes them hard to deploy in realistic
applications. In this paper, we propose a Deep Template Matching based method
to capture body-part features with less computation. Further, we also propose
an auxiliary supervision method that uses human pose keypoints to guide the
learning toward discriminative local cues. Extensive experiments show that the
proposed method outperforms the state-of-the-art approaches while having lower
computational complexity on large-scale pedestrian attribute datasets,
including PETA, PA-100K, RAP, and the zero-shot split of RAPv2.
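To make the idea concrete, the following is a minimal PyTorch-style sketch of how template matching over backbone feature maps with keypoint-guided auxiliary supervision could be wired up. This is not the authors' code: the module names, shapes, and the KL-divergence form of the auxiliary loss are illustrative assumptions drawn only from the abstract.

```python
# Minimal sketch (not the authors' implementation): each attribute owns a
# learnable template; its correlation map over the feature map selects
# body-part features, and an auxiliary loss pulls that map toward a pose
# keypoint heatmap. Shapes and loss form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemplateMatchingHead(nn.Module):
    def __init__(self, num_attrs: int, feat_dim: int = 512):
        super().__init__()
        # One learnable template vector per attribute.
        self.templates = nn.Parameter(torch.randn(num_attrs, feat_dim) * 0.01)
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) backbone feature map.
        B, C, H, W = feats.shape
        # Template matching as a dot product between each template and
        # every spatial location of the feature map.
        corr = torch.einsum("bchw,ac->bahw", feats, self.templates)   # (B, A, H, W)
        attn = torch.softmax(corr.flatten(2), dim=-1).view(B, -1, H, W)
        # Attribute-specific features pooled from the matched locations.
        pooled = torch.einsum("bahw,bchw->bac", attn, feats)          # (B, A, C)
        logits = self.classifier(pooled).squeeze(-1)                  # (B, A)
        return logits, attn

def keypoint_auxiliary_loss(attn: torch.Tensor, kp_heatmaps: torch.Tensor) -> torch.Tensor:
    # attn: (B, A, H, W) matching maps; kp_heatmaps: (B, A, H, W) Gaussian
    # heatmaps centred on the pose keypoints assigned to each attribute.
    # Encourages each matching map to peak at the relevant body part.
    target = kp_heatmaps / kp_heatmaps.flatten(2).sum(-1).clamp_min(1e-6)[..., None, None]
    return F.kl_div(attn.flatten(2).clamp_min(1e-6).log(),
                    target.flatten(2), reduction="batchmean")
```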
Related papers
- SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable
Pedestrian Attribute Recognition [23.55622798950833]
A novel Scale and Spatial Priors Guided Network (SSPNet) is proposed for Pedestrian Attribute Recognition (PAR).
SSPNet learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps.
A novel IoU-based attribute localization metric is proposed for Weakly-supervised Pedestrian Attribute Localization (WPAL), built on an improved Grad-CAM attribute response mask.
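The IoU-based localization metric admits a small sketch; the thresholding and min-max normalization below are assumptions about the general recipe, not the paper's exact evaluation protocol.

```python
# Hedged sketch: IoU between a thresholded attribute response mask
# (e.g. from Grad-CAM) and an annotated ground-truth region mask.
import numpy as np

def attribute_localization_iou(cam: np.ndarray, gt_mask: np.ndarray, thr: float = 0.5) -> float:
    # cam: (H, W) attribute response map; gt_mask: (H, W) binary region mask.
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-6)  # min-max normalize
    pred = cam >= thr
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / max(int(union), 1)
```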
arXiv Detail & Related papers (2023-12-11T00:41:40Z)
- DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature.
It uses a single-scale feature map and global cross-attention calculations without specific locality constraints.
We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z)
- A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition [10.821982414387525]
We show that current methods can struggle to generalize the attribute interdependencies fitted on the training data to scenes or identities outside the dataset distribution.
To render models robust in realistic scenes, we propose attributes-disentangled feature learning to ensure that recognizing one attribute does not rely on inferring the existence of others.
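As a loose illustration only: the paper minimizes mutual information between attribute features, while the sketch below substitutes a simple batch-correlation penalty between per-attribute embeddings, a related but much weaker decorrelation objective.

```python
# Illustrative decorrelation penalty, NOT the paper's mutual-information
# estimator: it only suppresses linear correlation between scalar summaries
# of the per-attribute embeddings across a batch.
import torch

def decorrelation_penalty(attr_feats: torch.Tensor) -> torch.Tensor:
    # attr_feats: (B, A, D) one embedding per attribute per sample.
    B, A, _ = attr_feats.shape
    z = attr_feats.mean(dim=2)                       # (B, A) scalar summary per attribute
    z = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-6)  # standardize over the batch
    corr = (z.t() @ z) / B                           # (A, A) correlation matrix
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return (off_diag ** 2).sum() / max(A * (A - 1), 1)
```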
arXiv Detail & Related papers (2023-07-28T01:34:55Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- UPAR: Unified Pedestrian Attribute Recognition and Person Retrieval [4.6193503399184275]
We present UPAR, the Unified Person Attribute Recognition dataset.
It is based on four well-known person attribute recognition datasets: PA100k, PETA, RAPv2, and Market1501.
We unify those datasets by providing 3.3M additional annotations to harmonize 40 important binary attributes over 12 attribute categories.
arXiv Detail & Related papers (2022-09-06T14:20:56Z)
- TransFA: Transformer-based Representation for Face Attribute Evaluation [87.09529826340304]
We propose a novel transformer-based representation for attribute evaluation (TransFA).
The proposed TransFA achieves superior performances compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-07-12T10:58:06Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Pedestrian Attribute Recognition in Video Surveillance Scenarios Based on View-attribute Attention Localization [8.807717261983539]
We propose a novel view-attribute localization method based on attention (VALA).
A specific view-attribute is composed of the extracted attribute feature and four view scores, which are predicted by a view predictor as the confidences for the attribute from different views.
Experiments on four widely used datasets (RAP, RAPv2, PETA, and PA-100K) demonstrate the effectiveness of our approach compared with state-of-the-art methods.
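A hedged sketch of that composition follows; the layer names, the softmax over view scores, and the fusion by weighted sum are assumptions, not the released VALA model.

```python
# Illustrative only: per-view attribute logits are fused with four view
# confidences (e.g. front/back/left/right) produced by a view predictor.
import torch
import torch.nn as nn

class ViewWeightedAttributeHead(nn.Module):
    def __init__(self, feat_dim: int = 512, num_attrs: int = 35, num_views: int = 4):
        super().__init__()
        self.view_predictor = nn.Linear(feat_dim, num_views)          # view confidences
        self.attr_heads = nn.Linear(feat_dim, num_attrs * num_views)  # per-view attribute logits

    def forward(self, feat: torch.Tensor):
        # feat: (B, C) global pedestrian feature.
        B = feat.size(0)
        view_scores = torch.softmax(self.view_predictor(feat), dim=-1)      # (B, V)
        per_view = self.attr_heads(feat).view(B, -1, view_scores.size(-1))  # (B, A, V)
        logits = (per_view * view_scores.unsqueeze(1)).sum(dim=-1)          # (B, A)
        return logits, view_scores
```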
arXiv Detail & Related papers (2021-06-11T16:09:31Z)
- Transforming Feature Space to Interpret Machine Learning Models [91.62936410696409]
This contribution proposes a novel approach that interprets machine-learning models through the lens of feature space transformations.
It can be used to enhance unconditional as well as conditional post-hoc diagnostic tools.
A case study on remote-sensing landcover classification with 46 features is used to demonstrate the potential of the proposed approach.
arXiv Detail & Related papers (2021-04-09T10:48:11Z)
- Simple and effective localized attribute representations for zero-shot learning [48.053204004771665]
Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their semantic descriptions.
We propose localizing representations in the semantic/attribute space, with a simple but effective pipeline where localization is implicit.
Our method can be implemented easily and can serve as a new baseline for zero-shot learning.
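The implicit-localization idea can be sketched as follows; the 1x1 projection, spatial max pooling, and dot-product compatibility with class attribute signatures are assumptions about the general pipeline rather than the paper's exact model.

```python
# Minimal sketch: implicit attribute localization for zero-shot learning.
# Every spatial location is projected to attribute scores, spatial max
# pooling keeps each attribute's best-matching location, and classes are
# scored by compatibility with their attribute signatures. Illustrative only.
import torch
import torch.nn as nn

class LocalizedAttributeZSL(nn.Module):
    def __init__(self, feat_dim: int, num_attrs: int, class_signatures: torch.Tensor):
        super().__init__()
        # class_signatures: (num_classes, num_attrs) semantic class descriptions.
        self.to_attr = nn.Conv2d(feat_dim, num_attrs, kernel_size=1)
        self.register_buffer("signatures", class_signatures)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone feature map.
        attr_maps = self.to_attr(feats)             # (B, A, H, W) per-location attribute scores
        attr_scores = attr_maps.amax(dim=(-2, -1))  # (B, A) implicit localization via max pooling
        return attr_scores @ self.signatures.t()    # (B, num_classes) compatibility scores
```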
arXiv Detail & Related papers (2020-06-10T16:46:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.