Spatial and Semantic Consistency Regularizations for Pedestrian
Attribute Recognition
- URL: http://arxiv.org/abs/2109.05686v1
- Date: Mon, 13 Sep 2021 03:36:44 GMT
- Title: Spatial and Semantic Consistency Regularizations for Pedestrian
Attribute Recognition
- Authors: Jian Jia and Xiaotang Chen and Kaiqi Huang
- Abstract summary: We propose a framework that consists of two complementary regularizations to achieve spatial and semantic consistency for each attribute.
Based on the precise attribute locations, we propose a semantic consistency regularization to extract intrinsic and discriminative semantic features.
Results show that the proposed method performs favorably against state-of-the-art methods without increasing parameters.
- Score: 50.932864767867365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent studies on pedestrian attribute recognition have shown
remarkable progress in leveraging complicated networks and attention
mechanisms, most of them neglect the inter-image relations and an important
prior: spatial consistency and semantic consistency of attributes under
surveillance scenarios. The spatial locations of the same attribute should be
consistent between different pedestrian images, e.g., the "hat" attribute and
the "boots" attribute are always located at the top and bottom of the picture,
respectively. In addition, the inherent semantic feature of the "hat"
attribute should be consistent, whether it is a baseball cap, beret, or helmet.
To fully exploit inter-image relations and aggregate human prior in the model
learning process, we construct a Spatial and Semantic Consistency (SSC)
framework that consists of two complementary regularizations to achieve spatial
and semantic consistency for each attribute. Specifically, we first propose a
spatial consistency regularization to focus on reliable and stable
attribute-related regions. Based on the precise attribute locations, we further
propose a semantic consistency regularization to extract intrinsic and
discriminative semantic features. We conduct extensive experiments on popular
benchmarks including PA100K, RAP, and PETA. Results show that the proposed
method performs favorably against state-of-the-art methods without increasing
parameters.
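The two regularizations described in the abstract can be sketched as batch-level losses: a spatial term that encourages the attention map for a given attribute to agree across images, and a semantic term that pulls per-attribute features toward a per-attribute prototype. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the function names, the batch-mean target, and the cosine-to-prototype formulation are all hypothetical.

```python
import numpy as np

def spatial_consistency_loss(attn_maps):
    """Hypothetical spatial consistency term.

    attn_maps: (B, A, H, W) attention maps, one per attribute per image.
    Penalizes the deviation of each image's attribute attention map from
    the batch-mean map for that attribute, so the same attribute attends
    to consistent spatial regions across pedestrians.
    """
    mean_map = attn_maps.mean(axis=0, keepdims=True)  # (1, A, H, W)
    return float(((attn_maps - mean_map) ** 2).mean())

def semantic_consistency_loss(feats, prototypes):
    """Hypothetical semantic consistency term.

    feats: (B, A, D) per-attribute features; prototypes: (A, D) per-attribute
    prototypes. Pulls each attribute feature toward its prototype, so "hat"
    features stay consistent whether the hat is a cap, beret, or helmet.
    """
    f = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    # 1 - cosine similarity, averaged over batch and attributes
    return float((1.0 - (f * p[None]).sum(axis=-1)).mean())

# Toy usage: 4 images, 3 attributes, 8x8 attention maps, 16-dim features.
rng = np.random.default_rng(0)
attn = rng.random((4, 3, 8, 8))
feats = rng.normal(size=(4, 3, 16))
protos = rng.normal(size=(3, 16))
total = spatial_consistency_loss(attn) + 0.5 * semantic_consistency_loss(feats, protos)
```

In an actual training loop, both terms would be added to the classification loss; the 0.5 weight above is an arbitrary placeholder.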
Related papers
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationship among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation [55.69128107473125]
We propose a single-stage approach for Weakly Supervised Semantic Segmentation (WSSS) with image-level labels.
We adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing.
Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels.
arXiv Detail & Related papers (2023-12-14T13:21:52Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition [10.821982414387525]
We show that current methods struggle to generalize such fitted attribute interdependencies to scenes or identities outside the dataset distribution.
To make models robust in realistic scenes, we propose attributes-disentangled feature learning, which ensures that recognizing one attribute does not rely on the presence of others.
arXiv Detail & Related papers (2023-07-28T01:34:55Z)
- Leveraging Hidden Positives for Unsupervised Semantic Segmentation [5.937673383513695]
We leverage contrastive learning by excavating hidden positives to learn rich semantic relationships.
We introduce a gradient propagation strategy to learn semantic consistency between adjacent patches.
Our proposed method achieves new state-of-the-art (SOTA) results on the COCO-Stuff, Cityscapes, and Potsdam-3 datasets.
arXiv Detail & Related papers (2023-03-27T08:57:28Z)
- Calibrated Feature Decomposition for Generalizable Person Re-Identification [82.64133819313186]
The Calibrated Feature Decomposition (CFD) module focuses on improving the generalization capacity of person re-identification.
A calibrated-and-standardized batch normalization (CSBN) is designed to learn calibrated person representations.
arXiv Detail & Related papers (2021-11-27T17:12:43Z)
- Matched sample selection with GANs for mitigating attribute confounding [30.488267816304177]
We propose a matching approach that selects a subset of images from the full dataset with balanced attribute distributions across protected attributes.
Our matching approach first projects real images onto a generative network's latent space in a manner that preserves semantic attributes.
It then finds adversarial matches in this latent space across a chosen protected attribute, yielding a dataset where semantic and perceptual attributes are balanced across the protected attribute.
arXiv Detail & Related papers (2021-03-24T19:18:44Z)
- Unsupervised segmentation via semantic-apparent feature fusion [21.75371777263847]
This research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF).
Key regions of the foreground object can be accurately localized via semantic features, while apparent features provide richer detailed expression.
By fusing semantic and apparent features, and by cascading modules for intra-image adaptive feature weight learning and inter-image common feature learning, the method achieves performance that significantly exceeds the baselines.
arXiv Detail & Related papers (2020-05-21T08:28:49Z)
- Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are plentiful but annotating real data is laborious.
The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving.
The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
arXiv Detail & Related papers (2020-04-10T06:58:03Z)
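The first criterion above, that the map between image domains be phase-preserving, can be illustrated with a short Fourier sketch: keep the source image's phase spectrum (which carries structure and semantics) while borrowing the reference image's amplitude spectrum. This is a generic illustration of phase preservation, not the paper's actual training objective, and `phase_preserving_transfer` is a hypothetical name.

```python
import numpy as np

def phase_preserving_transfer(src, ref):
    """Illustrative phase-preserving map between two grayscale images.

    Combines the source's Fourier phase with the reference's Fourier
    amplitude, so the output keeps the source's structure while taking
    on the reference's global intensity statistics.
    """
    F_src = np.fft.fft2(src)
    F_ref = np.fft.fft2(ref)
    combined = np.abs(F_ref) * np.exp(1j * np.angle(F_src))
    return np.real(np.fft.ifft2(combined))

# When reference == source, the map reduces to the identity.
demo = np.arange(16.0).reshape(4, 4)
same = phase_preserving_transfer(demo, demo)
```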
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.