SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable
  Pedestrian Attribute Recognition
- URL: http://arxiv.org/abs/2312.06049v1
- Date: Mon, 11 Dec 2023 00:41:40 GMT
- Title: SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable
  Pedestrian Attribute Recognition
- Authors: Jifeng Shen, Teng Guo, Xin Zuo, Heng Fan, and Wankou Yang
- Abstract summary: A novel Scale and Spatial Priors Guided Network (SSPNet) is proposed for Pedestrian Attribute Recognition (PAR) models.
SSPNet learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps.
A novel IoU-based attribute localization metric is proposed for Weakly-supervised Pedestrian Attribute Localization (WPAL), based on attribute response masks from an improved Grad-CAM.
- Score: 23.55622798950833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Global feature based Pedestrian Attribute Recognition (PAR) models are often
poorly localized when using Grad-CAM for attribute response analysis, which has
a significant impact on the interpretability, generalizability and performance.
Previous research has attempted to improve generalization and interpretability
through meticulous model design, yet it has often neglected or underutilized
effective prior information crucial for PAR. To this end, a novel Scale and
Spatial Priors Guided Network (SSPNet) is proposed for PAR, which is mainly
composed of the Adaptive Feature Scale Selection (AFSS) and Prior Location
Extraction (PLE) modules. The AFSS module learns to provide reasonable scale
prior information for different attribute groups, allowing the model to focus
on different levels of feature maps with varying semantic granularity. The PLE
module reveals potential attribute spatial prior information, which avoids
unnecessary attention on irrelevant areas and lowers the risk of model
over-fitting. More specifically, the scale prior in AFSS is adaptively learned
from different layers of feature pyramid with maximum accuracy, while the
spatial priors in PLE can be revealed from part features with different
granularity (such as image blocks, human pose keypoints and sparse sampling
points). In addition, a novel IoU-based attribute localization metric is proposed
for Weakly-supervised Pedestrian Attribute Localization (WPAL), based on
attribute response masks from an improved Grad-CAM. The experimental results on the
intra-dataset and cross-dataset evaluations demonstrate the effectiveness of
our proposed method in terms of mean accuracy (mA). Furthermore, it also
achieves superior performance on the PCS dataset for attribute localization in
terms of IoU. Code will be released at https://github.com/guotengg/SSPNet.
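
The two metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the improved Grad-CAM response is turned into a mask, so the fixed threshold in `binarize` is an assumption, and all function names are hypothetical. `mean_accuracy` follows the label-based mA commonly used in PAR, the per-attribute average of the true positive and true negative rates; treating an attribute with no positive (or no negative) samples as contributing a rate of 0 is also an assumption.

```python
from typing import List, Sequence


def binarize(response: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Turn a response map (e.g. a heatmap scaled to [0, 1]) into a binary mask.

    The fixed threshold is an assumption made for this sketch.
    """
    return [[value >= threshold for value in row] for row in response]


def mask_iou(pred: Sequence[Sequence[bool]], target: Sequence[Sequence[bool]]) -> float:
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    # Two empty masks have no union; return 0.0 rather than dividing by zero.
    return intersection / union if union else 0.0


def mean_accuracy(preds: Sequence[Sequence[int]], labels: Sequence[Sequence[int]]) -> float:
    """Label-based mA: mean over attributes of (TPR + TNR) / 2.

    preds and labels hold one row per sample and one 0/1 entry per attribute.
    """
    num_attrs = len(labels[0])
    total = 0.0
    for a in range(num_attrs):
        tp = tn = pos = neg = 0
        for pred_row, label_row in zip(preds, labels):
            if label_row[a]:
                pos += 1
                tp += int(bool(pred_row[a]))
            else:
                neg += 1
                tn += int(not pred_row[a])
        tpr = tp / pos if pos else 0.0  # assumption: missing class counts as 0
        tnr = tn / neg if neg else 0.0
        total += (tpr + tnr) / 2
    return total / num_attrs


# Toy data, invented for illustration only.
response = [[0.1, 0.8], [0.6, 0.2]]
target = [[False, True], [False, False]]
print(mask_iou(binarize(response), target))

preds = [[1, 0], [0, 0], [1, 1], [0, 1]]
labels = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(mean_accuracy(preds, labels))
```

Averaging the positive and negative rates per attribute keeps rare attributes from being swamped by the majority class, which is why mA is preferred over plain accuracy on imbalanced attribute labels.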
 
      
Related papers
- Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition [8.950906917573986]
 Few-shot adaptation for Vision-Language Models (VLMs) presents a dilemma: balancing in-distribution accuracy with out-of-distribution generalization.
Recent research has utilized low-level concepts such as visual attributes to enhance generalization.
This study reveals that VLMs rely excessively on a small subset of attributes in decision-making; these attributes co-occur with the category but are not inherently part of it, and are termed spuriously correlated attributes.
 arXiv  Detail & Related papers  (2025-02-19T12:05:33Z)
- Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification [2.6703221234079946]
 Multiple instance learning (MIL) has become a preferred method for gigapixel whole slide image (WSI) classification without requiring patch-level annotations.
This study systematically evaluates MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method.
Our findings reveal that selecting a robust self-supervised learning (SSL) method has a greater impact on performance than relying solely on an in-domain pre-training dataset.
 arXiv  Detail & Related papers  (2024-08-02T10:34:23Z)
- 'Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning [47.1040786932317]
 Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image.
We take a fundamentally different approach: a pre-trained Vision-Language detector (VINVL) sensitive to attribute information is employed to efficiently obtain region features.
A learned function maps the region features to region-specific attribute attention used to construct class part prototypes.
 arXiv  Detail & Related papers  (2024-04-12T18:37:00Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
 We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
 arXiv  Detail & Related papers  (2024-02-18T23:01:28Z)
- Attribute Localization and Revision Network for Zero-Shot Learning [13.530912616208722]
 Zero-shot learning enables the model to recognize unseen categories with the aid of auxiliary semantic information such as attributes.
In this paper, we find that the choice between local and global features is not a zero-sum game: global features can also contribute to the understanding of attributes.
 arXiv  Detail & Related papers  (2023-10-11T14:50:52Z)
- Physics Inspired Hybrid Attention for SAR Target Recognition [61.01086031364307]
 We propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the issues.
PIHA leverages the high-level semantics of physical information to activate and guide feature groups that are aware of the target's local semantics.
Our method outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters.
 arXiv  Detail & Related papers  (2023-09-27T14:39:41Z)
- Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
 We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in Optical Remote Sensing Images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
 arXiv  Detail & Related papers  (2023-09-15T07:14:43Z)
- Calibrated Feature Decomposition for Generalizable Person Re-Identification [82.64133819313186]
 The Calibrated Feature Decomposition (CFD) module focuses on improving generalization capacity for person re-identification.
A calibrated-and-standardized batch normalization (CSBN) is designed to learn calibrated person representations.
 arXiv  Detail & Related papers  (2021-11-27T17:12:43Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
 Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
 arXiv  Detail & Related papers  (2021-03-22T20:36:34Z)
- Towards Better Object Detection in Scale Variation with Adaptive Feature Selection [3.5352273012717044]
 We propose a novel adaptive feature selection module (AFSM) that automatically learns how to fuse multi-level representations in the channel dimension.
It significantly improves the performance of the detectors that have a feature pyramid structure.
A class-aware sampling mechanism (CASM) is proposed to tackle the class imbalance problem.
 arXiv  Detail & Related papers  (2020-12-06T13:41:20Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
 We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
 arXiv  Detail & Related papers  (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     