PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute
Recognition
- URL: http://arxiv.org/abs/2304.07230v1
- Date: Fri, 14 Apr 2023 16:27:56 GMT
- Title: PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute
Recognition
- Authors: Xinwen Fan, Yukang Zhang, Yang Lu, Hanzi Wang
- Abstract summary: We propose a pure transformer-based multi-task PAR network named PARFormer, which includes four modules.
In the feature extraction module, we build a strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks.
In the feature processing module, we propose a batch random mask (BRM) block and a multi-attribute center loss (MACL) to strengthen the feature representations.
In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes, and propose a multi-view contrastive loss.
In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions.
- Score: 23.814762073093153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pedestrian attribute recognition (PAR) has received increasing attention
because of its wide application in video surveillance and pedestrian analysis.
Extracting robust feature representation is one of the key challenges in this
task. The existing methods mainly use the convolutional neural network (CNN) as
the backbone network to extract features. However, these methods mainly focus
on small discriminative regions while ignoring the global perspective. To
overcome these limitations, we propose a pure transformer-based multi-task PAR
network named PARFormer, which includes four modules. In the feature extraction
module, we build a transformer-based strong baseline for feature extraction,
which achieves competitive results on several PAR benchmarks compared with the
existing CNN-based baseline methods. In the feature processing module, we
propose an effective data augmentation strategy named batch random mask (BRM)
block to reinforce the attentive feature learning of random patches.
Furthermore, we propose a multi-attribute center loss (MACL) to enhance the
inter-attribute discriminability in the feature representations. In the
viewpoint perception module, we explore the impact of viewpoints on pedestrian
attributes, and propose a multi-view contrastive loss (MVCL) that enables the
network to exploit the viewpoint information. In the attribute recognition
module, we alleviate the negative-positive imbalance problem to generate the
attribute predictions. The above modules interact and jointly learn a highly
discriminative feature space, and supervise the generation of the final
features. Extensive experimental results show that the proposed PARFormer
network performs well compared to the state-of-the-art methods on several
public datasets, including PETA, RAP, and PA100K. Code will be released at
https://github.com/xwf199/PARFormer.
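The batch random mask (BRM) block described in the abstract is a data-augmentation step that hides random patches so the network cannot rely on a few dominant regions. The sketch below only illustrates that idea, assuming ViT-style patch tokens of shape (batch, num_patches, dim); the `BatchRandomMask` class name, the mask ratio, and the choice to share one mask across the whole batch are assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of a batch-level random patch mask (not the paper's
# exact BRM block); mask ratio and shared-per-batch masking are assumptions.
import torch
import torch.nn as nn


class BatchRandomMask(nn.Module):
    """Zero out a random subset of patch tokens during training."""

    def __init__(self, mask_ratio: float = 0.3):
        super().__init__()
        self.mask_ratio = mask_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings; pass through at eval time.
        if not self.training or self.mask_ratio <= 0.0:
            return tokens
        B, N, _ = tokens.shape
        num_masked = int(N * self.mask_ratio)
        # One mask shared by the whole batch ("batch" random mask).
        idx = torch.randperm(N, device=tokens.device)[:num_masked]
        tokens = tokens.clone()
        tokens[:, idx, :] = 0.0
        return tokens


if __name__ == "__main__":
    x = torch.randn(4, 196, 768)                 # e.g. ViT-B/16 patch tokens
    y = BatchRandomMask(mask_ratio=0.3).train()(x)
    print(y.shape)                               # torch.Size([4, 196, 768])
```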
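The multi-attribute center loss (MACL) is described only at a high level. One plausible reading, sketched below, is a center-loss-style term with one learnable center per attribute that pulls features of samples carrying that attribute toward the center; the `MultiAttributeCenterLoss` class and the averaging scheme are assumptions, and any inter-center separation term the paper may use is omitted.

```python
# Hypothetical center-loss-style reading of MACL: one learnable center per
# attribute, positive samples pulled toward their attributes' centers.
import torch
import torch.nn as nn


class MultiAttributeCenterLoss(nn.Module):
    def __init__(self, num_attributes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_attributes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # feats: (B, D) pooled features; labels: (B, A) binary attribute labels.
        dists = torch.cdist(feats, self.centers).pow(2)   # (B, A) squared distances
        pos = labels.float()
        # Average the distances over positive (sample, attribute) pairs only.
        return (dists * pos).sum() / pos.sum().clamp(min=1.0)
```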
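For the negative-positive imbalance mentioned in the attribute recognition module, a common remedy in the PAR literature is a weighted sigmoid cross-entropy whose per-attribute weights depend on the positive ratio of each attribute in the training set. The abstract does not say which weighting PARFormer adopts, so the `weighted_bce` function below is an illustrative baseline, not the paper's loss.

```python
# Illustrative ratio-based weighted BCE for multi-label attribute prediction;
# the exponential weighting is a common choice in PAR, not taken from PARFormer.
import torch
import torch.nn.functional as F


def weighted_bce(logits: torch.Tensor,
                 targets: torch.Tensor,
                 pos_ratio: torch.Tensor) -> torch.Tensor:
    # logits, targets: (B, A) with float targets in {0, 1};
    # pos_ratio: (A,) fraction of positive samples per attribute.
    w_pos = torch.exp(1.0 - pos_ratio)   # up-weight rare positives
    w_neg = torch.exp(pos_ratio)         # up-weight negatives of frequent attributes
    weights = targets * w_pos + (1.0 - targets) * w_neg
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```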
Related papers
- Accurate and lightweight dehazing via multi-receptive-field non-local
network and novel contrastive regularization [9.90146712189936]
This paper presents a multi-receptive-field non-local network (MRFNLN) for image dehazing.
It includes a multi-stream feature attention block (MSFAB) and a cross non-local block (CNLB).
It outperforms recent state-of-the-art dehazing methods with fewer than 1.5 million parameters.
arXiv Detail & Related papers (2023-09-28T14:59:16Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - Feature Aggregation and Propagation Network for Camouflaged Object
Detection [42.33180748293329]
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment.
Several COD methods have been developed, but they still suffer from unsatisfactory performance due to intrinsic similarities between foreground objects and background surroundings.
We propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection.
arXiv Detail & Related papers (2022-12-02T05:54:28Z) - Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in
Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - M2IOSR: Maximal Mutual Information Open Set Recognition [47.1393314282815]
We propose a mutual information-based method with a streamlined architecture for open set recognition.
The proposed method significantly improves the performance of baselines and consistently achieves new state-of-the-art results on several benchmarks.
arXiv Detail & Related papers (2021-08-05T05:08:12Z) - A^2-FPN: Attention Aggregation based Feature Pyramid Network for
Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose aggregate interaction modules to integrate features from adjacent levels.
To obtain more efficient multi-scale features, self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies channel-wise attention to different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in the accuracy-latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)