Related papers: SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm

URL: http://arxiv.org/abs/2312.01640v1
Date: Mon, 4 Dec 2023 05:42:56 GMT
Title: SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm
Authors: Jiandong Jin, Xiao Wang, Chenglong Li, Lili Huang, and Jin Tang
Abstract summary: We propose a novel sequence generation paradigm for pedestrian attribute recognition, termed SequencePAR. It extracts the pedestrian features using a pre-trained CLIP model and embeds the attribute set into query tokens under the guidance of text prompts. The masked multi-head attention layer is introduced into the decoder module to prevent the model from remembering the next attribute while making attribute predictions during training.
Score: 18.53048511206039
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current pedestrian attribute recognition (PAR) algorithms are developed based on multi-label or multi-task learning frameworks, which aim to discriminate the attributes using specific classification heads. However, these discriminative models are easily influenced by imbalanced data or noisy samples. Inspired by the success of generative models, we rethink the pedestrian attribute recognition scheme and believe that generative models may be better at modeling the dependencies and complexity among human attributes. In this paper, we propose a novel sequence generation paradigm for pedestrian attribute recognition, termed SequencePAR. It extracts pedestrian features using a pre-trained CLIP model and embeds the attribute set into query tokens under the guidance of text prompts. Then, a Transformer decoder is proposed to generate the human attributes by incorporating the visual features and attribute query tokens. A masked multi-head attention layer is introduced into the decoder module to prevent the model from remembering the next attribute while making attribute predictions during training. Extensive experiments on multiple widely used pedestrian attribute recognition datasets validate the effectiveness of the proposed SequencePAR. The source code and pre-trained models will be released at https://github.com/Event-AHU/OpenPAR.
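The masked multi-head attention described in the abstract can be illustrated with a minimal single-head sketch in pure Python. It assumes a standard causal (lower-triangular) mask, as used in ordinary Transformer decoders; the function names are hypothetical, and this is not the SequencePAR code from the OpenPAR repository. The point it shows: the output at attribute step i depends only on tokens 0..i, so the decoder cannot see the next attribute during training.

```python
import math

def causal_mask(n):
    """n x n boolean mask: mask[i][j] is True when position i may attend to j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    queries, keys, values: lists of n vectors (lists of floats).
    Position i attends only to positions 0..i, so later tokens cannot
    influence the output at step i.
    """
    n = len(queries)
    d = len(queries[0])
    mask = causal_mask(n)
    outputs = []
    for i in range(n):
        # Scaled dot-product scores; masked (future) positions get -inf.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            if mask[i][j] else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # finite, because position i can always attend to itself
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value vectors, one output channel at a time.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs
```

Changing only the last value vector leaves the outputs at the first two positions unchanged, which is exactly the "no peeking at the next attribute" property a causal mask provides.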

Related papers

LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [63.07563443280147]
We propose a novel framework named LATex for AG-ReID. It adopts prompt-tuning strategies to leverage attribute-based text knowledge. Our framework can fully leverage attribute-based text knowledge to improve AG-ReID performance.
arXiv Detail & Related papers (2025-03-31T04:47:05Z)
Adaptive Prototype Model for Attribute-based Multi-label Few-shot Action Recognition [11.316708754749103]
In real-world action recognition systems, incorporating more attributes helps achieve a more comprehensive understanding of human behavior. We propose a novel method, the Adaptive Attribute Prototype Model (AAPM), for human action recognition, which captures rich action-relevant attribute information. Our AAPM achieves state-of-the-art performance in both attribute-based multi-label few-shot action recognition and single-label few-shot action recognition.
arXiv Detail & Related papers (2025-02-18T06:39:28Z)
Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
The Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to address the complex interactions between attributes and object visual representations. To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module. To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module. The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z)
Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on static images. We propose to understand human attributes using video frames, which can fully exploit temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
Exploring Diffusion Time-steps for Unsupervised Representation Learning [72.43246871893936]
We build a theoretical framework that connects the diffusion time-steps and the hidden attributes. On the CelebA, FFHQ, and Bedroom datasets, the learned features significantly improve classification.
arXiv Detail & Related papers (2024-01-21T08:35:25Z)
Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes. We develop an encoder-decoder network for a reconstruction task that distills high-level attribute-specific vectors without supervision. Our models are equipped with a feature decorrelation constraint on these attribute vectors to strengthen their representational ability.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
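The feature decorrelation constraint mentioned above can be pictured with a small sketch. The summary does not give the exact loss, so this assumes one generic form: the sum of squared cosine similarities over all distinct pairs of attribute vectors. The function name is hypothetical and this is not the paper's formulation.

```python
import math

def decorrelation_penalty(vectors):
    """Sum of squared cosine similarities over all distinct pairs of vectors.

    A generic decorrelation term (an assumption, not the paper's exact loss):
    it is 0 when the vectors are mutually orthogonal and grows as they align.
    """
    # Normalize each vector to unit length so dot products are cosines.
    normed = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        normed.append([x / norm for x in v])
    penalty = 0.0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            cos = sum(a * b for a, b in zip(normed[i], normed[j]))
            penalty += cos * cos
    return penalty
```

Minimizing such a term pushes the attribute vectors toward orthogonality, so each one carries information the others do not.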
A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition [10.821982414387525]
We show that current methods can fail to generalize such fitted attribute interdependencies to scenes or identities outside the dataset distribution. To make models robust in realistic scenes, we propose attribute-disentangled feature learning, which ensures that recognizing one attribute does not rely on inferring the existence of others.
arXiv Detail & Related papers (2023-07-28T01:34:55Z)
POAR: Towards Open Vocabulary Pedestrian Attribute Recognition [39.399286703315745]
Pedestrian attribute recognition (PAR) aims to predict the attributes of a target pedestrian in a surveillance system. It is impossible to exhaust all pedestrian attributes in the real world. We develop a novel pedestrian open-attribute recognition framework.
arXiv Detail & Related papers (2023-03-26T06:59:23Z)
Exploiting Semantic Attributes for Transductive Zero-Shot Learning [97.61371730534258]
Zero-shot learning aims to recognize unseen classes by generalizing the relation between visual features and semantic attributes learned from the seen classes. We present a novel transductive ZSL method that produces semantic attributes of the unseen data and imposes them on the generative process. Experiments on five standard benchmarks show that our method yields state-of-the-art results for zero-shot learning.
arXiv Detail & Related papers (2023-03-17T09:09:48Z)
Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address occlusion by employing body clues provided by an extra network to distinguish the visible part. We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge. Under this condition, the occluded representation can be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
Boosting Generative Zero-Shot Learning by Synthesizing Diverse Features with Attribute Augmentation [21.72622601533585]
We propose a novel framework to boost Zero-Shot Learning (ZSL) by synthesizing diverse features. This method uses augmented semantic attributes to train the generative model, so as to simulate the real distribution of visual features. We evaluate the proposed model on four benchmark datasets, observing significant performance improvement against the state-of-the-art.
arXiv Detail & Related papers (2021-12-23T14:32:51Z)
Efficient Attribute Injection for Pretrained Language Models [20.39972635495006]
We propose a lightweight and memory-efficient method to inject attributes into pretrained language models (PLMs). To limit the increase of parameters, especially when the attribute vocabulary is large, we use low-rank approximations and hypercomplex multiplications. Our method outperforms previous attribute injection methods and achieves state-of-the-art performance on various datasets.
arXiv Detail & Related papers (2021-09-16T13:08:24Z)
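Why low-rank approximations limit parameter growth with a large attribute vocabulary can be shown with simple arithmetic. This sketch assumes a hypothetical setup, not the paper's actual parameterization: each attribute value owns a hidden_dim x hidden_dim matrix, which the low-rank variant replaces with two factors of shape hidden_dim x rank and rank x hidden_dim.

```python
def full_rank_params(num_attributes, hidden_dim):
    # Hypothetical: one dense hidden_dim x hidden_dim matrix per attribute value.
    return num_attributes * hidden_dim * hidden_dim

def low_rank_params(num_attributes, hidden_dim, rank):
    # Each matrix W is approximated as U @ V,
    # with U: hidden_dim x rank and V: rank x hidden_dim.
    return num_attributes * 2 * hidden_dim * rank

full = full_rank_params(1000, 768)
low = low_rank_params(1000, 768, 8)
print(full, low, full // low)
```

With 1,000 attribute values, a hidden size of 768, and rank 8, the dense version needs 589,824,000 parameters while the factorized one needs 12,288,000, a 48x reduction (768 / (2 x 8)). The saving scales linearly with vocabulary size, which is why it matters most when the attribute vocabulary is large.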
AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding [55.89773725577615]
We present AdaTag, which uses adaptive decoding to handle attribute extraction. Our experiments on a real-world e-Commerce dataset show marked improvements over previous methods.
arXiv Detail & Related papers (2021-06-04T07:54:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.