Related papers: A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models

A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models

URL: http://arxiv.org/abs/2509.02099v1
Date: Tue, 02 Sep 2025 08:56:39 GMT
Title: A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models
Authors: Alejandro Alonso, Sawaiz A. Chaudhry, Juan C. SanMiguel, Álvaro García-Martín, Pablo Ayuso-Albizu, Pablo Carballeira,
Abstract summary: Pedestrian Attribute Recognition (PAR) is a challenging task as models are required to generalize across numerous attributes in real-world data.<n>We propose a data-centric approach to improve PAR by synthetic data augmentation guided by textual descriptions.
Score: 41.58360335940522
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pedestrian Attribute Recognition (PAR) is a challenging task as models are required to generalize across numerous attributes in real-world data. Traditional approaches focus on complex methods, yet recognition performance is often constrained by training dataset limitations, particularly the under-representation of certain attributes. In this paper, we propose a data-centric approach to improve PAR by synthetic data augmentation guided by textual descriptions. First, we define a protocol to identify weakly recognized attributes across multiple datasets. Second, we propose a prompt-driven pipeline that leverages diffusion models to generate synthetic pedestrian images while preserving the consistency of PAR datasets. Finally, we derive a strategy to seamlessly incorporate synthetic samples into training data, which considers prompt-based annotation rules and modifies the loss function. Results on popular PAR datasets demonstrate that our approach not only boosts recognition of underrepresented attributes but also improves overall model performance beyond the targeted attributes. Notably, this approach strengthens zero-shot generalization without requiring architectural changes of the model, presenting an efficient and scalable solution to improve the recognition of attributes of pedestrians in the real world.

Related papers

Advancing Multinational License Plate Recognition Through Synthetic and Real Data Fusion: A Comprehensive Evaluation [3.3637719592955526]
We explore the integration of real and synthetic data to enhance LPR performance.<n>Massive incorporation of synthetic data substantially boosts model performance in both intra- and cross-dataset scenarios.<n>Experiments underscore the efficacy of synthetic data in mitigating challenges posed by limited training data.
arXiv Detail & Related papers (2026-01-12T15:52:52Z)
What Makes You Unique? Attribute Prompt Composition for Object Re-Identification [70.67907354506278]
Object Re-IDentification aims to recognize individuals across non-overlapping camera views.<n>Single-domain models tend to overfit to domain-specific features, whereas cross-domain models often rely on diverse normalization strategies.<n>We propose an Attribute Prompt Composition framework, which exploits textual semantics to jointly enhance discrimination and generalization.
arXiv Detail & Related papers (2025-09-23T07:03:08Z)
Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data [3.877325424485755]
This report investigates enhancing semantic caching effectiveness by employing specialized, fine-tuned embedding models.<n>We propose leveraging smaller, domain-specific embedding models, fine-tuned with targeted real-world and synthetically generated datasets.
arXiv Detail & Related papers (2025-04-03T04:27:02Z)
LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [78.73711446918814]
We propose a novel framework named LATex for AG-ReID, which adopts prompt-tuning strategies to leverage attribute-based text knowledge.<n>Our framework can fully leverage attribute-based text knowledge to improve AG-ReID performance.
arXiv Detail & Related papers (2025-03-31T04:47:05Z)
Incorporating Attributes and Multi-Scale Structures for Heterogeneous Graph Contrastive Learning [8.889313669713918]
We propose a novel contrastive learning framework for heterogeneous graphs (ASHGCL)<n>ASHGCL incorporates three distinct views, each focusing on node attributes, high-order and low-order structural information, respectively.<n>We introduce an attribute-enhanced positive sample selection strategy that combines both structural information and attribute information.
arXiv Detail & Related papers (2025-03-18T05:15:21Z)
Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to solve the problem of complex interactions between attributes and object visual representations.<n>To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module.<n>To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module.<n>The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z)
Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian recognition (PAR) algorithms are mainly developed based on a static image. We propose to understand human attributes using video frames that can fully use temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
Deep Common Feature Mining for Efficient Video Semantic Segmentation [25.851900402539467]
We present Deep Common Feature Mining (DCFM) for video semantic segmentation.<n>DCFM explicitly decomposes features into two complementary components.<n>We incorporate a self-supervised loss function to reinforce intra-class feature similarity and enhance temporal consistency.
arXiv Detail & Related papers (2024-03-05T06:17:59Z)
A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual Information Minimization for Pedestrian Attribute Recognition [10.821982414387525]
We show that current methods can actually suffer in generalizing such fitted attributes interdependencies onto scenes or identities off the dataset distribution. To render models robust in realistic scenes, we propose the attributes-disentangled feature learning to ensure the recognition of an attribute not inferring on the existence of others.
arXiv Detail & Related papers (2023-07-28T01:34:55Z)
Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER) Our method exploits self-supervised pretraining to learn good feature representations from the target data. We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE) At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.