Related papers: RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

URL: http://arxiv.org/abs/2602.06806v1
Date: Fri, 06 Feb 2026 15:54:41 GMT
Title: RAIGen: Rare Attribute Identification in Text-to-Image Generative Models
Authors: Silpa Vadakkeeveetil Sreelatha, Dan Wang, Serge Belongie, Muhammad Awais, Anjan Dutta
Abstract summary: We introduce RAIGen, the first framework, to our knowledge, for unsupervised rare-attribute discovery in diffusion models. We show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, and enables targeted amplification of rare attributes during generation.
Score: 12.120097479039373
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, the first framework, to our knowledge, for unsupervised rare-attribute discovery in diffusion models. RAIGen leverages Matryoshka Sparse Autoencoders and a novel minority metric combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation.
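
Illustrative sketch (not from the paper): the abstract says the minority metric combines neuron activation frequency with semantic distinctiveness but gives no formula. The Python below is one hypothetical way such a score could be built, assuming frequency is the fraction of images on which a sparse-autoencoder neuron fires, and distinctiveness is one minus the cosine similarity between the mean embedding of a neuron's top-activating images and a dataset-wide mean embedding. All function names, the threshold, and the log-rarity weighting are assumptions made for illustration, not RAIGen's actual definition.

```python
import math


def activation_frequency(activations, threshold=0.0):
    """Fraction of images on which each neuron fires above `threshold`.

    `activations` is a list of rows, one per image, each row holding one
    activation value per neuron (hypothetical layout).
    """
    n_images = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(1 for row in activations if row[j] > threshold) / n_images
        for j in range(n_neurons)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distinctiveness(neuron_embedding, global_embedding):
    """Assumed proxy: 1 - cosine similarity to the dataset-wide mean embedding."""
    return 1.0 - cosine(neuron_embedding, global_embedding)


def minority_score(freq, distinct):
    """Hypothetical score: large for neurons that fire rarely but are distinctive.

    Neurons that never fire get 0.0 so dead units are not ranked as rare.
    """
    if freq <= 0.0:
        return 0.0
    return distinct * math.log(1.0 / freq)


if __name__ == "__main__":
    # Toy data: neuron 0 fires on every image, neuron 1 on one image in four.
    acts = [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.6, 0.5]]
    freqs = activation_frequency(acts)
    top_image_embeddings = [[1.0, 0.0], [0.0, 1.0]]  # made-up embeddings
    global_mean = [1.0, 0.1]
    for j, freq in enumerate(freqs):
        d = distinctiveness(top_image_embeddings[j], global_mean)
        print(j, freq, minority_score(freq, d))
```

Under these assumptions, ranking neurons by the score in descending order would put rarely firing, semantically distinct neurons first; how the paper actually weights the two terms is not stated in the abstract.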

Related papers

What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance [3.1485041255193784]
MoSAIC-ReID is a Mixture-of-Experts framework that systematically quantifies the importance of pedestrian attributes for re-identification. Our approach uses LoRA-based experts, each linked to a single attribute, and an oracle router that enables controlled attribution analysis.
arXiv Detail & Related papers (2025-12-09T15:14:28Z)
What Makes You Unique? Attribute Prompt Composition for Object Re-Identification [70.67907354506278]
Object Re-IDentification aims to recognize individuals across non-overlapping camera views. Single-domain models tend to overfit to domain-specific features, whereas cross-domain models often rely on diverse normalization strategies. We propose an Attribute Prompt Composition framework, which exploits textual semantics to jointly enhance discrimination and generalization.
arXiv Detail & Related papers (2025-09-23T07:03:08Z)
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention [42.277875137852234]
Entanglement-Free Attention (EFA) is a method that accurately incorporates target attributes while preserving non-target attributes during bias mitigation. At inference time, EFA randomly samples a target attribute with equal probability and adjusts the cross-attention in selected layers to incorporate the sampled attribute. Extensive experiments demonstrate that EFA outperforms existing methods in mitigating bias while preserving non-target attributes.
arXiv Detail & Related papers (2025-06-16T09:40:32Z)
DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning [53.27049077100897]
Generative pre-training has been shown to yield discriminative representations, paving the way towards unified visual generation and understanding. This work introduces self-conditioning, a mechanism that internally leverages the rich semantics inherent in the denoising network to guide its own decoding layers. Results are compelling: our method boosts both generation FID and recognition accuracy with 1% computational overhead and generalizes across diverse diffusion architectures.
arXiv Detail & Related papers (2025-05-16T08:47:16Z)
Leveraging vision-language models for fair facial attribute classification [19.93324644519412]
A general-purpose vision-language model (VLM) is a rich knowledge source for common sensitive attributes. We analyze the correspondence between VLM-predicted and human-defined sensitive attribute distributions. Experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines.
arXiv Detail & Related papers (2024-03-15T18:37:15Z)
Distributionally Generative Augmentation for Fair Facial Attribute Classification [69.97710556164698]
Facial Attribute Classification (FAC) holds substantial promise in widespread applications. FAC models trained by traditional methodologies can be unfair by exhibiting accuracy inconsistencies across varied data subpopulations. This work proposes a novel, generation-based two-stage framework to train a fair FAC model on biased data without additional annotation.
arXiv Detail & Related papers (2024-03-11T10:50:53Z)
Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^{2}$) framework to tackle both limitations without any auxiliary annotation or data. Experiments demonstrate that FIRe$^{2}$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object). We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
Exploiting Semantic Attributes for Transductive Zero-Shot Learning [97.61371730534258]
Zero-shot learning aims to recognize unseen classes by generalizing the relation between visual features and semantic attributes learned from the seen classes. We present a novel transductive ZSL method that produces semantic attributes of the unseen data and imposes them on the generative process. Experiments on five standard benchmarks show that our method yields state-of-the-art results for zero-shot learning.
arXiv Detail & Related papers (2023-03-17T09:09:48Z)
Classify and Generate: Using Classification Latent Space Representations for Image Generations [17.184760662429834]
We propose a discriminative modeling framework that employs manipulated supervised latent representations to reconstruct and generate new samples belonging to a given class. ReGene has higher classification accuracy than existing conditional generative models while being competitive in terms of FID.
arXiv Detail & Related papers (2020-04-16T09:13:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.