CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization
- URL: http://arxiv.org/abs/2507.18794v1
- Date: Thu, 24 Jul 2025 20:31:21 GMT
- Title: CLEAR: Unlearning Spurious Style-Content Associations with Contrastive LEarning with Anti-contrastive Regularization
- Authors: Minghui Sun, Benjamin A. Goldstein, Matthew M. Engelhard
- Abstract summary: We propose Contrastive LEarning with Anti-contrastive Regularization (CLEAR). CLEAR separates essential (i.e., task-relevant) characteristics from superficial (i.e., task-irrelevant) characteristics during training, leading to better performance when superficial characteristics shift at test time. Our results show that CLEAR-VAE allows us to: (a) swap and interpolate content and style between any pair of samples, and (b) improve downstream classification performance in the presence of previously unseen combinations of content and style.
- Score: 4.171555557592296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning representations unaffected by superficial characteristics is important to ensure that shifts in these characteristics at test time do not compromise downstream prediction performance. For instance, in healthcare applications, we might like to learn features that contain information about pathology yet are unaffected by race, sex, and other sources of physiologic variability, thereby ensuring predictions are equitable and generalizable across all demographics. Here we propose Contrastive LEarning with Anti-contrastive Regularization (CLEAR), an intuitive and easy-to-implement framework that effectively separates essential (i.e., task-relevant) characteristics from superficial (i.e., task-irrelevant) characteristics during training, leading to better performance when superficial characteristics shift at test time. We begin by supposing that data representations can be semantically separated into task-relevant content features, which contain information relevant to downstream tasks, and task-irrelevant style features, which encompass superficial attributes that are irrelevant to these tasks, yet may degrade performance due to associations with content present in training data that do not generalize. We then prove that our anti-contrastive penalty, which we call Pair-Switching (PS), minimizes the Mutual Information between the style attributes and content labels. Finally, we instantiate CLEAR in the latent space of a Variational Auto-Encoder (VAE), then perform experiments to quantitatively and qualitatively evaluate the resulting CLEAR-VAE over several image datasets. Our results show that CLEAR-VAE allows us to: (a) swap and interpolate content and style between any pair of samples, and (b) improve downstream classification performance in the presence of previously unseen combinations of content and style. Our code will be made publicly available.
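To make the abstract's mechanism concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code) of the two loss terms: a supervised contrastive loss that pulls same-content samples together in the content latent block, and the same loss with its positive/negative pairing inverted, which is one plausible reading of the Pair-Switching (PS) anti-contrastive penalty applied to the style block. The function name, the mask-inversion trick, and the loss weights are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive(z, labels, temperature=0.1, pair_switch=False):
    """InfoNCE-style supervised contrastive loss over a batch of embeddings.

    With pair_switch=True, the positive mask is inverted so that samples
    sharing a content label are pushed apart instead of pulled together --
    one plausible reading of the paper's Pair-Switching penalty, not a
    confirmed reproduction of it.
    """
    z = F.normalize(z, dim=1)                      # unit-norm embeddings
    n = z.size(0)
    sim = z @ z.T / temperature                    # pairwise cosine similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    if pair_switch:
        pos_mask = ~pos_mask & ~self_mask          # swap positives and negatives
    # log-softmax over each row, excluding the self-similarity term
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    denom = pos_mask.sum(1).clamp(min=1)
    return -(log_prob * pos_mask.float()).sum(1).div(denom).mean()

# A CLEAR-VAE-style objective could then combine the usual ELBO terms with
# both losses (beta, lambda_c, lambda_s are hypothetical weights):
#   loss = recon_loss + beta * kl_div \
#          + lambda_c * supervised_contrastive(z_content, y) \
#          + lambda_s * supervised_contrastive(z_style, y, pair_switch=True)
```

Under this reading, the switched pairing penalizes the style block whenever it can distinguish content labels, which is consistent with the abstract's claim that PS minimizes the mutual information between style attributes and content labels.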
Related papers
- Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision-making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which experiences training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which is myopic. We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z) - Compositional Caching for Training-free Open-vocabulary Attribute Detection [65.46250297408974]
We present Compositional Caching (ComCa), a training-free method for open-vocabulary attribute detection. ComCa requires only the list of target attributes and objects as input, using them to populate an auxiliary cache of images. Experiments on public datasets demonstrate that ComCa significantly outperforms zero-shot and cache-based baselines.
arXiv Detail & Related papers (2025-03-24T21:00:37Z) - Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
The Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to address the complex interactions between attribute and object visual representations. To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module. To further improve the discriminative ability of the model, HDA-OE introduces a subclass-driven discriminative embedding (SDDE) module. The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z) - Enhancing Vision-Language Few-Shot Adaptation with Negative Learning [11.545127156146368]
We propose a Simple yet effective Negative Learning approach, SimNL, to more efficiently exploit task-specific knowledge.
We also introduce a plug-and-play few-shot instance reweighting technique to mitigate noisy outliers.
Our extensive experimental results validate that the proposed SimNL outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks.
arXiv Detail & Related papers (2024-03-19T17:59:39Z) - Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning [45.25602203155762]
Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data.
A major challenge that hinders both unimodal and multimodal contrastive learning is feature suppression.
We propose a novel model-agnostic Multistage Contrastive Learning framework.
arXiv Detail & Related papers (2024-02-19T04:13:33Z) - Attribute-Aware Representation Rectification for Generalized Zero-Shot Learning [19.65026043141699]
Generalized Zero-shot Learning (GZSL) has yielded remarkable performance by designing a series of unbiased visual-semantics mappings.
We propose a simple yet effective Attribute-Aware Representation Rectification framework for GZSL, dubbed $\mathbf{(AR)^2}$.
arXiv Detail & Related papers (2023-11-23T11:30:32Z) - Enhancing Few-shot CLIP with Semantic-Aware Fine-Tuning [61.902254546858465]
Methods based on Contrastive Language-Image Pre-training have exhibited promising performance in few-shot adaptation tasks.
We propose fine-tuning the parameters of the attention pooling layer during the training process to encourage the model to focus on task-specific semantics.
arXiv Detail & Related papers (2023-11-08T05:18:57Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - Exploiting Semantic Attributes for Transductive Zero-Shot Learning [97.61371730534258]
Zero-shot learning aims to recognize unseen classes by generalizing the relation between visual features and semantic attributes learned from the seen classes.
We present a novel transductive ZSL method that produces semantic attributes of the unseen data and imposes them on the generative process.
Experiments on five standard benchmarks show that our method yields state-of-the-art results for zero-shot learning.
arXiv Detail & Related papers (2023-03-17T09:09:48Z) - OvarNet: Towards Open-vocabulary Object Attribute Recognition [42.90477523238336]
We propose a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr.
The candidate objects are first proposed with an offline RPN and later classified for semantic category and attributes.
We show that recognition of semantic category and attributes is complementary for visual scene understanding.
arXiv Detail & Related papers (2023-01-23T15:59:29Z) - Robust Representation Learning via Perceptual Similarity Metrics [18.842322467828502]
Contrastive Input Morphing (CIM) is a representation learning framework that learns input-space transformations of the data.
We show that CIM is complementary to other mutual information-based representation learning techniques.
arXiv Detail & Related papers (2021-06-11T21:45:44Z)