Cross-modality debiasing: using language to mitigate sub-population shifts in imaging
- URL: http://arxiv.org/abs/2403.07888v2
- Date: Tue, 2 Apr 2024 14:47:23 GMT
- Title: Cross-modality debiasing: using language to mitigate sub-population shifts in imaging
- Authors: Yijiang Pang, Bao Hoang, Jiayu Zhou
- Abstract summary: Sub-population shift accounts for a significant source of algorithmic bias and calls for distributional robustness.
Recent studies found inherent distributional robustness in multi-modality foundation models, such as the vision-language model CLIP.
We propose leveraging natural language inputs to debias image feature representations and improve worst-case performance on sub-populations.
- Score: 28.88097536026781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sub-population shift is a specific type of domain shift that highlights changes in data distribution within specific sub-groups or populations between training and testing. Sub-population shift accounts for a significant source of algorithmic bias and calls for distributional robustness. Recent studies found inherent distributional robustness in multi-modality foundation models, such as the vision-language model CLIP, yet this robustness is vulnerable to parameter fine-tuning. In this paper, we propose leveraging the connection of robustness among different modalities and reshaping the distributional robustness of one modality with another. Specifically, in the context of the distributional robustness of CLIP, we propose to leverage natural language inputs to debias the image feature representations and improve worst-case performance on sub-populations. Our extensive empirical studies show that image representations debiased by natural language achieve significant performance improvements and reduced performance instability under sub-population shifts.
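The abstract does not spell out the debiasing operation, but one common way to use language to debias image features, shown here purely as an illustrative sketch and not as the paper's exact algorithm, is to define a spurious-attribute direction from a pair of contrasting text embeddings and project it out of each image embedding. The function name, prompt choices, and projection scheme below are assumptions for illustration.

```python
# Hedged sketch: project a text-defined bias direction out of image features.
# This is NOT the paper's published method; it illustrates the general idea of
# reshaping image representations with natural-language inputs.
import numpy as np

def debias_image_features(img_feats, text_a, text_b):
    """Remove the direction spanned by two contrasting attribute prompts.

    img_feats: (n, d) image embeddings (e.g., from a CLIP-style image encoder).
    text_a, text_b: (d,) text embeddings of contrasting sub-population prompts,
        e.g., "a photo on a land background" vs. "a photo on a water background".
    All inputs are assumed L2-normalized.
    """
    bias_dir = text_a - text_b
    bias_dir = bias_dir / np.linalg.norm(bias_dir)
    # Component of each image feature along the bias direction.
    proj = img_feats @ bias_dir                       # shape (n,)
    debiased = img_feats - np.outer(proj, bias_dir)   # shape (n, d)
    # Re-normalize so cosine similarities with class prompts stay comparable.
    return debiased / np.linalg.norm(debiased, axis=1, keepdims=True)
```

After this projection, the debiased features are orthogonal to the attribute direction, so zero-shot classification against class prompts is less influenced by the spurious sub-population attribute.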
Related papers
- An Effective Deployment of Diffusion LM for Data Augmentation in Low-Resource Sentiment Classification [2.0930389307057427]
Sentiment classification (SC) often suffers from low-resource challenges such as domain-specific contexts, imbalanced label distributions, and few-shot scenarios.
We propose Diffusion LM to capture in-domain knowledge and generate pseudo samples by reconstructing strong label-related tokens.
arXiv Detail & Related papers (2024-09-05T02:51:28Z) - Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z) - Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts [8.298173603769063]
We examine the stability of models based on foundation models under distribution shift.
We focus on confounding by provenance, a form of distribution shift that emerges in the context of multi-institutional datasets.
Results indicate that while foundation models do show some out-of-the-box robustness to confounding-by-provenance related distribution shifts, this can be improved through adjustment.
arXiv Detail & Related papers (2023-12-09T02:02:45Z) - Distributionally Robust Optimization and Invariant Representation Learning for Addressing Subgroup Underrepresentation: Mechanisms and Limitations [10.4894578909708]
Spurious correlation caused by subgroup underrepresentation has received increasing attention as a source of bias that can be perpetuated by DNNs.
We take the first step to better understand and improve the mechanisms for debiasing spurious correlation due to subgroup underrepresentation in medical image classification.
arXiv Detail & Related papers (2023-08-12T01:55:58Z) - Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation [0.0]
Non-autoregressive neural machine translation (NAT) offers a substantial translation speed-up compared to autoregressive neural machine translation (AT).
Latent variable modeling has emerged as a promising approach to bridge this quality gap.
arXiv Detail & Related papers (2023-05-02T15:33:09Z) - Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z) - Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply it to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z) - Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z) - Discrete Variational Attention Models for Language Generation [51.88612022940496]
We propose a discrete variational attention model with categorical distribution over the attention mechanism owing to the discrete nature in languages.
Thanks to the property of discreteness, the training of our proposed approach does not suffer from posterior collapse.
arXiv Detail & Related papers (2020-04-21T05:49:04Z) - When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability remains a lingering concern for generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.