Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2504.09448v2
- Date: Tue, 22 Apr 2025 10:59:00 GMT
- Title: Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization
- Authors: Lin Zhu, Xinbing Wang, Chenghu Zhou, Nanyang Ye
- Abstract summary: We introduce a novel cross-modal image-text alignment learning method (Bayes-CAL) to address this issue. Bayes-CAL achieves state-of-the-art OoD generalization performances on two-dimensional distribution shifts. Compared with CLIP-like models, Bayes-CAL yields more stable generalization performances on unseen classes.
- Score: 47.64583975469164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large pre-trained models showed promising results in few-shot learning. However, their generalization ability on two-dimensional Out-of-Distribution (OoD) data, i.e., correlation shift and diversity shift, has not been thoroughly investigated. Research has shown that even with a significant amount of training data, few methods can achieve better performance than the standard empirical risk minimization method (ERM) in OoD generalization. This few-shot OoD generalization dilemma emerges as a challenging direction in deep neural network generalization research, where the performance suffers from overfitting on few-shot examples and OoD generalization errors. In this paper, leveraging a broader supervision source, we explore a novel Bayesian cross-modal image-text alignment learning method (Bayes-CAL) to address this issue. Specifically, the model is designed so that only text representations are fine-tuned via a Bayesian modelling approach with gradient orthogonalization loss and invariant risk minimization (IRM) loss. The Bayesian approach is essentially introduced to avoid overfitting the base classes observed during training and improve generalization to broader unseen classes. The dedicated loss is introduced to achieve better image-text alignment by disentangling the causal and non-causal parts of image features. Numerical experiments demonstrate that Bayes-CAL achieved state-of-the-art OoD generalization performances on two-dimensional distribution shifts. Moreover, compared with CLIP-like models, Bayes-CAL yields more stable generalization performances on unseen classes. Our code is available at https://github.com/LinLLLL/BayesCAL.
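The abstract names an invariant risk minimization (IRM) loss but does not give its form. As a rough illustration only, the sketch below assumes the widely used IRMv1 penalty: for each training environment, the squared derivative of the risk with respect to a dummy scale w applied to the logits, evaluated at w = 1. It further assumes binary labels, scalar logits, and a binary cross-entropy risk. The function names (bce_risk, irm_penalty, irm_objective) and the weight lam are hypothetical, and this is not the Bayes-CAL implementation; see the authors' repository for that.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def bce_risk(logits, labels, w=1.0):
    """Mean binary cross-entropy of the scaled logits w * logits."""
    z = w * logits
    # log(1 + exp(z)) - y * z, computed stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - labels * z))


def irm_penalty(logits, labels):
    """Assumed IRMv1 penalty: (d risk / d w)^2 at w = 1.

    For binary cross-entropy the derivative has the closed form
    mean((sigmoid(z) - y) * z).
    """
    grad_w = np.mean((sigmoid(logits) - labels) * logits)
    return float(grad_w ** 2)


def irm_objective(envs, lam=1.0):
    """Sum over environments of risk + lam * penalty.

    envs is a list of (logits, labels) pairs, one per environment.
    """
    return sum(bce_risk(z, y) + lam * irm_penalty(z, y) for z, y in envs)
```

The penalty is zero when rescaling the logits cannot lower the risk in an environment, which is the sense in which the predictor is treated as "invariant" across environments in this formulation.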
Related papers
- On Generalization Across Environments In Multi-Objective Reinforcement Learning [6.686583184622338]
We formalize the concept of generalization in Multi-Objective Reinforcement Learning (MORL) and how it can be evaluated. We contribute a novel benchmark featuring diverse multi-objective domains with parameterized environment configurations. Our baseline evaluations of state-of-the-art MORL algorithms on this benchmark reveal limited generalization capabilities, suggesting significant room for improvement.
arXiv Detail & Related papers (2025-03-02T08:50:14Z) - Towards Modality Generalization: A Benchmark and Prospective Analysis [56.84045461854789]
This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities. We propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization. Our work provides a foundation for advancing robust and adaptable multi-modal models, enabling them to handle unseen modalities in realistic scenarios.
arXiv Detail & Related papers (2024-12-24T08:38:35Z) - Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety.
Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z) - Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z) - FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL [19.236153474365747]
Existing MARL approaches often rely on the restrictive assumption that the number of entities remains constant between training and inference. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization. We propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods.
arXiv Detail & Related papers (2024-10-21T10:57:45Z) - Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination.
This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field.
Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z) - Feed Two Birds with One Scone: Exploiting Wild Data for Both
Out-of-Distribution Generalization and Detection [31.68755583314898]
We propose a margin-based learning framework that exploits freely available unlabeled data in the wild.
We show both empirically and theoretically that the proposed margin constraint is the key to achieving both OOD generalization and detection.
arXiv Detail & Related papers (2023-06-15T14:32:35Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Adaptive Fine-Grained Sketch-Based Image Retrieval [100.90633284767205]
Recent focus on Fine-Grained Sketch-Based Image Retrieval has shifted towards generalising a model to new categories.
In real-world applications, a trained FG-SBIR model is often applied to both new categories and different human sketchers.
We introduce a novel model-agnostic meta-learning (MAML) based framework with several key modifications.
arXiv Detail & Related papers (2022-07-04T21:07:20Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective [17.733087434470907]
Real-world data universally confronts a severe class-imbalance problem and exhibits a long-tailed distribution.
We propose two novel methods from the prior perspective to alleviate this dilemma.
First, we deduce a balance-oriented data augmentation named Uniform Mixup (UniMix) to promote mixup in long-tailed scenarios.
Second, motivated by Bayesian theory, we identify the Bayes Bias (Bayias) and compensate for it with a modification to the standard cross-entropy loss.
arXiv Detail & Related papers (2021-11-06T12:53:34Z) - Ortho-Shot: Low Displacement Rank Regularization with Data Augmentation for Few-Shot Learning [23.465747123791772]
In few-shot classification, the primary goal is to learn representations that generalize well for novel classes.
We propose an efficient low displacement rank (LDR) regularization strategy termed Ortho-Shot.
arXiv Detail & Related papers (2021-10-18T14:58:36Z) - Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Double Descent and Other Interpolation Phenomena in GANs [2.7007335372861974]
We study the generalization error as a function of latent space dimension in generative adversarial networks (GANs).
We develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples.
While our analysis focuses mostly on linear models, we also apply important insights for improving generalization of nonlinear, multilayer GANs.
arXiv Detail & Related papers (2021-06-07T23:07:57Z) - Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z) - Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification.
Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.