GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts
- URL: http://arxiv.org/abs/2502.12195v1
- Date: Sat, 15 Feb 2025 10:10:49 GMT
- Title: GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts
- Authors: Sameer Ambekar, Zehao Xiao, Xiantong Zhen, Cees G. M. Snoek
- Abstract summary: We consider the problem of test-time domain generalization, where a model is trained on several source domains and adjusted on target domains never seen during training.
We propose to generate multiple layer parameters on the fly during inference by a lightweight meta-learned transformer, which we call GeneralizeFormer.
- Score: 58.95913531746308
- Abstract: We consider the problem of test-time domain generalization, where a model is trained on several source domains and adjusted on target domains never seen during training. Unlike common methods that fine-tune the model or adjust the classifier parameters online, we propose to generate the parameters of multiple layers on the fly during inference with a lightweight meta-learned transformer, which we call GeneralizeFormer. The layer-wise parameters are generated per target batch without fine-tuning or online adjustment. As a result, our method is more effective in dynamic scenarios with multiple target distributions and also avoids forgetting valuable source distribution characteristics. Moreover, by considering layer-wise gradients, the proposed method adapts itself to various distribution shifts. To reduce the computational and time cost, we fix the convolutional parameters and generate only the parameters of the Batch Normalization layers and the linear classifier. Experiments on six widely used domain generalization datasets demonstrate the benefits and abilities of the proposed method to efficiently handle various distribution shifts, generalize in dynamic scenarios, and avoid forgetting.
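The per-batch generation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `ParamGenerator` is a hypothetical, untrained linear map standing in for the meta-learned transformer, `batch_descriptor` uses simple batch statistics as its input (the abstract mentions layer-wise gradients, which are omitted here), and the input `x` is assumed to be the output of the frozen convolutional backbone. The sketch only shows the control flow: for each target batch, normalization and classifier parameters are produced by a forward pass of the generator, and nothing is fine-tuned or stored.

```python
import math
import random


def batch_descriptor(x):
    """Per-feature mean and std of a batch (assumed stand-in for the generator input)."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in x) / n) for j in range(d)]
    return means + stds  # length 2 * d


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize with the current batch statistics, then apply the affine (gamma, beta)."""
    desc = batch_descriptor(x)
    d = len(x[0])
    means, stds = desc[:d], desc[d:]
    return [
        [gamma[j] * (row[j] - means[j]) / math.sqrt(stds[j] ** 2 + eps) + beta[j]
         for j in range(d)]
        for row in x
    ]


class ParamGenerator:
    """Hypothetical generator: one fixed linear map (the paper uses a transformer)."""

    def __init__(self, d, num_classes, seed=0):
        rng = random.Random(seed)
        self.d, self.k = d, num_classes
        out_dim = 2 * d + num_classes * d  # gamma, beta, classifier weights
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(2 * d)] for _ in range(out_dim)]

    def __call__(self, desc):
        out = [sum(w * v for w, v in zip(row, desc)) for row in self.w]
        d, k = self.d, self.k
        gamma = [1.0 + o for o in out[:d]]  # offsets around the identity scale
        beta = out[d:2 * d]
        classifier = [out[2 * d + c * d: 2 * d + (c + 1) * d] for c in range(k)]
        return gamma, beta, classifier


def predict(x, generator):
    """Generate parameters for this batch only, then classify; no state is updated."""
    gamma, beta, classifier = generator(batch_descriptor(x))
    h = batch_norm(x, gamma, beta)
    logits = [[sum(w * v for w, v in zip(w_row, row)) for w_row in classifier] for row in h]
    return [max(range(len(l)), key=l.__getitem__) for l in logits]


if __name__ == "__main__":
    rng = random.Random(1)
    gen = ParamGenerator(d=4, num_classes=3)
    # Two target batches drawn from different (synthetic) distributions.
    batch_a = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    batch_b = [[rng.gauss(5.0, 2.0) for _ in range(4)] for _ in range(8)]
    print(predict(batch_a, gen))
    print(predict(batch_b, gen))
```

Because the generator's own weights are never modified at test time, each batch receives parameters derived only from that batch, which is the property the abstract relies on to handle changing target distributions without forgetting.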
Related papers
- pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning [23.43592558078981]
Federated Learning (FL) offers a decentralized approach to model training, where data remains local and only model parameters are shared between the clients and the central server.
Traditional methods, such as Federated Averaging (FedAvg), linearly aggregate these parameters which are usually trained on heterogeneous data distributions.
We propose a novel generative parameter aggregation framework for personalized FL, pFedGPA.
arXiv Detail & Related papers (2024-09-09T15:13:56Z) - Open Domain Generalization with a Single Network by Regularization Exploiting Pre-trained Features [37.518025833882334]
Open Domain Generalization (ODG) is a challenging task as it deals with distribution shifts and category shifts.
Previous work has used multiple source-specific networks, which involve a high cost.
This paper proposes a method that can handle ODG using only a single network.
arXiv Detail & Related papers (2023-12-08T16:22:10Z) - Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization [76.27711056914168]
Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features.
Recent studies based on the Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task.
We propose Exploring Variant parameters for Invariant Learning (EVIL), which also leverages the distribution knowledge to find the parameters that are sensitive to distribution shift.
arXiv Detail & Related papers (2023-10-25T06:10:57Z) - Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z) - Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z) - Style Interleaved Learning for Generalizable Person Re-identification [69.03539634477637]
We propose a novel style interleaved learning (IL) framework for DG ReID training.
Unlike conventional learning strategies, IL incorporates two forward propagations and one backward propagation for each iteration.
We show that our model consistently outperforms state-of-the-art methods on large-scale benchmarks for DG ReID.
arXiv Detail & Related papers (2022-07-07T07:41:32Z) - A Prototype-Oriented Framework for Unsupervised Domain Adaptation [52.25537670028037]
We provide a memory- and computation-efficient probabilistic framework to extract class prototypes and align the target features with them.
We demonstrate the general applicability of our method on a wide range of scenarios, including single-source, multi-source, class-imbalance, and source-private domain adaptation.
arXiv Detail & Related papers (2021-10-22T19:23:22Z) - Robust Federated Learning Through Representation Matching and Adaptive Hyper-parameters [5.319361976450981]
Federated learning is a distributed, privacy-aware learning scenario which trains a single model on data belonging to several clients.
Current federated learning methods struggle in cases with heterogeneous client-side data distributions.
We propose a novel representation matching scheme that reduces the divergence of local models.
arXiv Detail & Related papers (2019-12-30T20:19:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.