Domain Generalization via Balancing Training Difficulty and Model Capability
- URL: http://arxiv.org/abs/2309.00844v1
- Date: Sat, 2 Sep 2023 07:09:23 GMT
- Title: Domain Generalization via Balancing Training Difficulty and Model Capability
- Authors: Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu
- Abstract summary: Domain generalization (DG) aims to learn domain-generalizable models from one or multiple source domains that can perform well in unseen target domains.
Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models.
We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model's capability and the samples' difficulties.
- Score: 61.053202176230904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain generalization (DG) aims to learn domain-generalizable models from one
or multiple source domains that can perform well in unseen target domains.
Despite its recent progress, most existing work suffers from the misalignment
between the difficulty level of training samples and the capability of
contemporarily trained models, leading to over-fitting or under-fitting in the
trained generalization model. We design MoDify, a Momentum Difficulty framework
that tackles the misalignment by balancing the seesaw between the model's
capability and the samples' difficulties along the training process. MoDify
consists of two novel designs that collaborate to fight against the
misalignment while learning domain-generalizable models. The first is
MoDify-based Data Augmentation which exploits an RGB Shuffle technique to
generate difficulty-aware training samples on the fly. The second is
MoDify-based Network Optimization which dynamically schedules the training
samples for balanced and smooth learning with appropriate difficulty. Without
bells and whistles, a simple implementation of MoDify achieves superior
performance across multiple benchmarks. In addition, MoDify can complement
existing methods as a plug-in, and it is generic and can work for different
visual recognition tasks.
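The abstract names two components: an RGB Shuffle augmentation that perturbs color statistics on the fly, and a difficulty-aware scheduler that matches training samples to the model's current capability. Below is a minimal Python/PyTorch sketch of both ideas written only from the abstract; the names (rgb_shuffle, MomentumDifficulty), the loss-based difficulty proxy, and the band-selection rule are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def rgb_shuffle(images: torch.Tensor) -> torch.Tensor:
    """Randomly permute the RGB channels of a (B, 3, H, W) batch.
    A cheap, label-preserving augmentation: it changes color statistics
    (style) while keeping image content intact."""
    perm = torch.randperm(3)
    return images[:, perm, :, :]

class MomentumDifficulty:
    """Momentum-smoothed estimate of what the model can currently handle,
    used to keep only samples of appropriate difficulty (an assumed proxy,
    not the paper's exact criterion)."""

    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.running_loss = None  # proxy for current model capability

    def update(self, per_sample_loss: torch.Tensor) -> None:
        """Fold the current batch's mean loss into the running estimate."""
        mean_loss = per_sample_loss.detach().mean()
        if self.running_loss is None:
            self.running_loss = mean_loss
        else:
            self.running_loss = (self.momentum * self.running_loss
                                 + (1.0 - self.momentum) * mean_loss)

    def select(self, per_sample_loss: torch.Tensor,
               band: float = 0.5) -> torch.Tensor:
        """Boolean mask for samples whose loss lies within a band around
        the running loss, i.e. neither far too easy nor far too hard."""
        if self.running_loss is None:
            return torch.ones_like(per_sample_loss, dtype=torch.bool)
        lo = (1.0 - band) * self.running_loss
        hi = (1.0 + band) * self.running_loss
        return (per_sample_loss >= lo) & (per_sample_loss <= hi)
```

In a training loop, one would compute per-sample losses on rgb_shuffle(x), back-propagate only through the samples kept by select(), and then call update() so the difficulty estimate tracks the model as it improves.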
Related papers
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map every intermediate point along probability flow (PF) ODE trajectories to its corresponding endpoint.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts [20.202031878825153]
We propose a novel dynamic data mixture for MoE instruction tuning.
Inspired by MoE's token routing preference, we build dataset-level representations and then capture the subtle differences among datasets.
Results on two MoE models demonstrate the effectiveness of our approach on both downstream knowledge & reasoning tasks and open-ended queries.
arXiv Detail & Related papers (2024-06-17T06:47:03Z)
- Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast [23.936677199734213]
In this paper, we introduce a prototype library into the FedAvg-based Federated Learning framework.
The proposed method utilizes prototypes as masks representing missing modalities to formulate a task-calibrated training loss and a model-agnostic uni-modality inference strategy.
Compared to the baselines, our method improved inference accuracy by 3.7% with 50% modality missing during training and by 23.8% during uni-modality inference.
arXiv Detail & Related papers (2023-12-21T00:55:12Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces Adaptive Model Merging (AdaMerging), a technique that autonomously learns the coefficients for model merging, either task-wise or layer-wise, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging achieves an 11% improvement in performance (see the parameter-merging sketch after this list).
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs [63.936622239286685]
We find that interference among different tasks and modalities is the main factor behind the performance degradation of generalist models.
We introduce Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
Code and pre-trained generalist models will be released.
arXiv Detail & Related papers (2022-06-09T17:59:59Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Cross-Domain Few-Shot Classification via Adversarial Task Augmentation [16.112554109446204]
Few-shot classification aims to recognize unseen classes with few labeled samples from each class.
Many meta-learning models for few-shot classification elaborately design various task-shared inductive biases (meta-knowledge) to solve such tasks.
In this work, we aim to improve the robustness of the inductive bias through task augmentation.
arXiv Detail & Related papers (2021-04-29T14:51:53Z)
- Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation [43.351728923472464]
One-Shot Unsupervised Domain Adaptation assumes that only one unlabeled target sample is available when learning to adapt.
Traditional adaptation approaches are prone to failure due to the scarcity of unlabeled target data.
We propose a novel Adversarial Style Mining approach, which combines a style transfer module and a task-specific module in an adversarial manner (see the sketch after this list).
arXiv Detail & Related papers (2020-04-13T16:18:46Z)
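As a rough illustration of the adversarial setup in the Adversarial Style Mining summary above, the loop below alternates a stylizer that ascends the task loss (mining harder styles) with a task model that descends it. The function name, update order, and optimizer handling are assumptions for illustration, not the paper's algorithm.

```python
import torch

def adversarial_style_step(stylizer, task_model, task_loss_fn,
                           source_x, source_y, opt_stylizer, opt_task):
    """One adversarial round: harden the styles, then adapt the task model.
    stylizer and task_model are user-supplied nn.Modules; opt_* are their
    respective optimizers."""
    # 1) Stylizer update: ascend the task loss to mine harder styles.
    styled = stylizer(source_x)
    style_loss = -task_loss_fn(task_model(styled), source_y)
    opt_stylizer.zero_grad()
    style_loss.backward()
    opt_stylizer.step()

    # 2) Task-model update: descend the loss on the freshly mined styles.
    styled = stylizer(source_x).detach()
    task_loss = task_loss_fn(task_model(styled), source_y)
    opt_task.zero_grad()
    task_loss.backward()
    opt_task.step()
    return task_loss.item()
```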
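The AdaMerging entry above describes learning merging coefficients rather than fixing them by hand. Below is a minimal sketch of layer-wise merging in the task-arithmetic convention, where each task vector is a fine-tuned model minus the shared pretrained model; the coefficient-learning objective itself is omitted, and the function and parameter names are illustrative, not from the paper.

```python
import torch

def merge_layerwise(pretrained: dict, finetuned: list,
                    coeffs: torch.Tensor) -> dict:
    """Layer-wise task-arithmetic merge:
        merged[l] = pretrained[l] + sum_k coeffs[k, l] * (finetuned_k[l] - pretrained[l])
    where l indexes parameter tensors (layers) and k indexes task models.
    coeffs has shape (num_models, num_layers); AdaMerging-style methods
    would learn these coefficients instead of hand-tuning them."""
    merged = {}
    for l, name in enumerate(pretrained):
        delta = sum(coeffs[k, l] * (finetuned[k][name] - pretrained[name])
                    for k in range(len(finetuned)))
        merged[name] = pretrained[name] + delta
    return merged
```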
This list is automatically generated from the titles and abstracts of the papers on this site.