Related papers: Domain Generalization via Balancing Training Difficulty and Model Capability

URL: http://arxiv.org/abs/2309.00844v1
Date: Sat, 2 Sep 2023 07:09:23 GMT
Title: Domain Generalization via Balancing Training Difficulty and Model Capability
Authors: Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu
Abstract summary: Domain generalization (DG) aims to learn domain-generalizable models from one or multiple source domains that can perform well in unseen target domains. Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models. We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model's capability and the samples' difficulties.
Score: 61.053202176230904
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Domain generalization (DG) aims to learn domain-generalizable models from one or multiple source domains that can perform well in unseen target domains. Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model. We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model's capability and the samples' difficulties along the training process. MoDify consists of two novel designs that collaborate to fight against the misalignment while learning domain-generalizable models. The first is MoDify-based Data Augmentation which exploits an RGB Shuffle technique to generate difficulty-aware training samples on the fly. The second is MoDify-based Network Optimization which dynamically schedules the training samples for balanced and smooth learning with appropriate difficulty. Without bells and whistles, a simple implementation of MoDify achieves superior performance across multiple benchmarks. In addition, MoDify can complement existing methods as a plug-in, and it is generic and can work for different visual recognition tasks.
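The abstract names two components, a running "momentum difficulty" signal and an RGB Shuffle augmentation, without giving formulas. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The exponential moving average of the loss, the blending of a pixel with its channel-permuted copy, and the `augmentation_strength` rule are all assumptions made for illustration, and every name here is hypothetical.

```python
import random


class MomentumDifficulty:
    """Running (exponential moving average) estimate of recent training loss.

    Assumption: the loss is used as a proxy for sample difficulty.
    """

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.value = None

    def update(self, loss):
        # The first observation initializes the average directly.
        if self.value is None:
            self.value = loss
        else:
            self.value = self.momentum * self.value + (1 - self.momentum) * loss
        return self.value


def rgb_shuffle(pixels, strength, rng):
    """Blend each (r, g, b) pixel with a channel-permuted copy of itself.

    strength=0 returns the original colors; strength=1 returns a pure
    channel permutation. The blending itself is an illustrative assumption.
    """
    perm = [0, 1, 2]
    rng.shuffle(perm)
    return [
        tuple((1 - strength) * p[c] + strength * p[perm[c]] for c in range(3))
        for p in pixels
    ]


def augmentation_strength(loss, difficulty, max_strength=1.0):
    """Hypothetical rule: augment easy samples more and hard samples less.

    A sample whose loss equals the running difficulty gets half of
    max_strength; a loss of at least twice the difficulty gets zero.
    """
    if difficulty <= 0:
        return max_strength
    ratio = loss / difficulty
    return max(0.0, min(max_strength, max_strength * (2.0 - ratio) / 2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    tracker = MomentumDifficulty(momentum=0.9)
    image = [(0.2, 0.5, 0.9), (0.1, 0.1, 0.8)]  # a tiny two-pixel "image"
    for loss in [1.0, 0.8, 1.5]:
        difficulty = tracker.update(loss)
        strength = augmentation_strength(loss, difficulty)
        augmented = rgb_shuffle(image, strength, rng)
        print(round(difficulty, 3), round(strength, 3), augmented)
```

In this sketch, samples that are easier than the running average receive stronger color perturbation, which is one plausible way to keep sample difficulty close to model capability. The paper may define both quantities differently.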

Related papers

ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection [28.75333303894706]
ToReMi is a novel framework that adjusts training sample weights according to their topical associations and observed learning patterns. Our experiments reveal that ToReMi variants consistently achieve superior performance over conventional pre-training approaches.
arXiv Detail & Related papers (2025-04-01T12:06:42Z)
From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs [37.50902921493273]
Training large language models (LLMs) for different inference constraints is computationally expensive. DynaMoE adapts a pre-trained dense LLM to a token-difficulty-driven Mixture-of-Experts model with minimal fine-tuning cost. Our method achieves similar aggregated accuracy across downstream tasks, despite using only $\frac{1}{9}\text{th}$ of their fine-tuning cost.
arXiv Detail & Related papers (2025-02-17T21:12:57Z)
Attention Is All You Need For Mixture-of-Depths Routing [5.419910566904439]
We introduce a novel attention-based routing mechanism A-MoD. A-MoD allows for more efficient training as it introduces no additional trainable parameters. It can increase the performance of the MoD model.
arXiv Detail & Related papers (2024-12-30T11:25:54Z)
Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow. We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. We empirically find that this training paradigm limits the one-step generation performance of consistency models. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts [20.202031878825153]
We propose a novel dynamic data mixture for MoE instruction tuning. Inspired by MoE's token routing preference, we build dataset-level representations and then capture the subtle differences among datasets. Results on two MoE models demonstrate the effectiveness of our approach on both downstream knowledge & reasoning tasks and open-ended queries.
arXiv Detail & Related papers (2024-06-17T06:47:03Z)
Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast [23.936677199734213]
In this paper, we introduce a prototype library into the FedAvg-based Federated Learning framework. The proposed method utilizes prototypes as masks representing missing modalities to formulate a task-calibrated training loss and a model-agnostic uni-modality inference strategy. Compared to the baselines, our method improved inference accuracy by 3.7% with 50% modality missing during training and by 23.8% during uni-modality inference.
arXiv Detail & Related papers (2023-12-21T00:55:12Z)
Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models. While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs [63.936622239286685]
We find that interference among different tasks and modalities is the main factor behind this phenomenon. We introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Code and pre-trained generalist models shall be released.
arXiv Detail & Related papers (2022-06-09T17:59:59Z)
Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution. This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes. Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
Cross-Domain Few-Shot Classification via Adversarial Task Augmentation [16.112554109446204]
Few-shot classification aims to recognize unseen classes with few labeled samples from each class. Many meta-learning models for few-shot classification elaborately design various task-shared inductive bias (meta-knowledge) to solve such tasks. In this work, we aim to improve the robustness of the inductive bias through task augmentation.
arXiv Detail & Related papers (2021-04-29T14:51:53Z)
Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation [43.351728923472464]
One-Shot Unsupervised Domain Adaptation assumes that only one unlabeled target sample is available when learning to adapt. Traditional adaptation approaches are prone to failure due to the scarcity of unlabeled target data. We propose a novel Adversarial Style Mining approach, which combines the style transfer module and task-specific module in an adversarial manner.
arXiv Detail & Related papers (2020-04-13T16:18:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.