Instance-aware Model Ensemble With Distillation For Unsupervised Domain
Adaptation
- URL: http://arxiv.org/abs/2211.08106v1
- Date: Tue, 15 Nov 2022 12:53:23 GMT
- Title: Instance-aware Model Ensemble With Distillation For Unsupervised Domain
Adaptation
- Authors: Weimin Wu, Jiayuan Fan, Tao Chen, Hancheng Ye, Bo Zhang, Baopu Li
- Abstract summary: We propose a novel framework, namely Instance-aware Model Ensemble With Distillation (IMED).
IMED fuses multiple UDA component models adaptively according to different instances and distills these components into a small model.
We show the superiority of the IMED-based model over state-of-the-art methods at comparable computation cost.
- Score: 28.79286984013436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The linear ensemble-based strategy, i.e., the averaging ensemble, has been proposed to improve performance in unsupervised domain adaptation (UDA) tasks.
However, a typical UDA task is usually challenged by dynamically changing
factors, such as variable weather, views, and background in the unlabeled
target domain. Most previous ensemble strategies ignore this dynamic and
uncontrollable aspect of UDA and thus face limited feature representations and
performance bottlenecks. To enhance the model's adaptability between domains and
to reduce the computational cost of deploying the ensemble model, we propose a
novel framework, namely Instance-aware Model Ensemble With Distillation (IMED),
which fuses multiple UDA component models adaptively according to different
instances and distills these components into a small model. The core idea of
IMED is a dynamic, instance-aware ensemble strategy, where for each instance a
nonlinear fusion subnetwork is learned that fuses the extracted features and
predicted labels of multiple component models. The nonlinear fusion method can
help the ensemble model handle dynamically changing factors. After learning a
large capacity ensemble model with good adaptability to different changing
factors, we leverage the ensemble teacher model to guide the learning of a
compact student model by knowledge distillation. Furthermore, we provide a
theoretical analysis of the validity of IMED for UDA. Extensive experiments
conducted on various UDA benchmark datasets, e.g., Office-31, Office-Home, and
VisDA-2017, show the superiority of the IMED-based model over state-of-the-art
methods at comparable computation cost.
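As a rough illustration of the idea described in the abstract, the sketch below shows a per-instance gating subnetwork that fuses the features and logits of K component models, followed by a standard temperature-scaled distillation step from the ensemble teacher to a compact student. Module names, tensor shapes, and loss weights are assumptions made for illustration, not the authors' exact architecture.

```python
# Minimal sketch of an instance-aware nonlinear fusion + distillation pipeline.
# All names/shapes here are illustrative assumptions, not the IMED implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceAwareFusion(nn.Module):
    """Fuses per-instance features/logits coming from several UDA component models."""
    def __init__(self, feat_dim, num_classes, num_components, hidden=256):
        super().__init__()
        # Small nonlinear subnetwork producing per-component fusion weights,
        # conditioned on the concatenated component features of each instance.
        self.gate = nn.Sequential(
            nn.Linear(feat_dim * num_components, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_components),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats, logits):
        # feats:  (B, K, D) features from K component models
        # logits: (B, K, C) predictions from K component models
        B, K, D = feats.shape
        w = F.softmax(self.gate(feats.reshape(B, K * D)), dim=-1)  # (B, K) per-instance weights
        fused_feat = (w.unsqueeze(-1) * feats).sum(dim=1)          # (B, D)
        fused_logit = (w.unsqueeze(-1) * logits).sum(dim=1)        # (B, C)
        # Combine the fused feature's own prediction with the fused component logits.
        return 0.5 * self.classifier(fused_feat) + 0.5 * fused_logit

def distill_step(student, teacher_logits, images, optimizer, T=2.0):
    """One knowledge-distillation step from the ensemble teacher to the student."""
    student_logits = student(images)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```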
Related papers
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging [111.8456671452411]
Multi-task learning (MTL) leverages a shared model to accomplish multiple tasks and facilitate knowledge transfer.
We propose a Weight-Ensembling Mixture of Experts (WEMoE) method for multi-task model merging.
We show that WEMoE and E-WEMoE outperform state-of-the-art (SOTA) model merging methods in terms of MTL performance, generalization, and robustness.
arXiv Detail & Related papers (2024-10-29T07:16:31Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose MITA, a Meet-In-The-Middle approach, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks [3.776249047528669]
We leverage the abundance of freely trained models to introduce a cost-free approach to model merging.
It aims to maintain the distinctiveness of the task-specific final layers while unifying the initial layers.
This approach ensures parameter consistency across all layers, essential for boosting performance.
arXiv Detail & Related papers (2024-09-24T07:19:30Z)
- Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation [7.200910949076064]
Federated Learning (FL) enables multiple clients to collaboratively train a model without sharing their local data.
Yet the FL system is vulnerable to well-designed Byzantine attacks, which aim to disrupt the model training process by uploading malicious model updates.
We propose the Layer-Adaptive Sparsified Model Aggregation (LASA) approach, which combines pre-aggregation sparsification with layer-wise adaptive aggregation to improve robustness.
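A hedged sketch of that two-step idea is given below, with illustrative top-k sparsification and cosine-similarity layer weights standing in for the paper's exact rules.

```python
# Generic sketch of layer-wise sparsified, similarity-weighted aggregation in FL.
# The top-k rule and cosine-based weights are illustrative choices, not the LASA spec.
import torch
import torch.nn.functional as F

def topk_sparsify(update: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
    """Keep only the largest-magnitude entries of a layer update before aggregation."""
    flat = update.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(update)

def aggregate_layerwise(client_updates, keep_ratio=0.1):
    """client_updates: list of dicts mapping layer name -> update tensor."""
    agg = {}
    for name in client_updates[0]:
        sparse = [topk_sparsify(u[name], keep_ratio) for u in client_updates]
        mean_dir = torch.stack(sparse).mean(dim=0)
        # Down-weight clients whose layer update disagrees with the mean direction.
        sims = torch.stack([
            F.cosine_similarity(s.flatten(), mean_dir.flatten(), dim=0) for s in sparse
        ]).clamp(min=0)
        w = sims / sims.sum().clamp(min=1e-8)
        agg[name] = sum(wi * si for wi, si in zip(w, sparse))
    return agg
```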
arXiv Detail & Related papers (2024-09-02T19:28:35Z)
- Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble [11.542472900306745]
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
arXiv Detail & Related papers (2024-03-24T18:43:04Z)
- Reinforcement Learning for Topic Models [3.42658286826597]
We apply reinforcement learning techniques to topic modeling by replacing the variational autoencoder in ProdLDA with a continuous action space reinforcement learning policy.
We introduce several modifications: modernize the neural network architecture, weight the ELBO loss, use contextual embeddings, and monitor the learning process via computing topic diversity and coherence.
arXiv Detail & Related papers (2023-05-08T16:41:08Z)
- Parameter-efficient Modularised Bias Mitigation via AdapterFusion [22.424110883305243]
We propose a novel approach to develop stand-alone debiasing functionalities separate from the model.
We introduce DAM - a debiasing approach to first encapsulate arbitrary bias mitigation functionalities into separate adapters, and then add them to the model on-demand.
Our results show that DAM improves or maintains the effectiveness of bias mitigation, avoids forgetting in a multi-attribute scenario, and maintains on-par task performance.
arXiv Detail & Related papers (2023-02-13T12:39:45Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
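The simplest instance of parameter-space merging is a weighted average of the fine-tuned models' weights; the sketch below shows only that baseline form, and the paper's actual merging rule may differ.

```python
# Hedged illustration of merging fine-tuned models in parameter space.
# Plain parameter averaging is the simplest instance of the idea, not the paper's rule.
import copy
import torch

def merge_in_parameter_space(models, weights=None):
    """Return a new model whose parameters are a weighted average of the inputs."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    merged = copy.deepcopy(models[0])
    merged_state = merged.state_dict()
    states = [m.state_dict() for m in models]
    for name in merged_state:
        # Average floating-point parameters; keep integer buffers from the first model.
        if merged_state[name].is_floating_point():
            merged_state[name] = sum(w * s[name] for w, s in zip(weights, states))
    merged.load_state_dict(merged_state)
    return merged
```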
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Instance-specific and Model-adaptive Supervision for Semi-supervised Semantic Segmentation [49.82432158155329]
We propose an instance-specific and model-adaptive supervision for semi-supervised semantic segmentation, named iMAS.
iMAS learns from unlabeled instances progressively by weighing their corresponding consistency losses based on the evaluated hardness.
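A minimal sketch of instance-wise weighting of consistency losses follows, using teacher confidence as an assumed stand-in for the paper's hardness measure.

```python
# Illustrative instance-specific consistency weighting for semi-supervised segmentation.
# The confidence-based weight is an assumption, not the iMAS hardness definition.
import torch
import torch.nn.functional as F

def weighted_consistency_loss(student_logits, teacher_logits):
    # student_logits, teacher_logits: (B, C, H, W) for a batch of unlabeled images
    teacher_prob = F.softmax(teacher_logits, dim=1).detach()
    # Per-instance confidence: mean max-probability of the teacher over all pixels.
    confidence = teacher_prob.max(dim=1).values.mean(dim=(1, 2))        # (B,)
    per_instance = F.mse_loss(
        F.softmax(student_logits, dim=1), teacher_prob, reduction="none"
    ).mean(dim=(1, 2, 3))                                               # (B,)
    # Weigh each instance's consistency loss by its evaluated confidence.
    return (confidence * per_instance).mean()
```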
arXiv Detail & Related papers (2022-11-21T10:37:28Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)