Dynamic Knowledge Distillation for Black-box Hypothesis Transfer Learning
- URL: http://arxiv.org/abs/2007.12355v2
- Date: Fri, 7 Aug 2020 00:47:11 GMT
- Title: Dynamic Knowledge Distillation for Black-box Hypothesis Transfer Learning
- Authors: Yiqin Yu, Xu Min, Shiwan Zhao, Jing Mei, Fei Wang, Dongsheng Li, Kenney Ng, Shaochun Li
- Abstract summary: We introduce a novel algorithm called dynamic knowledge distillation for hypothesis transfer learning (dkdHTL).
In this method, we use knowledge distillation with an instance-wise weighting mechanism to adaptively transfer the "dark" knowledge from the source hypothesis to the target domain.
Empirical results on both transfer learning benchmark datasets and a healthcare dataset demonstrate the effectiveness of our method.
- Score: 20.533564478224967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world applications like healthcare, it is usually difficult to build a machine learning prediction model that works universally well across different institutions. At the same time, the available model is often proprietary, i.e., neither the model parameters nor the data set used for model training is accessible. Consequently, leveraging the knowledge hidden in the available model (a.k.a. the hypothesis) and adapting it to a local data set becomes extremely challenging. Motivated by this situation, in this paper we address a specific case within the hypothesis transfer learning framework, in which 1) the source hypothesis is a black-box model and 2) the source domain data is unavailable. In particular, we introduce a novel algorithm called dynamic knowledge distillation for hypothesis transfer learning (dkdHTL). In this method, we use knowledge distillation with an instance-wise weighting mechanism to adaptively transfer the "dark" knowledge from the source hypothesis to the target domain. The weighting coefficients of the distillation loss and the standard loss are determined by the consistency between the predicted probability of the source hypothesis and the target ground-truth label. Empirical results on both transfer learning benchmark datasets and a healthcare dataset demonstrate the effectiveness of our method.
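The abstract specifies the mechanism concretely enough to sketch: each target instance combines a distillation loss against the black-box source predictions with a standard supervised loss, weighted by how consistent the source prediction is with the ground-truth label. Below is a minimal PyTorch sketch of such a loss; the function name and the particular weighting rule (the source probability of the true class) are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def dkd_htl_loss(student_logits, source_probs, labels):
    """A sketch of dynamically weighted knowledge distillation.

    student_logits: (B, C) logits of the target model being trained.
    source_probs:   (B, C) class probabilities returned by the black-box
                    source hypothesis (its parameters are inaccessible).
    labels:         (B,)   target ground-truth class indices.
    """
    # Standard supervised loss on the target labels, kept per instance.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Distillation loss against the source's soft predictions.
    log_p = F.log_softmax(student_logits, dim=1)
    kd = F.kl_div(log_p, source_probs, reduction="none").sum(dim=1)

    # Consistency weight (illustrative): the probability the source
    # hypothesis assigns to the true label. Agreement with the label
    # shifts weight toward distillation; disagreement shifts it toward
    # the standard loss.
    w = source_probs.gather(1, labels.unsqueeze(1)).squeeze(1)

    return (w * kd + (1.0 - w) * ce).mean()
```

Note that only the source model's output probabilities are consumed, which is what makes the transfer black-box: no source parameters or source data appear anywhere in the loss.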
Related papers
- Demystifying amortized causal discovery with transformers [21.058343547918053]
Supervised learning approaches for causal discovery from observational data often achieve competitive performance.
In this work, we investigate CSIvA, a transformer-based model that promises to train on synthetic data and transfer to real data.
We bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations.
arXiv Detail & Related papers (2024-05-27T08:17:49Z)
- Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model [25.3403116022412]
CoRTe is a pseudo-labelling function that extracts reliable knowledge from a black-box source model.
We benchmark CoRTe on two synthetic-to-real settings, demonstrating remarkable results when using black-box models to transfer knowledge to lightweight models for a target data distribution (a sketch of the idea follows).
arXiv Detail & Related papers (2024-02-20T16:35:14Z)
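The CoRTe entry describes a black-box pseudo-labelling recipe that can be sketched: query the source model for soft predictions on target images and keep only the confident pixels as training targets for the lightweight student. This is a hedged sketch of that idea, not CoRTe's actual procedure; the threshold value and function names are assumptions.

```python
import torch.nn.functional as F

def reliable_pseudo_labels(source_probs, threshold=0.9, ignore_index=255):
    """Keep only confident black-box predictions as pseudo-labels.

    source_probs: (B, C, H, W) softmax outputs of the black-box source.
    Returns (B, H, W) class labels, with unreliable pixels set to ignore.
    """
    confidence, labels = source_probs.max(dim=1)
    labels[confidence < threshold] = ignore_index
    return labels

def student_loss(student_logits, source_probs):
    # Train the lightweight segmentation model only on the pixels
    # the black-box source model is confident about.
    targets = reliable_pseudo_labels(source_probs)
    return F.cross_entropy(student_logits, targets, ignore_index=255)
```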
- Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale [53.152460508207184]
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
This paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis.
To achieve optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning.
arXiv Detail & Related papers (2024-02-02T05:53:22Z)
- Estimate Deformation Capacity of Non-Ductile RC Shear Walls using Explainable Boosting Machine [0.0]
This study aims to develop a fully explainable machine learning model to predict the deformation capacity of non-ductile reinforced concrete shear walls.
The proposed Explainable Boosting Machine (EBM)-based model is an interpretable, robust, naturally explainable glass-box model, yet provides accuracy comparable to its black-box counterparts (see the sketch below).
arXiv Detail & Related papers (2023-01-11T09:20:29Z)
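Since the entry's point is that a glass-box model can match black-box accuracy, a short sketch of fitting an EBM may help. It uses the interpretml library (an assumption; the paper may use a different implementation), and the feature names and data are invented placeholders rather than the study's shear-wall dataset.

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # placeholder stand-ins for wall properties
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + rng.normal(scale=0.1, size=500)

ebm = ExplainableBoostingRegressor(
    feature_names=["wall_length", "axial_load_ratio",
                   "reinforcement_ratio", "concrete_strength"]
)
ebm.fit(X, y)

# Each feature's learned shape function can be inspected directly,
# which is what makes the model a glass box rather than a black box.
global_explanation = ebm.explain_global()
```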
- Transfer Learning with Uncertainty Quantification: Random Effect Calibration of Source to Target (RECaST) [1.8047694351309207]
We develop a statistical framework for model predictions based on transfer learning, called RECaST.
We mathematically and empirically demonstrate the validity of our RECaST approach for transfer learning between linear models.
We examine our method's performance in a simulation study and in an application to real hospital data.
arXiv Detail & Related papers (2022-11-29T19:39:47Z)
- Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (sketched below).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
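The ATC recipe is concrete enough to sketch end to end: pick the confidence threshold on labeled source data so that the fraction of source examples above it matches the source accuracy, then report the fraction of unlabeled target examples above that same threshold as the predicted target accuracy. A minimal sketch, assuming max-probability confidence scores (ATC also admits other scores, such as negative entropy):

```python
import numpy as np

def atc_predict_accuracy(source_probs, source_labels, target_probs):
    """Average Thresholded Confidence, sketched from the summary above.

    source_probs, target_probs: (N, C) predicted class probabilities.
    source_labels: (N,) ground-truth labels for the labeled source split.
    """
    src_conf = source_probs.max(axis=1)
    src_acc = (source_probs.argmax(axis=1) == source_labels).mean()

    # Threshold chosen so the share of source confidences above it
    # equals the source accuracy.
    threshold = np.quantile(src_conf, 1.0 - src_acc)

    # Predicted target accuracy: fraction of confident target examples.
    return float((target_probs.max(axis=1) > threshold).mean())
```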
- Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most active learning strategies are based on uncertainty-driven sample selection, and are often restricted to samples lying close to the decision boundary.
Here we propose to take common domain knowledge into consideration and enable non-expert users to train a model with fewer samples.
arXiv Detail & Related papers (2021-10-15T06:11:53Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To harness the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Transferring model structure in Bayesian transfer learning for Gaussian process regression [1.370633147306388]
This paper defines the task of conditioning a target probability distribution on a transferred source distribution.
Fully probabilistic design is adopted to solve this optimal decision-making problem in the target.
By successfully transferring higher moments of the source, the target can reject unreliable source knowledge.
arXiv Detail & Related papers (2021-01-18T05:28:02Z)
- Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation [102.67010690592011]
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Prior UDA methods typically require access to the source data when learning to adapt the model.
This work tackles a practical setting where only a trained source model is available, and examines how such a model can be effectively utilized without source data to solve UDA problems.
arXiv Detail & Related papers (2020-02-20T03:13:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.