Model Compression for Domain Adaptation through Causal Effect Estimation
- URL: http://arxiv.org/abs/2101.07086v1
- Date: Mon, 18 Jan 2021 14:18:02 GMT
- Title: Model Compression for Domain Adaptation through Causal Effect Estimation
- Authors: Guy Rotman, Amir Feder and Roi Reichart
- Abstract summary: ATE-guided Model Compression scheme (AMoC) generates many model candidates, differing by the model components that were removed.
Then, we select the best candidate through a stepwise regression model that utilizes the ATE to predict the expected performance on the target domain.
AMoC outperforms strong baselines on 46 of 60 domain pairs across two text classification tasks, with an average improvement of more than 3% in F1 above the strongest baseline.
- Score: 20.842938440720303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent improvements in the predictive quality of natural language processing
systems are often dependent on a substantial increase in the number of model
parameters. This has led to various attempts to compress such models, but
existing methods have not considered the differences in the predictive power of
various model components or in the generalizability of the compressed models.
To understand the connection between model compression and out-of-distribution
generalization, we define the task of compressing language representation
models such that they perform best in a domain adaptation setting. We choose to
address this problem from a causal perspective, attempting to estimate the
\textit{average treatment effect} (ATE) of a model component, such as a single
layer, on the model's predictions. Our proposed ATE-guided Model Compression
scheme (AMoC) generates many model candidates, differing by the model
components that were removed. Then, we select the best candidate through a
stepwise regression model that utilizes the ATE to predict the expected
performance on the target domain. AMoC outperforms strong baselines on 46 of 60
domain pairs across two text classification tasks, with an average improvement
of more than 3\% in F1 above the strongest baseline.
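The abstract describes two stages: ablating a model component to estimate its ATE on the predictions, and regressing expected target-domain performance on that ATE to select a candidate. The following is a minimal, hypothetical sketch of that pipeline, not the authors' implementation: the toy encoder, the identity-ablation estimate of the ATE, the ordinary least-squares fit standing in for the paper's stepwise regression, and all data are illustrative assumptions.

```python
# A minimal sketch of the AMoC idea under simplifying assumptions, not the
# authors' implementation: a toy feed-forward encoder stands in for a language
# representation model, the ATE of a layer is approximated by ablating it
# (replacing it with the identity), and ordinary least squares stands in for
# the paper's stepwise regression. All data below are synthetic placeholders.
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LinearRegression

class ToyEncoder(nn.Module):
    def __init__(self, dim=32, n_layers=4, n_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_layers)
        )
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, skip=None):
        # `skip` marks the component removed in a compressed candidate.
        for i, layer in enumerate(self.layers):
            if i != skip:
                x = layer(x)
        return self.head(x).softmax(dim=-1)

def layer_ate(model, x, layer_idx):
    # ATE of a layer on the predictions: mean change in predicted class
    # probabilities when the layer is ablated.
    with torch.no_grad():
        return (model(x) - model(x, skip=layer_idx)).abs().mean().item()

model = ToyEncoder()
x_source = torch.randn(256, 32)                 # unlabeled source-domain inputs
candidates = list(range(len(model.layers)))     # one candidate per removed layer
ate = np.array([[layer_ate(model, x_source, i)] for i in candidates])

# Regression mapping a candidate's features (here only its ATE) to target-domain
# performance, fit on candidates whose target F1 was already measured.
past_ates = np.random.rand(20, 1)               # placeholder features
past_target_f1 = np.random.rand(20)             # placeholder measured target F1
selector = LinearRegression().fit(past_ates, past_target_f1)

predicted_f1 = selector.predict(ate)
best = candidates[int(np.argmax(predicted_f1))]
print(f"selected candidate: remove layer {best} "
      f"(predicted target F1 {predicted_f1.max():.3f})")
```

In the paper, the candidates differ by which components of a language representation model were removed, and a stepwise regression predicts target-domain performance; the sketch keeps only that overall structure.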
Related papers
- On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
- Continuous Language Model Interpolation for Dynamic and Controllable Text Generation [7.535219325248997]
We focus on the challenging case where the model must dynamically adapt to diverse -- and often changing -- user preferences.
We leverage adaptation methods based on linear weight interpolation, casting them as continuous multi-domain interpolators.
We show that varying the interpolation weights yields predictable and consistent changes in the model outputs (a minimal interpolation sketch follows this entry).
arXiv Detail & Related papers (2024-04-10T15:55:07Z)
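As a concrete illustration of the linear weight interpolation summarized in the entry above, here is a hedged sketch: toy linear modules stand in for the fine-tuned language models, and `alpha` is an assumed interpolation coefficient that can be varied continuously at inference time; this is not the paper's code.

```python
# Hedged sketch of linear weight interpolation between two fine-tuned checkpoints.
# Toy linear modules stand in for fine-tuned language models; `alpha` is the
# (assumed) coefficient that can be varied continuously at inference time.
import torch
import torch.nn as nn

def interpolate_state_dicts(sd_a, sd_b, alpha):
    # Parameter-wise convex combination of two checkpoints with matching shapes.
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

model_a = nn.Linear(16, 4)   # e.g. fine-tuned toward one preference/domain
model_b = nn.Linear(16, 4)   # e.g. fine-tuned toward another
blended = nn.Linear(16, 4)
x = torch.randn(1, 16)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    blended.load_state_dict(
        interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), alpha)
    )
    print(f"alpha={alpha:.2f} -> output {blended(x).detach().numpy().round(3)}")
```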
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that there is no single model that works best in all cases.
By choosing an appropriate bias model, we can obtain better robustness than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaptation performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero- and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness.
We show that the model soup approach extends to multiple image classification and natural language processing tasks (a weight-averaging sketch follows this entry).
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
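Below is a minimal sketch of the "uniform soup" variant of the idea in the entry above, assuming several checkpoints with identical architectures; toy linear classifiers stand in for the fine-tuned image and text models, and this is not the paper's code.

```python
# Minimal "uniform model soup" sketch: parameter-wise mean of several checkpoints
# fine-tuned with different hyperparameters. Toy linear classifiers are assumed
# stand-ins; the soup is a single model, so inference cost is unchanged.
import torch
import torch.nn as nn

def uniform_soup(state_dicts):
    # Average each parameter tensor across the ingredient checkpoints.
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

# Pretend these were fine-tuned with different learning rates / seeds.
ingredients = [nn.Linear(32, 8).state_dict() for _ in range(5)]

soup = nn.Linear(32, 8)
soup.load_state_dict(uniform_soup(ingredients))
print(soup(torch.randn(2, 32)).shape)   # torch.Size([2, 8])
```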
- Model Compression for Dynamic Forecast Combination [9.281199058905017]
We show that compressing dynamic forecasting ensembles into an individual model leads to comparable predictive performance.
We also show that the compressed individual model with the best average rank is a rule-based regression model (a distillation-style sketch follows this entry).
arXiv Detail & Related papers (2021-04-05T09:55:35Z)
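A hedged sketch of the compression step described in the entry above: a single decision tree (standing in for the rule-based regression model the summary mentions) is fit to mimic an averaged forecasting ensemble on synthetic data; the paper's dynamic, time-varying combination is simplified to a static average here.

```python
# Hedged sketch: compress a forecasting ensemble into one individual model by
# fitting it to the ensemble's predictions. A decision tree stands in for the
# rule-based regression model; the data and static averaging are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

# Small heterogeneous ensemble whose averaged forecast we want to compress.
ensemble = [RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y),
            GradientBoostingRegressor(random_state=0).fit(X, y)]
ensemble_pred = np.mean([m.predict(X) for m in ensemble], axis=0)

# Individual model trained to reproduce the ensemble's behaviour.
compressed = DecisionTreeRegressor(max_depth=4).fit(X, ensemble_pred)
print("R^2 of compressed model vs. ensemble:",
      round(compressed.score(X, ensemble_pred), 3))
```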
- Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge [82.5462771088607]
We propose a novel model selection metric specifically designed for ITE methods under the unsupervised domain adaptation setting.
In particular, we propose selecting models whose predictions of interventions' effects satisfy known causal structures in the target domain.
arXiv Detail & Related papers (2021-02-11T21:03:14Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Semi-nonparametric Latent Class Choice Model with a Flexible Class Membership Component: A Mixture Model Approach [6.509758931804479]
The proposed model formulates the latent classes using mixture models as an alternative approach to the traditional random utility specification.
Results show that mixture models improve the overall performance of latent class choice models.
arXiv Detail & Related papers (2020-07-06T13:19:26Z)
- Pattern Similarity-based Machine Learning Methods for Mid-term Load Forecasting: A Comparative Study [0.0]
We use pattern similarity-based methods for forecasting monthly electricity demand expressing annual seasonality.
An integral part of the models is the time series representation using patterns of time series sequences.
We consider four such models: the nearest neighbor model, the fuzzy neighborhood model, the kernel regression model, and the general regression neural network (a nearest neighbor sketch follows this entry).
arXiv Detail & Related papers (2020-03-03T12:14:36Z)
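To illustrate the nearest neighbor variant from the entry above, here is a hedged sketch on synthetic monthly demand: yearly profiles are normalized into patterns, the most similar historical patterns are retrieved, and their following-year patterns are averaged and decoded back to the demand level; the exact encoding used in the paper may differ.

```python
# Hedged sketch of pattern-similarity (nearest neighbor) mid-term load
# forecasting: yearly demand profiles are normalized into input patterns, and
# the forecast reuses the next-year patterns of the k most similar years.
# The synthetic data and mean-based normalization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_years = 12
season = 100 + 20 * np.sin(2 * np.pi * np.arange(12) / 12)
demand = np.array([season * (1 + 0.03 * y) + rng.normal(scale=3.0, size=12)
                   for y in range(n_years)])        # shape (n_years, 12 months)

def to_pattern(year_profile):
    # Normalize a yearly profile by its own mean to strip the trend/level.
    return year_profile / year_profile.mean()

X = np.array([to_pattern(demand[y]) for y in range(n_years - 1)])            # input patterns
Y = np.array([demand[y + 1] / demand[y].mean() for y in range(n_years - 1)]) # output patterns

query, k = to_pattern(demand[-1]), 3
nearest = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
forecast = Y[nearest].mean(axis=0) * demand[-1].mean()   # decode to demand level
print("next-year monthly forecast:", forecast.round(1))
```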