Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety
- URL: http://arxiv.org/abs/2207.12615v1
- Date: Tue, 26 Jul 2022 02:33:04 GMT
- Title: Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety
- Authors: Puja Trivedi, Danai Koutra, Jayaraman J. Thiagarajan
- Abstract summary: We evaluate common adaptation protocols across distribution shifts and machine learning safety metrics.
We find that protocols induce disparate trade-offs that were not apparent from prior evaluation.
Using hardness-promoting augmentations during LP and then FT with augmentations may be particularly effective for trade-off mitigation.
- Score: 33.24980750651318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While directly fine-tuning (FT) large-scale, pretrained models on
task-specific data is well-known to induce strong in-distribution task
performance, recent works have demonstrated that different adaptation
protocols, such as linear probing (LP) prior to FT, can improve
out-of-distribution generalization. However, the design space of such
adaptation protocols remains under-explored and the evaluation of such
protocols has primarily focused on distribution shifts. Therefore, in this
work, we evaluate common adaptation protocols across distribution shifts and
machine learning safety metrics (e.g., anomaly detection, calibration,
robustness to corruptions). We find that protocols induce disparate trade-offs
that were not apparent from prior evaluation. Further, we demonstrate that
appropriate pairing of data augmentation and protocol can substantially
mitigate this trade-off. Finally, we hypothesize and empirically see that using
hardness-promoting augmentations during LP and then FT with augmentations may
be particularly effective for trade-off mitigation.
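As a concrete illustration of the LP-then-FT protocol discussed above, here is a minimal PyTorch sketch: probe a frozen pretrained backbone first, then fine-tune end to end. The backbone choice, epoch counts, learning rates, and the toy data are illustrative assumptions; in practice the loaders would apply hardness-promoting augmentations (e.g., RandAugment) as the abstract suggests.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Toy stand-in data; in practice the loader would serve task-specific images
# with hardness-promoting augmentations (e.g., RandAugment) applied.
num_classes = 10  # assumption: task-specific class count
x = torch.randn(64, 3, 224, 224)
y = torch.randint(0, num_classes, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=16)

model = models.resnet50(weights=None)  # load pretrained weights in practice
model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh task head

def run(model, loader, params, epochs, lr):
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(xb), yb).backward()
            opt.step()

# Stage 1: linear probing (LP) -- freeze the backbone, train only the head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
run(model, loader, model.fc.parameters(), epochs=1, lr=1e-2)

# Stage 2: fine-tuning (FT) -- unfreeze everything and train end to end,
# typically at a smaller learning rate, still with augmented inputs.
for p in model.parameters():
    p.requires_grad = True
run(model, loader, model.parameters(), epochs=1, lr=1e-4)
```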
Related papers
- Adapting Conformal Prediction to Distribution Shifts Without Labels [16.478151550456804]
Conformal prediction (CP) enables machine learning models to output prediction sets with guaranteed coverage rate.
Our goal is to improve the quality of CP-generated prediction sets using only unlabeled data from the test domain.
This is achieved by two new methods, ECP and EACP, which adjust the score function in CP according to the base model's uncertainty on the unlabeled test data.
arXiv Detail & Related papers (2024-06-03T15:16:02Z)
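For context on the guaranteed-coverage claim above, a minimal numpy sketch of split conformal prediction follows. The score function (one minus the softmax probability of the true label) is one common choice; the paper's uncertainty-based score adjustments (ECP/EACP) are not reproduced here.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the common score s(x, y) = 1 - p(y | x).

    Returns prediction sets that cover the true label with probability
    >= 1 - alpha (marginally, assuming exchangeable calibration/test data).
    """
    n = len(cal_labels)
    # Nonconformity scores on the held-out calibration set.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of the scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # A label enters the set whenever its score is below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Toy usage with random softmax outputs.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=3)
print(split_conformal_sets(cal_probs, cal_labels, test_probs))
```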
- Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
arXiv Detail & Related papers (2023-12-25T20:02:51Z)
- Conformal Prediction for Federated Uncertainty Quantification Under Label Shift [57.54977668978613]
Federated Learning (FL) is a machine learning framework where many clients collaboratively train models.
We develop a new conformal prediction method based on quantile regression and take into account privacy constraints.
arXiv Detail & Related papers (2023-06-08T11:54:58Z)
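Conformalized quantile regression (CQR) is the standard way to build conformal prediction on top of quantile regression; the sketch below shows only that non-federated, non-private core, under the assumption that a scikit-learn quantile gradient-boosting model is an acceptable illustrative base.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

alpha = 0.1
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=600)
X_tr, y_tr = X[:400], y[:400]          # fit the quantile models
X_cal, y_cal = X[400:500], y[400:500]  # calibrate the intervals
X_te = X[500:]

# Lower and upper conditional quantile estimates.
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# CQR conformity score: how far the true label falls outside the interval.
s = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
n = len(s)
q = np.quantile(s, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Calibrated intervals with finite-sample marginal coverage >= 1 - alpha.
lower, upper = lo.predict(X_te) - q, hi.predict(X_te) + q
print(lower[:3], upper[:3])
```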
- Characterizing Out-of-Distribution Error via Optimal Transport [15.284665509194134]
Methods of predicting a model's performance on OOD data without labels are important for machine learning safety.
We introduce a novel method for estimating model performance by leveraging optimal transport theory.
We show that our approaches significantly outperform existing state-of-the-art methods, with up to 3x lower prediction error.
arXiv Detail & Related papers (2023-05-25T01:37:13Z)
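A rough sketch of the general idea, not the paper's exact method: score the unlabeled target set by the optimal transport cost between the model's softmax outputs and the source label marginal, with higher cost suggesting higher error. The POT library call and the total-variation ground cost are assumptions for illustration.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def ot_error_score(target_probs, source_label_marginal):
    """Illustrative OOD-error proxy: OT cost between softmax outputs on
    unlabeled target data and one-hot class representatives weighted by the
    source label marginal. The paper's exact construction may differ.
    """
    k = target_probs.shape[1]
    onehots = np.eye(k)  # one representative point per class
    # Pairwise total-variation distance between softmax rows and one-hots.
    M = 0.5 * np.abs(target_probs[:, None, :] - onehots[None, :, :]).sum(-1)
    a = np.full(len(target_probs), 1.0 / len(target_probs))  # uniform weights
    return ot.emd2(a, source_label_marginal, M)  # exact OT cost

# Toy usage: diffuse, low-confidence predictions raise the transport cost.
rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(4) * 0.3, size=500)
print(ot_error_score(probs, np.full(4, 0.25)))
```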
- A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias [33.24980750651318]
We study the susceptibility of adaptation protocols to simplicity bias (SB).
SB has recently been shown to underlie several problems in robust generalization.
We propose modified linear probes that help mitigate SB.
arXiv Detail & Related papers (2023-03-23T17:57:09Z)
- Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes [18.140619966865955]
We propose a novel test-time adaptation strategy that adjusts the model pre-trained on the source domain using only unlabeled online data from the target domain.
We show that our method exhibits state-of-the-art performance on various standard benchmarks and even outperforms its supervised counterpart.
arXiv Detail & Related papers (2022-07-24T10:17:05Z)
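A minimal sketch of the nearest-source-prototype component above: pseudo-label each unlabeled target batch by its closest class prototype computed from source features, then adapt on those pseudo-labels. The shift-agnostic weight regularization from the paper is omitted, and the linear modules are stand-ins for a real backbone and head.

```python
import torch
import torch.nn.functional as F

def class_prototypes(src_feats, src_labels, num_classes):
    # Mean feature vector per class, computed once on the source domain.
    return torch.stack([src_feats[src_labels == c].mean(0)
                        for c in range(num_classes)])

def tta_step(feat_extractor, classifier, x_target, prototypes, opt):
    """One online test-time adaptation step on an unlabeled target batch."""
    feats = feat_extractor(x_target)
    # Pseudo-label each target sample by its nearest (cosine) source prototype.
    sims = F.normalize(feats, dim=1) @ F.normalize(prototypes, dim=1).T
    pseudo = sims.argmax(1)
    loss = F.cross_entropy(classifier(feats), pseudo)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with linear modules standing in for a real backbone and head.
torch.manual_seed(0)
fe, clf = torch.nn.Linear(16, 8), torch.nn.Linear(8, 3)
protos = class_prototypes(fe(torch.randn(90, 16)).detach(),
                          torch.randint(0, 3, (90,)), num_classes=3)
opt = torch.optim.SGD(list(fe.parameters()) + list(clf.parameters()), lr=1e-3)
print(tta_step(fe, clf, torch.randn(32, 16), protos, opt))
```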
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser that trades off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
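For orientation, a generic Bayesian optimisation loop with expected improvement over a probabilistic surrogate; a scikit-learn Gaussian process stands in for the probabilistic transformer here, and the paper's BO-tailored prior and posterior regulariser are not reproduced.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best):
    # EI for minimisation: expected improvement below the incumbent.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

f = lambda x: np.sin(3 * x) + 0.5 * x ** 2       # toy black-box objective
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(4, 1))
y = f(X[:, 0])
cand = np.linspace(-2, 2, 256)[:, None]          # candidate pool

for _ in range(10):
    # Any surrogate with a Gaussian predictive works here; the paper swaps
    # this GP for a pretrained probabilistic transformer.
    surrogate = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = surrogate.predict(cand, return_std=True)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next[0]))

print("best x, f(x):", X[np.argmin(y), 0], y.min())
```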
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
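A hedged sketch of the general perturbation-response idea: sweep a perturbation magnitude, record accuracy at each level, and summarize the resulting curve (e.g., by its area). The Gaussian input perturbation and the area summary are illustrative choices, not the paper's specific measures.

```python
import numpy as np
import torch

@torch.no_grad()
def perturbation_response_curve(model, x, y, sigmas):
    """Accuracy as a function of input-noise magnitude. Summary statistics of
    this curve (e.g., its area) can feed a generalization-gap predictor.
    """
    accs = []
    for s in sigmas:
        preds = model(x + s * torch.randn_like(x)).argmax(1)
        accs.append((preds == y).float().mean().item())
    return np.array(accs)

# Toy usage with a random linear model standing in for a trained network.
torch.manual_seed(0)
model = torch.nn.Linear(20, 5)
x, y = torch.randn(256, 20), torch.randint(0, 5, (256,))
sigmas = np.linspace(0.0, 2.0, 9)
curve = perturbation_response_curve(model, x, y, sigmas)
print("area under response curve:", np.trapz(curve, sigmas))
```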
- Causally-motivated Shortcut Removal Using Auxiliary Labels [63.686580185674195]
A key challenge in learning risk-invariant predictors is shortcut learning.
We propose a flexible, causally-motivated approach to address this challenge.
We show both theoretically and empirically that this causally-motivated regularization scheme yields robust predictors.
arXiv Detail & Related papers (2021-05-13T16:58:45Z)
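One simple, illustrative instance of discouraging shortcut reliance with an auxiliary label: add a squared batch-correlation penalty between the logits and the auxiliary (shortcut) label to the task loss. The paper's causally-motivated regularizer is more principled; treat this as a stand-in.

```python
import torch
import torch.nn.functional as F

def shortcut_penalty(logits, aux):
    """Squared batch correlation between each logit dimension and the
    auxiliary (shortcut) label -- an illustrative stand-in for the paper's
    causally-motivated regularizer.
    """
    a = (aux - aux.mean()) / (aux.std() + 1e-8)
    z = (logits - logits.mean(0)) / (logits.std(0) + 1e-8)
    return ((z * a[:, None]).mean(0) ** 2).sum()

def loss_fn(logits, y, aux, lam=1.0):
    # Task loss plus a penalty discouraging predictors that track the shortcut.
    return F.cross_entropy(logits, y) + lam * shortcut_penalty(logits, aux)

# Toy usage: binary auxiliary label (e.g., image background) per example.
torch.manual_seed(0)
logits = torch.randn(64, 3, requires_grad=True)
y, aux = torch.randint(0, 3, (64,)), torch.randint(0, 2, (64,)).float()
loss_fn(logits, y, aux).backward()
```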
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, with better generalization bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
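For background, the classic error-feedback pattern this line of work builds on, with random-k sparsification: reinject the accumulated compression residual into each fresh gradient, transmit a sparsified message, and carry the remainder forward. The detached variants (DEF/DEFA) change how this residual interacts with the model update, which this sketch does not capture.

```python
import torch

def random_k_sparsify(t, k):
    # Keep k randomly chosen coordinates; zero the rest (the message sent).
    mask = torch.zeros_like(t)
    idx = torch.randperm(t.numel())[:k]
    mask.view(-1)[idx] = 1.0
    return t * mask

def ef_sgd_step(param, grad, memory, lr=0.1, k=10):
    """One error-feedback step: add the residual to the fresh gradient,
    transmit a sparsified version, and keep the untransmitted remainder.
    """
    corrected = grad + memory              # reinject past compression error
    msg = random_k_sparsify(corrected, k)  # what a worker would communicate
    memory.copy_(corrected - msg)          # residual carried to the next step
    param.sub_(lr * msg)
    return param

# Toy usage on a single parameter tensor.
torch.manual_seed(0)
p, mem = torch.randn(100), torch.zeros(100)
for _ in range(5):
    ef_sgd_step(p, torch.randn(100), mem, lr=0.05, k=10)
```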