Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety
- URL: http://arxiv.org/abs/2207.12615v1
- Date: Tue, 26 Jul 2022 02:33:04 GMT
- Title: Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety
- Authors: Puja Trivedi, Danai Koutra, Jayaraman J. Thiagarajan
- Abstract summary: We evaluate common adaptation protocols across distribution shifts and machine learning safety metrics.
We find that protocols induce disparate trade-offs that were not apparent from prior evaluation.
Using hardness-promoting augmentations during LP and then FT with augmentations may be particularly effective for trade-off mitigation.
- Score: 33.24980750651318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While directly fine-tuning (FT) large-scale, pretrained models on
task-specific data is well-known to induce strong in-distribution task
performance, recent works have demonstrated that different adaptation
protocols, such as linear probing (LP) prior to FT, can improve
out-of-distribution generalization. However, the design space of such
adaptation protocols remains under-explored and the evaluation of such
protocols has primarily focused on distribution shifts. Therefore, in this
work, we evaluate common adaptation protocols across distribution shifts and
machine learning safety metrics (e.g., anomaly detection, calibration,
robustness to corruptions). We find that protocols induce disparate trade-offs
that were not apparent from prior evaluation. Further, we demonstrate that
appropriate pairing of data augmentation and protocol can substantially
mitigate this trade-off. Finally, we hypothesize and empirically see that using
hardness-promoting augmentations during LP and then FT with augmentations may
be particularly effective for trade-off mitigation.
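As a concrete illustration of the LP-then-FT protocol discussed above, here is a minimal PyTorch sketch: probe a frozen pretrained backbone first, then fine-tune end to end. The backbone choice, epoch counts, learning rates, and the toy data are illustrative assumptions; in practice the loaders would apply hardness-promoting augmentations (e.g., RandAugment) as the abstract suggests.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Toy stand-in data; in practice the loader would serve task-specific images
# with hardness-promoting augmentations (e.g., RandAugment) applied.
num_classes = 10  # assumption: task-specific class count
x = torch.randn(64, 3, 224, 224)
y = torch.randint(0, num_classes, (64,))
loader = DataLoader(TensorDataset(x, y), batch_size=16)

model = models.resnet50(weights=None)  # load pretrained weights in practice
model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh task head

def run(model, loader, params, epochs, lr):
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(xb), yb).backward()
            opt.step()

# Stage 1: linear probing (LP) -- freeze the backbone, train only the head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
run(model, loader, model.fc.parameters(), epochs=1, lr=1e-2)

# Stage 2: fine-tuning (FT) -- unfreeze everything and train end to end,
# typically at a smaller learning rate, still with augmented inputs.
for p in model.parameters():
    p.requires_grad = True
run(model, loader, model.parameters(), epochs=1, lr=1e-4)
```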
Related papers
- Adapting Conformal Prediction to Distribution Shifts Without Labels [16.478151550456804]
Conformal prediction (CP) enables machine learning models to output prediction sets with guaranteed coverage rate.
Our goal is to improve the quality of CP-generated prediction sets using only unlabeled data from the test domain.
This is achieved by two new methods, ECP and EACP, which adjust the score function in CP according to the base model's uncertainty on the unlabeled test data.
arXiv Detail & Related papers (2024-06-03T15:16:02Z)
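For context on the guaranteed-coverage claim above, a minimal numpy sketch of split conformal prediction follows. The score function (one minus the softmax probability of the true label) is one common choice; the paper's uncertainty-based score adjustments (ECP/EACP) are not reproduced here.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the common score s(x, y) = 1 - p(y | x).

    Returns prediction sets that cover the true label with probability
    >= 1 - alpha (marginally, assuming exchangeable calibration/test data).
    """
    n = len(cal_labels)
    # Nonconformity scores on the held-out calibration set.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of the scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # A label enters the set whenever its score is below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Toy usage with random softmax outputs.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=3)
print(split_conformal_sets(cal_probs, cal_labels, test_probs))
```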
- Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
arXiv Detail & Related papers (2023-12-25T20:02:51Z)
- Conformal Prediction for Federated Uncertainty Quantification Under Label Shift [57.54977668978613]
Federated Learning (FL) is a machine learning framework where many clients collaboratively train models.
We develop a new conformal prediction method based on quantile regression and take into account privacy constraints.
arXiv Detail & Related papers (2023-06-08T11:54:58Z)
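Conformalized quantile regression (CQR) is the standard way to build conformal prediction on top of quantile regression; the sketch below shows only that non-federated, non-private core, under the assumption that a scikit-learn quantile gradient-boosting model is an acceptable illustrative base.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

alpha = 0.1
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=600)
X_tr, y_tr = X[:400], y[:400]          # fit the quantile models
X_cal, y_cal = X[400:500], y[400:500]  # calibrate the intervals
X_te = X[500:]

# Lower and upper conditional quantile estimates.
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# CQR conformity score: how far the true label falls outside the interval.
s = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
n = len(s)
q = np.quantile(s, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Calibrated intervals with finite-sample marginal coverage >= 1 - alpha.
lower, upper = lo.predict(X_te) - q, hi.predict(X_te) + q
print(lower[:3], upper[:3])
```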
- Characterizing Out-of-Distribution Error via Optimal Transport [15.284665509194134]
Methods of predicting a model's performance on OOD data without labels are important for machine learning safety.
We introduce a novel method for estimating model performance by leveraging optimal transport theory.
We show that our approaches significantly outperform existing state-of-the-art methods, with up to 3x lower prediction error.
arXiv Detail & Related papers (2023-05-25T01:37:13Z)
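A rough sketch of the general idea, not the paper's exact method: score the unlabeled target set by the optimal transport cost between the model's softmax outputs and the source label marginal, with higher cost suggesting higher error. The POT library call and the total-variation ground cost are assumptions for illustration.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def ot_error_score(target_probs, source_label_marginal):
    """Illustrative OOD-error proxy: OT cost between softmax outputs on
    unlabeled target data and one-hot class representatives weighted by the
    source label marginal. The paper's exact construction may differ.
    """
    k = target_probs.shape[1]
    onehots = np.eye(k)  # one representative point per class
    # Pairwise total-variation distance between softmax rows and one-hots.
    M = 0.5 * np.abs(target_probs[:, None, :] - onehots[None, :, :]).sum(-1)
    a = np.full(len(target_probs), 1.0 / len(target_probs))  # uniform weights
    return ot.emd2(a, source_label_marginal, M)  # exact OT cost

# Toy usage: diffuse, low-confidence predictions raise the transport cost.
rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(4) * 0.3, size=500)
print(ot_error_score(probs, np.full(4, 0.25)))
```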
- A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias [33.24980750651318]
We study the susceptibility of adaptation protocols to simplicity bias (SB).
SB has recently been shown to underlie several problems in robust generalization.
We propose modified linear probes that help mitigate SB.
arXiv Detail & Related papers (2023-03-23T17:57:09Z)
- Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes [18.140619966865955]
We propose a novel test-time adaptation strategy that adjusts the model pre-trained on the source domain using only unlabeled online data from the target domain.
We show that our method exhibits state-of-the-art performance on various standard benchmarks and even outperforms its supervised counterpart.
arXiv Detail & Related papers (2022-07-24T10:17:05Z)
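A minimal sketch of the nearest-source-prototype component above: pseudo-label each unlabeled target batch by its closest class prototype computed from source features, then adapt on those pseudo-labels. The shift-agnostic weight regularization from the paper is omitted, and the linear modules are stand-ins for a real backbone and head.

```python
import torch
import torch.nn.functional as F

def class_prototypes(src_feats, src_labels, num_classes):
    # Mean feature vector per class, computed once on the source domain.
    return torch.stack([src_feats[src_labels == c].mean(0)
                        for c in range(num_classes)])

def tta_step(feat_extractor, classifier, x_target, prototypes, opt):
    """One online test-time adaptation step on an unlabeled target batch."""
    feats = feat_extractor(x_target)
    # Pseudo-label each target sample by its nearest (cosine) source prototype.
    sims = F.normalize(feats, dim=1) @ F.normalize(prototypes, dim=1).T
    pseudo = sims.argmax(1)
    loss = F.cross_entropy(classifier(feats), pseudo)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with linear modules standing in for a real backbone and head.
torch.manual_seed(0)
fe, clf = torch.nn.Linear(16, 8), torch.nn.Linear(8, 3)
protos = class_prototypes(fe(torch.randn(90, 16)).detach(),
                          torch.randint(0, 3, (90,)), num_classes=3)
opt = torch.optim.SGD(list(fe.parameters()) + list(clf.parameters()), lr=1e-3)
print(tta_step(fe, clf, torch.randn(32, 16), protos, opt))
```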
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser that trades off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
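For orientation, a generic Bayesian optimisation loop with expected improvement over a probabilistic surrogate; a scikit-learn Gaussian process stands in for the probabilistic transformer here, and the paper's BO-tailored prior and posterior regulariser are not reproduced.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best):
    # EI for minimisation: expected improvement below the incumbent.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

f = lambda x: np.sin(3 * x) + 0.5 * x ** 2       # toy black-box objective
rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(4, 1))
y = f(X[:, 0])
cand = np.linspace(-2, 2, 256)[:, None]          # candidate pool

for _ in range(10):
    # Any surrogate with a Gaussian predictive works here; the paper swaps
    # this GP for a pretrained probabilistic transformer.
    surrogate = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = surrogate.predict(cand, return_std=True)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next[0]))

print("best x, f(x):", X[np.argmin(y), 0], y.min())
```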
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
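A hedged sketch of the general perturbation-response idea: sweep a perturbation magnitude, record accuracy at each level, and summarize the resulting curve (e.g., by its area). The Gaussian input perturbation and the area summary are illustrative choices, not the paper's specific measures.

```python
import numpy as np
import torch

@torch.no_grad()
def perturbation_response_curve(model, x, y, sigmas):
    """Accuracy as a function of input-noise magnitude. Summary statistics of
    this curve (e.g., its area) can feed a generalization-gap predictor.
    """
    accs = []
    for s in sigmas:
        preds = model(x + s * torch.randn_like(x)).argmax(1)
        accs.append((preds == y).float().mean().item())
    return np.array(accs)

# Toy usage with a random linear model standing in for a trained network.
torch.manual_seed(0)
model = torch.nn.Linear(20, 5)
x, y = torch.randn(256, 20), torch.randint(0, 5, (256,))
sigmas = np.linspace(0.0, 2.0, 9)
curve = perturbation_response_curve(model, x, y, sigmas)
print("area under response curve:", np.trapz(curve, sigmas))
```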
- Causally-motivated Shortcut Removal Using Auxiliary Labels [63.686580185674195]
A key challenge in learning risk-invariant predictors is shortcut learning.
We propose a flexible, causally-motivated approach to address this challenge.
We show both theoretically and empirically that this causally-motivated regularization scheme yields robust predictors.
arXiv Detail & Related papers (2021-05-13T16:58:45Z)
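One simple, illustrative instance of discouraging shortcut reliance with an auxiliary label: add a squared batch-correlation penalty between the logits and the auxiliary (shortcut) label to the task loss. The paper's causally-motivated regularizer is more principled; treat this as a stand-in.

```python
import torch
import torch.nn.functional as F

def shortcut_penalty(logits, aux):
    """Squared batch correlation between each logit dimension and the
    auxiliary (shortcut) label -- an illustrative stand-in for the paper's
    causally-motivated regularizer.
    """
    a = (aux - aux.mean()) / (aux.std() + 1e-8)
    z = (logits - logits.mean(0)) / (logits.std(0) + 1e-8)
    return ((z * a[:, None]).mean(0) ** 2).sum()

def loss_fn(logits, y, aux, lam=1.0):
    # Task loss plus a penalty discouraging predictors that track the shortcut.
    return F.cross_entropy(logits, y) + lam * shortcut_penalty(logits, aux)

# Toy usage: binary auxiliary label (e.g., image background) per example.
torch.manual_seed(0)
logits = torch.randn(64, 3, requires_grad=True)
y, aux = torch.randint(0, 3, (64,)), torch.randint(0, 2, (64,)).float()
loss_fn(logits, y, aux).backward()
```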
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, with better generalization bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
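For background, the classic error-feedback pattern this line of work builds on, with random-k sparsification: reinject the accumulated compression residual into each fresh gradient, transmit a sparsified message, and carry the remainder forward. The detached variants (DEF/DEFA) change how this residual interacts with the model update, which this sketch does not capture.

```python
import torch

def random_k_sparsify(t, k):
    # Keep k randomly chosen coordinates; zero the rest (the message sent).
    mask = torch.zeros_like(t)
    idx = torch.randperm(t.numel())[:k]
    mask.view(-1)[idx] = 1.0
    return t * mask

def ef_sgd_step(param, grad, memory, lr=0.1, k=10):
    """One error-feedback step: add the residual to the fresh gradient,
    transmit a sparsified version, and keep the untransmitted remainder.
    """
    corrected = grad + memory              # reinject past compression error
    msg = random_k_sparsify(corrected, k)  # what a worker would communicate
    memory.copy_(corrected - msg)          # residual carried to the next step
    param.sub_(lr * msg)
    return param

# Toy usage on a single parameter tensor.
torch.manual_seed(0)
p, mem = torch.randn(100), torch.zeros(100)
for _ in range(5):
    ef_sgd_step(p, torch.randn(100), mem, lr=0.05, k=10)
```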