A Closer Look at Model Adaptation using Feature Distortion and
Simplicity Bias
- URL: http://arxiv.org/abs/2303.13500v1
- Date: Thu, 23 Mar 2023 17:57:09 GMT
- Title: A Closer Look at Model Adaptation using Feature Distortion and
Simplicity Bias
- Authors: Puja Trivedi, Danai Koutra, Jayaraman J. Thiagarajan
- Abstract summary: We study the susceptibility of adaptation protocols to simplicity bias (SB).
SB has recently been shown to underlie several problems in robust generalization.
We propose modified linear probes that help mitigate SB.
- Score: 33.24980750651318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in the expressivity of pretrained models have increased interest in
the design of adaptation protocols that enable safe and effective transfer
learning. Going beyond conventional linear probing (LP) and fine-tuning (FT)
strategies, protocols that can effectively control feature distortion, i.e.,
the failure to update features orthogonal to the in-distribution data, have
been found to achieve improved out-of-distribution (OOD) generalization. To
limit this distortion, the LP+FT protocol, which first learns a linear probe
and then uses this initialization for subsequent FT, was proposed. However, in
this paper, we find that when adaptation protocols (LP, FT, LP+FT) are also
evaluated on a variety of safety objectives (e.g., calibration, robustness),
a perspective complementary to feature distortion is helpful for explaining
protocol behavior. To this end, we study the susceptibility of protocols to
simplicity bias (SB), i.e., the well-known propensity of deep neural networks
to rely upon simple features, as SB has recently been shown to underlie
several problems in robust generalization. Using a synthetic dataset, we
demonstrate the susceptibility of existing protocols to SB. Given the
effectiveness of LP+FT, we then propose modified linear probes that help
mitigate SB and lead to better initializations for subsequent FT. We verify
the effectiveness of the proposed LP+FT variants at decreasing SB in a
controlled setting and at improving OOD generalization and safety on three
adaptation datasets.
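For concreteness, the following is a minimal PyTorch sketch of the two-stage LP+FT protocol described in the abstract: a linear probe is first fit on frozen backbone features, and the trained head then initializes full fine-tuning. This is an illustrative reconstruction, not the paper's code; the function name, feature dimension, epoch counts, and learning rates are all assumptions.

```python
import torch
import torch.nn as nn

def lp_then_ft(backbone: nn.Module, feat_dim: int, num_classes: int,
               loader, lp_epochs=10, ft_epochs=10, lp_lr=1e-3, ft_lr=1e-5):
    """Sketch of LP+FT: linear probing followed by full fine-tuning.

    Assumes `backbone(x)` returns (batch, feat_dim) features.
    """
    device = next(backbone.parameters()).device
    head = nn.Linear(feat_dim, num_classes).to(device)
    criterion = nn.CrossEntropyLoss()

    # Stage 1 (LP): freeze the backbone and train only the linear head,
    # so the pretrained features are not distorted while the probe is fit.
    backbone.requires_grad_(False)
    opt = torch.optim.Adam(head.parameters(), lr=lp_lr)
    for _ in range(lp_epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = criterion(head(backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2 (FT): unfreeze everything and fine-tune end to end,
    # starting from the probe initialization (typically with a much
    # smaller learning rate than in Stage 1).
    backbone.requires_grad_(True)
    params = list(backbone.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=ft_lr)
    for _ in range(ft_epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = criterion(head(backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return backbone, head
```

The paper's proposed LP+FT variants modify Stage 1 rather than Stage 2: the probe is trained so that it relies less on simple features (the related workshop paper listed below suggests hardness-promoting augmentations during LP as one such option) before handing its initialization to FT.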
Related papers
- Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention.
We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z)
- ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood [14.512464277772194]
Aligned Supervised Fine-Tuning (ASFT) is an effective approach that better aligns Large Language Models with pair-wise datasets.
ASFT mitigates the issue where the DPO loss function decreases the probability of generating human-dispreferred data.
Extensive experiments demonstrate that ASFT is an effective alignment approach, consistently outperforming existing methods.
arXiv Detail & Related papers (2024-09-14T11:39:13Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss (a sketch of such a combined objective appears after this list).
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment [105.34140537748546]
We propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained quality signals that are derived by contrasting good and bad responses.
Our approach makes two major contributions. First, we curate a refined alignment dataset that pairs initial responses with the corresponding revised ones.
Second, we devise a new loss function that can leverage fine-grained quality signals to instruct the learning of LLMs for alignment.
arXiv Detail & Related papers (2023-11-07T15:36:40Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety [33.24980750651318]
We evaluate common adaptation protocols across distribution shifts and machine learning safety metrics.
We find that protocols induce disparate trade-offs that were not apparent from prior evaluation.
Using hardness-promoting augmentations during LP, followed by FT with augmentations, may be particularly effective at mitigating these trade-offs.
arXiv Detail & Related papers (2022-07-26T02:33:04Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex problems.
We also propose DEF-A to accelerate the generalization of DEF, which shows better generalization bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
However, no existing method enjoys all of the desired properties.
We propose an intuitively simple yet theoretically sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)
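As a concrete reading of the combined objective mentioned in the overoptimization entry above (a preference optimization loss plus a supervised learning loss), the sketch below pairs a standard DPO loss with an SFT term on the preferred responses. It is an illustrative assumption, not that paper's released implementation; the function name, the `beta` and `sft_weight` hyperparameters, and the input conventions are made up for the example.

```python
import torch.nn.functional as F

def preference_plus_sft_loss(logp_chosen, logp_rejected,
                             ref_logp_chosen, ref_logp_rejected,
                             beta=0.1, sft_weight=1.0):
    """DPO-style preference loss plus an SFT regularizer.

    Each argument is a 1-D tensor of summed token log-probabilities of
    complete responses: `logp_*` under the trained policy, `ref_logp_*`
    under a frozen reference model.
    """
    # DPO term: logistic loss on the policy-vs-reference log-ratio margin
    # between the preferred (chosen) and dispreferred (rejected) responses.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    preference_loss = -F.logsigmoid(margin).mean()

    # SFT term: maximum-likelihood loss on the chosen responses; per that
    # paper's claim, this acts as an implicit regularizer against reward
    # overoptimization.
    sft_loss = -logp_chosen.mean()

    return preference_loss + sft_weight * sft_loss
```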
This list is automatically generated from the titles and abstracts of the papers on this site.