Understanding the Downstream Instability of Word Embeddings
- URL: http://arxiv.org/abs/2003.04983v1
- Date: Sat, 29 Feb 2020 00:39:12 GMT
- Title: Understanding the Downstream Instability of Word Embeddings
- Authors: Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré
- Abstract summary: Many industrial machine learning (ML) systems require frequent retraining to keep up-to-date with constantly changing data.
Small changes in training data can cause significant changes in the model's predictions.
We show how a core building block of modern natural language processing pipelines---pre-trained word embeddings---affects the instability of downstream NLP models.
- Score: 14.373952177486558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many industrial machine learning (ML) systems require frequent retraining to
keep up-to-date with constantly changing data. This retraining exacerbates a
large challenge facing ML systems today: model training is unstable, i.e.,
small changes in training data can cause significant changes in the model's
predictions. In this paper, we work on developing a deeper understanding of
this instability, with a focus on how a core building block of modern natural
language processing (NLP) pipelines---pre-trained word embeddings---affects the
instability of downstream NLP models. We first empirically reveal a tradeoff
between stability and memory: increasing the embedding memory 2x can reduce the
disagreement in predictions due to small changes in training data by 5% to 37%
(relative). To theoretically explain this tradeoff, we introduce a new measure
of embedding instability---the eigenspace instability measure---which we prove
bounds the disagreement in downstream predictions introduced by the change in
word embeddings. Practically, we show that the eigenspace instability measure
can be a cost-effective way to choose embedding parameters to minimize
instability without training downstream models, outperforming other embedding
distance measures and performing competitively with a nearest neighbor-based
measure. Finally, we demonstrate that the observed stability-memory tradeoffs
extend to other types of embeddings as well, including knowledge graph and
contextual word embeddings.
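The two quantities the abstract leans on, downstream prediction disagreement and a nearest neighbor-based embedding measure, can be illustrated with a short sketch. The Python snippet below is illustrative only: the toy corpus, the mean-embedding featurization, and the logistic-regression downstream task are stand-ins chosen here, not the paper's setup, and it does not implement the eigenspace instability measure itself.

```python
# Illustrative sketch only (not the paper's code). Two quantities from the abstract:
#   1. downstream instability: % of test predictions that differ between two identical
#      downstream models trained on features from embedding matrices E1 and E2
#      (e.g., embeddings trained before and after a small corpus update);
#   2. a simple k-nearest-neighbor overlap between E1 and E2, in the spirit of the
#      nearest neighbor-based embedding measure mentioned in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Toy stand-ins: 1,000-word vocabulary, 50-dim embeddings that differ slightly.
V, d = 1000, 50
E1 = rng.normal(size=(V, d))
E2 = E1 + 0.1 * rng.normal(size=(V, d))      # "retrained" embedding, slightly perturbed

def featurize(docs, E):
    """Represent each document (a list of word ids) by its mean word embedding."""
    return np.stack([E[ids].mean(axis=0) for ids in docs])

# Toy downstream task: labels come from a fixed linear rule on the E1 features.
docs = [rng.integers(0, V, size=20) for _ in range(600)]
w_true = rng.normal(size=d)
labels = (featurize(docs, E1) @ w_true > 0).astype(int)
train_docs, test_docs = docs[:400], docs[400:]
y_train = labels[:400]

def downstream_predictions(E):
    clf = LogisticRegression(max_iter=1000).fit(featurize(train_docs, E), y_train)
    return clf.predict(featurize(test_docs, E))

# Downstream instability: fraction of test points on which the two models disagree.
p1, p2 = downstream_predictions(E1), downstream_predictions(E2)
print(f"prediction disagreement: {100.0 * np.mean(p1 != p2):.1f}%")

# k-NN measure: average overlap of each word's k nearest neighbors under E1 vs. E2.
def knn_ids(E, k=10):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(E)
    return nn.kneighbors(E, return_distance=False)[:, 1:]   # drop the word itself

n1, n2 = knn_ids(E1), knn_ids(E2)
overlap = np.mean([len(set(a) & set(b)) / n1.shape[1] for a, b in zip(n1, n2)])
print(f"mean k-NN overlap between embeddings: {overlap:.2f}")
```

In the paper's setting, E1 and E2 would be embeddings trained on corpora that differ by a small update, and the disagreement percentage above is the downstream-instability quantity that the eigenspace instability measure is proven to bound.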
Related papers
- Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning [19.27175827358111]
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones.
We propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD).
We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models.
arXiv Detail & Related papers (2025-04-09T17:59:42Z)
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent as data arrives incrementally.
Existing gradient updates can severely degrade performance on previous datasets during the CIT process.
We propose a general continual instruction tuning framework to address the challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
- Federated Continual Learning Goes Online: Uncertainty-Aware Memory Management for Vision Tasks and Beyond [13.867793835583463]
We propose an uncertainty-aware memory-based approach to solve catastrophic forgetting.
We retrieve samples with specific characteristics, and by retraining the model on such samples we demonstrate the potential of this approach.
arXiv Detail & Related papers (2024-05-29T09:29:39Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors (a minimal low-rank adapter sketch appears after this list).
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Robust Machine Learning by Transforming and Augmenting Imperfect Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning.
We first discuss how to prevent ML from codifying prior human discrimination measured in the training data.
We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Understanding and Improving Transfer Learning of Deep Models via Neural Collapse [37.483109067209504]
This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems.
We find a strong correlation between feature collapse and downstream performance.
Our proposed fine-tuning methods deliver good performance while reducing the number of fine-tuned parameters by at least 90%.
arXiv Detail & Related papers (2022-12-23T08:48:34Z)
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
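For the "Low-rank finetuning for LLMs: A fairness perspective" entry above, the technique under study is parameter-efficient fine-tuning via low-rank updates. Below is a minimal, generic sketch in the spirit of LoRA, written in PyTorch; the class and parameter names are illustrative assumptions and are not taken from that paper or its code.

```python
# Illustrative low-rank adapter in the spirit of LoRA (hypothetical names, not the
# cited paper's code): the pre-trained weight stays frozen and only the rank-r factors
# A and B are trained, so the effective weight update is the low-rank matrix B @ A.
import torch
import torch.nn as nn

class LowRankAdaptedLinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                 # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))    # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # base output plus scaled low-rank correction: W0 x + (alpha/r) * B (A x)
        return self.base(x) + self.scale * (x @ self.A.t()) @ self.B.t()

layer = LowRankAdaptedLinear(768, 768, r=8)
out = layer(torch.randn(4, 768))                               # shape: (4, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)                                    # only A and B are trainable
```

Because only A and B receive gradients, the trained parameter count scales with the rank r rather than with the full weight matrix, which is what constrains how much of the fine-tuning distribution shift a low-rank update can capture.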
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.