Whitening Not Recommended for Classification Tasks in LLMs
- URL: http://arxiv.org/abs/2407.12886v1
- Date: Tue, 16 Jul 2024 22:48:30 GMT
- Title: Whitening Not Recommended for Classification Tasks in LLMs
- Authors: Ali Forooghi, Shaghayegh Sadeghi, Jianguo Lu
- Abstract summary: Whitening has been claimed to be an effective operation for improving the quality of embeddings obtained from Large Language Models (LLMs).
In particular, whitening degrades embeddings for classification tasks.
A by-product of our research is an embedding evaluation platform for LLMs called SentEval+.
- Score: 0.08192907805418582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentence embedding is a cornerstone of NLP. Whitening has been claimed to be an effective operation for improving the quality of embeddings obtained from Large Language Models (LLMs). However, we find that the efficacy of whitening is model-dependent and task-dependent. In particular, whitening degrades embeddings for classification tasks. This conclusion is supported by extensive experiments. We also explore a variety of whitening operations, including PCA, ZCA, PCA-Cor, ZCA-Cor, and Cholesky whitening. A by-product of our research is an embedding evaluation platform for LLMs called SentEval+.
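For readers unfamiliar with the whitening operations compared in the abstract, the sketch below shows PCA and ZCA whitening applied to a matrix of sentence embeddings. It is a minimal, generic NumPy illustration with our own function and variable names; it is not the paper's SentEval+ code.

```python
# Minimal sketch of PCA and ZCA whitening applied to a sentence-embedding
# matrix X of shape (n_samples, dim). Generic NumPy code with our own names;
# this is not the paper's SentEval+ implementation.
import numpy as np

def whiten(X, kind="pca", eps=1e-8):
    """Return embeddings transformed to have (approximately) identity covariance."""
    Xc = X - X.mean(axis=0, keepdims=True)           # Centering
    cov = Xc.T @ Xc / (len(Xc) - 1)                  # sample covariance, cov = V diag(s) V^T
    eigvals, eigvecs = np.linalg.eigh(cov)
    scale = np.diag(1.0 / np.sqrt(eigvals + eps))    # Scaling in the eigenbasis
    W_pca = scale @ eigvecs.T                        # PCA whitening matrix
    if kind == "pca":
        W = W_pca
    elif kind == "zca":
        W = eigvecs @ W_pca                          # rotate back: ZCA stays close to the input
    else:
        raise ValueError(f"unknown whitening kind: {kind}")
    return Xc @ W.T

# Sanity check on random stand-in embeddings (real usage would pass LLM sentence
# embeddings and feed the result to a downstream classifier).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
Xw = whiten(X, kind="zca")
print(np.allclose(np.cov(Xw, rowvar=False), np.eye(64), atol=1e-4))
```

PCA whitening rescales the data in its eigenbasis, while ZCA rotates back afterwards so the whitened features stay close to the originals; both leave the embeddings with an identity covariance matrix.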
Related papers
- Whitening Consistently Improves Self-Supervised Learning [5.0337106694127725]
We propose incorporating ZCA whitening as the final layer of the encoder in self-supervised learning.
Our experiments show that whitening is capable of improving linear and k-NN probing accuracy by 1-5%.
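As a rough illustration of what "ZCA whitening as the final layer of the encoder" could look like, here is a minimal PyTorch module that whitens a mini-batch of features. It is a sketch under our own assumptions, not the authors' implementation, which would also need running statistics and care with small batches.

```python
# Rough sketch of ZCA whitening used as the final layer of an encoder,
# computed per mini-batch. Illustrative code with invented names.
import torch
import torch.nn as nn

class ZCAWhitening(nn.Module):
    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, dim) features from the encoder backbone
        z = z - z.mean(dim=0, keepdim=True)                       # center
        cov = z.T @ z / (z.shape[0] - 1)                          # batch covariance
        eigvals, eigvecs = torch.linalg.eigh(cov)
        inv_sqrt = eigvecs @ torch.diag((eigvals + self.eps).rsqrt()) @ eigvecs.T
        return z @ inv_sqrt                                       # decorrelated, unit-variance features

# Whitening as the last layer of a toy encoder:
encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128), ZCAWhitening())
out = encoder(torch.randn(32, 512))
```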
arXiv Detail & Related papers (2024-08-14T12:52:13Z) - PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics [51.17512229589]
PoLLMgraph is a model-based white-box detection and forecasting approach for large language models.
We show that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics.
Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
arXiv Detail & Related papers (2024-04-06T20:02:20Z) - Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is influenced primarily by the prior of the underlying Large Language Models (LLMs) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z) - Survival of the Most Influential Prompts: Efficient Black-Box Prompt
Search via Clustering and Pruning [77.61565726647784]
We propose a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens.
Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.
arXiv Detail & Related papers (2023-10-19T14:25:06Z) - Whitening-based Contrastive Learning of Sentence Embeddings [61.38955786965527]
This paper presents a whitening-based contrastive learning method for sentence embedding learning (WhitenedCSE).
We find that these two approaches are not totally redundant but actually have some complementarity due to their different uniformity mechanisms.
arXiv Detail & Related papers (2023-05-28T14:58:10Z) - Modulate Your Spectrum in Self-Supervised Learning [65.963806450552]
Whitening loss offers a theoretical guarantee against feature collapse in self-supervised learning.
We introduce Spectral Transformation (ST), a framework to modulate the spectrum of the embedding.
We propose a novel ST instance named IterNorm with trace loss (INTL).
arXiv Detail & Related papers (2023-05-26T09:59:48Z) - An Investigation into Whitening Loss for Self-supervised Learning [62.157102463386394]
A desirable objective in self-supervised learning (SSL) is to avoid feature collapse.
We propose a framework with an informative indicator to analyze whitening loss.
Based on our analysis, we propose channel whitening with random group partition (CW-RGP).
arXiv Detail & Related papers (2022-10-07T14:43:29Z) - Improving Generalization of Batch Whitening by Convolutional Unit Optimization [24.102442375834084]
Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have zero mean (Centering) and unit variance (Scaling).
In commonly used architectures, which are empirically optimized with Batch Normalization, the normalization layer appears between the convolution and the activation function.
We propose a new Convolutional Unit that is in line with the theory, and our method generally improves the performance of Batch Whitening.
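As a hedged sketch of the Centering and Scaling steps described above, placed between a convolution and its activation, here is generic PyTorch code with invented names; it is not the paper's proposed Convolutional Unit, and full Batch Whitening would additionally decorrelate the channels.

```python
# Sketch of per-channel Centering and Scaling inside a conv -> norm -> activation unit.
import torch
import torch.nn as nn

class CenterScale(nn.Module):
    """Per-channel centering (zero mean) and scaling (unit variance) over a mini-batch."""
    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature maps
        mean = x.mean(dim=(0, 2, 3), keepdim=True)                    # Centering
        var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps)                # Scaling

# The common conv -> normalization -> activation ordering mentioned above:
block = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1), CenterScale(), nn.ReLU())
out = block(torch.randn(8, 3, 32, 32))
```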
arXiv Detail & Related papers (2021-08-24T10:27:57Z) - White Paper Assistance: A Step Forward Beyond the Shortcut Learning [6.066543113636522]
We show that CNNs often overlook the need to examine whether they are solving the task in the way we actually intend.
We propose a novel approach called White Paper Assistance to combat this unintended propensity.
Our proposed method uses the white paper to detect the extent to which the model prefers certain characteristic patterns, and alleviates this by forcing the model to make a random guess on the white paper.
arXiv Detail & Related papers (2021-06-08T08:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.