Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search
- URL: http://arxiv.org/abs/2411.17538v2
- Date: Wed, 27 Nov 2024 09:43:01 GMT
- Title: Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search
- Authors: Andor Diera, Lukas Galke, Ansgar Scherp
- Abstract summary: Low isotropy in an embedding space impairs performance on tasks involving semantic inference. We propose a modified ZCA whitening technique to control isotropy levels in embeddings.
- Score: 6.704529554100875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low isotropy in an embedding space impairs performance on tasks involving semantic inference. Our study investigates the impact of isotropy on semantic code search performance and explores post-processing techniques to mitigate this issue. We analyze various code language models, examine isotropy in their embedding spaces, and its influence on search effectiveness. We propose a modified ZCA whitening technique to control isotropy levels in embeddings. Our results demonstrate that Soft-ZCA whitening improves the performance of pre-trained code language models and can complement contrastive fine-tuning.
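For intuition, ZCA whitening transforms centered embeddings by W = U diag(λ)^{-1/2} U^T, where U and λ come from the eigendecomposition of the embedding covariance; softening the transform by offsetting the eigenvalues with a parameter ε is one way to control how strongly the spectrum is flattened. The snippet below is a minimal NumPy sketch under that assumption; the paper's exact Soft-ZCA formulation and choice of ε may differ.

```python
import numpy as np

def soft_zca_whiten(X, eps=1e-2):
    """Illustrative soft ZCA whitening of an (n_samples, dim) embedding
    matrix. The eigenvalue offset `eps` softens the whitening strength;
    eps=0 recovers full ZCA whitening. The exact Soft-ZCA formulation
    is defined in the paper."""
    Xc = X - X.mean(axis=0, keepdims=True)      # center the embeddings
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)         # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    # ZCA transform with softened eigenvalues: W = U diag(1/sqrt(l + eps)) U^T
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W
```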
Related papers
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks [0.0]
This study examines embedding algorithms from industry leaders, such as OpenAI's models, Google's PaLM, and BERT.
Using medical data, we have analyzed similarity scores of each embedding layer, observing differences in performance among each algorithm.
To enhance each model and provide an additional encoding layer, we also implemented Siamese Neural Networks.
arXiv Detail & Related papers (2023-12-31T20:21:58Z)
- Modulate Your Spectrum in Self-Supervised Learning [65.963806450552]
Whitening loss offers a theoretical guarantee against feature collapse in self-supervised learning.
We introduce Spectral Transformation (ST), a framework to modulate the spectrum of embeddings.
We propose a novel ST instance named IterNorm with trace loss (INTL).
arXiv Detail & Related papers (2023-05-26T09:59:48Z)
- Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints [32.07631215590755]
Contrastive learning (CL) has been successfully used to pull correlated patches together and push uncorrelated ones apart.
In this work, we exploit semantic and structural consistency between synthetic and refined images and adopt CL to reduce the semantic distortion.
arXiv Detail & Related papers (2023-04-25T05:55:28Z)
- Dual Stage Stylization Modulation for Domain Generalized Semantic Segmentation [39.35385886870209]
We introduce a dual-stage Feature Transform (dFT) layer within the Adversarial Semantic Hallucination+ framework.
By leveraging semantic information for each pixel, our approach adaptively adjusts the pixel-wise hallucination strength.
We validate the effectiveness of our proposed method through comprehensive experiments on publicly available semantic segmentation benchmark datasets.
arXiv Detail & Related papers (2023-04-18T23:54:20Z)
- Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [52.656743602538825]
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing.
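As a rough illustration of fine-tuning with layer freezing (not Telly's exact selection rule, which the paper derives from its layer-wise analysis), one might freeze the embeddings and the bottom k encoder layers of a Hugging Face-style model before training:

```python
def freeze_bottom_layers(model, k):
    """Freeze the embedding layer and the bottom k encoder layers so
    only the upper layers receive gradient updates. Assumes a Hugging
    Face-style encoder exposing `model.embeddings` and
    `model.encoder.layer`; Telly's actual layer choice differs."""
    for param in model.embeddings.parameters():
        param.requires_grad = False
    for layer in model.encoder.layer[:k]:
        for param in layer.parameters():
            param.requires_grad = False
```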
arXiv Detail & Related papers (2023-04-11T13:34:13Z)
- Orthogonal SVD Covariance Conditioning and Latent Disentanglement [65.67315418971688]
Inserting an SVD meta-layer into neural networks tends to make the covariance matrix ill-conditioned.
We propose Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR).
Experiments on visual recognition demonstrate that our methods can simultaneously improve covariance conditioning and generalization.
arXiv Detail & Related papers (2022-12-11T20:31:31Z)
- Improved Beam Search for Hallucination Mitigation in Abstractive Summarization [1.2328446298523066]
In this paper, we investigate the use of the Natural Language Inference (NLI) entailment metric to detect and prevent hallucinations in summary generation.
We propose an NLI-assisted beam re-ranking mechanism by computing entailment probability scores between the input context and summarization model-generated beams.
Our proposed algorithm significantly outperforms vanilla beam decoding on XSum and CNN/DM datasets.
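A minimal sketch of the re-ranking idea: score each candidate summary by the probability that the source entails it, then sort. The callable `entailment_prob` is a hypothetical stand-in for an NLI model, and the paper's exact scoring (e.g., how entailment is combined with beam log-probabilities) may differ.

```python
def rerank_beams(source, beams, entailment_prob):
    """Re-rank candidate summaries by NLI entailment with the source.
    `entailment_prob(premise, hypothesis)` is a hypothetical callable
    returning P(entailment) from any NLI model."""
    scored = [(entailment_prob(source, beam), beam) for beam in beams]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [beam for _, beam in scored]
```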
arXiv Detail & Related papers (2022-12-06T02:33:47Z)
- Discovering Class-Specific GAN Controls for Semantic Image Synthesis [73.91655061467988]
We propose a novel method for finding spatially disentangled class-specific directions in the latent space of pretrained SIS models.
We show that the latent directions found by our method can effectively control the local appearance of semantic classes.
arXiv Detail & Related papers (2022-12-02T21:39:26Z)
- Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT) to project the gradients in back-propagation onto the low-rank subspace spanned by the early-stage gradient flow eigenvectors during the entire training process.
We equip CoOp with Novel Learner Feature (NFL) to enhance the generalization ability of the learned prompts onto novel categories beyond the training set.
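The core operation behind SubPT, projecting a gradient onto a low-rank subspace, reduces to g_proj = V V^T g for an orthonormal basis V. The sketch below takes the eigenvectors as given; in the paper they are derived from early-stage gradient flow.

```python
import numpy as np

def project_gradient(grad, eigvecs):
    """Project `grad` onto the subspace spanned by the columns of
    `eigvecs` (assumed orthonormal): g_proj = V (V^T g). In SubPT the
    basis comes from early-stage gradient flow eigenvectors."""
    return eigvecs @ (eigvecs.T @ grad)
```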
arXiv Detail & Related papers (2022-11-04T02:06:22Z)
- Adaptive Meta-learner via Gradient Similarity for Few-shot Text Classification [11.035878821365149]
We propose a novel Adaptive Meta-learner via Gradient Similarity (AMGS) to improve the model generalization ability to a new task.
Experimental results on several benchmarks demonstrate that the proposed AMGS consistently improves few-shot text classification performance.
arXiv Detail & Related papers (2022-09-10T16:14:53Z)
- Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network.
It exploits semantic features of pretrained classification networks and implicitly matches the probabilistic distribution of clean images in the semantic feature space.
By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
- Fine-tuning Pre-trained Language Models for Few-shot Intent Detection: Supervised Pre-training and Isotropization [21.859795973653657]
We propose to improve supervised pre-training by regularizing the feature space towards isotropy.
Our main finding is that it is promising to regularize supervised pre-training with isotropization to further improve the performance of few-shot intent detection.
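One common way to regularize a feature space toward isotropy, shown here purely as an illustration rather than the paper's exact term, is to penalize the deviation of the batch correlation matrix from the identity:

```python
import torch

def isotropy_penalty(embeddings):
    """Illustrative isotropization regularizer: push the batch
    correlation matrix of the embeddings toward the identity. The
    paper's exact regularizer may differ."""
    z = embeddings - embeddings.mean(dim=0, keepdim=True)
    z = z / (z.std(dim=0, keepdim=True) + 1e-8)      # standardize features
    corr = z.T @ z / z.shape[0]                      # batch correlation matrix
    eye = torch.eye(corr.shape[0], device=corr.device)
    return ((corr - eye) ** 2).mean()
```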
arXiv Detail & Related papers (2022-05-15T07:48:13Z)
- Improving Music Performance Assessment with Contrastive Learning [78.8942067357231]
This study investigates contrastive learning as a potential method to improve existing MPA systems.
We introduce a weighted contrastive loss suitable for regression tasks applied to a convolutional neural network.
Our results show that contrastive-based methods are able to match and exceed SoTA performance for MPA regression tasks.
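To make the idea of a contrastive loss for regression concrete, the sketch below weights each pair by label proximity so that embeddings of similarly-scored performances are pulled together and dissimilar ones pushed apart; the weighting scheme is an assumption for illustration, not the paper's exact loss.

```python
import torch

def weighted_contrastive_loss(emb, labels, margin=1.0):
    """Illustrative pairwise loss for regression targets: pairs with
    close labels are pulled together, distant-label pairs are pushed
    apart up to `margin`. The paper's weighting may differ."""
    dists = torch.cdist(emb, emb)                        # pairwise embedding distances
    label_gap = (labels[:, None] - labels[None, :]).abs()
    sim_weight = torch.exp(-label_gap)                   # near 1 for similar labels
    pull = sim_weight * dists.pow(2)
    push = (1.0 - sim_weight) * torch.clamp(margin - dists, min=0).pow(2)
    return (pull + push).mean()
```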
arXiv Detail & Related papers (2021-08-03T19:24:25Z)
- Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
MARL exacerbates matters by imposing various constraints on communication and observability.
For value-based methods, it poses challenges in accurately representing the optimal value function.
For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic.
We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
arXiv Detail & Related papers (2021-05-31T23:08:05Z)
- Effects of Pre- and Post-Processing on type-based Embeddings in Lexical Semantic Change Detection [4.7677261488999205]
We optimize existing models by pre-training on large corpora and refining on diachronic target corpora, tackling the notorious small-data problem.
Our results provide a guide for the application and optimization of lexical semantic change detection models across various learning scenarios.
arXiv Detail & Related papers (2021-01-22T22:34:15Z)
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to cast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates rectified meta-learning module into common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)