How Does Fine-tuning Affect the Geometry of Embedding Space: A Case
Study on Isotropy
- URL: http://arxiv.org/abs/2109.04740v1
- Date: Fri, 10 Sep 2021 08:58:59 GMT
- Title: How Does Fine-tuning Affect the Geometry of Embedding Space: A Case
Study on Isotropy
- Authors: Sara Rajaee and Mohammad Taher Pilehvar
- Abstract summary: We analyze the extent to which the isotropy of the embedding space changes after fine-tuning.
Local structures in pre-trained contextual word representations (CWRs) undergo a massive change during fine-tuning.
- Score: 18.490856440975996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is widely accepted that fine-tuning pre-trained language models usually
brings about performance improvements in downstream tasks. However, there are
limited studies on the reasons behind this effectiveness, particularly from the
viewpoint of structural changes in the embedding space. Trying to fill this
gap, in this paper, we analyze the extent to which the isotropy of the
embedding space changes after fine-tuning. We demonstrate that, even though
isotropy is a desirable geometrical property, fine-tuning does not necessarily
result in isotropy enhancements. Moreover, local structures in pre-trained
contextual word representations (CWRs), such as those encoding token types or
frequency, undergo a massive change during fine-tuning. Our experiments show
dramatic growth in the number of elongated directions in the embedding space,
which, in contrast to pre-trained CWRs, carry the essential linguistic
knowledge in the fine-tuned embedding space, making existing isotropy
enhancement methods ineffective.
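To make the notions above concrete, the sketch below (illustrative only, not the authors' code; the choice of estimator, the variance threshold, and the function names are assumptions) computes a common partition-function approximation of isotropy and counts "elongated" directions as principal components that each explain more than a small fraction of the total variance:
```python
# Illustrative sketch (not from the paper): estimating the isotropy of a set of
# contextual word representations (CWRs) and counting "elongated" directions.
# The isotropy score uses the common partition-function approximation
# I(W) = min_c Z(c) / max_c Z(c), where c ranges over eigenvectors of W^T W
# and Z(c) = sum_i exp(c . w_i).
import numpy as np


def isotropy_score(embeddings: np.ndarray) -> float:
    """Approximate isotropy of an (n_tokens, dim) embedding matrix; 1.0 = fully isotropic."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, eigvecs = np.linalg.eigh(centered.T @ centered)   # candidate unit directions c
    z = np.exp(centered @ eigvecs).sum(axis=0)           # Z(c) for every eigenvector
    return float(z.min() / z.max())


def count_elongated_directions(embeddings: np.ndarray, var_threshold: float = 0.01) -> int:
    """Number of principal components that each explain more than var_threshold of the variance."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / (singular_values**2).sum()
    return int((explained > var_threshold).sum())


if __name__ == "__main__":
    # Random vectors are nearly isotropic; real CWRs would come from a
    # pre-trained or fine-tuned encoder such as BERT.
    rng = np.random.default_rng(0)
    fake_cwrs = rng.normal(size=(2000, 768))
    print(isotropy_score(fake_cwrs), count_elongated_directions(fake_cwrs))
```
A score near 1 indicates a nearly isotropic space, while a score near 0 together with many high-variance components signals the elongated, anisotropic structure discussed above.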
Related papers
- Shrink the longest: improving latent space isotropy with symplicial geometry [0.0]
We propose a novel regularization technique based on simplicial geometry to improve the isotropy of latent representations.
We demonstrate that the method leads to an increase in downstream performance while significantly lowering the anisotropy during fine-tuning.
arXiv Detail & Related papers (2025-01-09T18:44:10Z)
- ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction [50.07671826433922]
It is non-trivial to simultaneously recover meticulous geometry and preserve smoothness across regions with differing characteristics.
We propose ND-SDF, which learns a Normal Deflection field to represent the angular deviation between the scene normal and the prior normal.
Our method not only obtains smooth weakly textured regions such as walls and floors but also preserves the geometric details of complex structures.
arXiv Detail & Related papers (2024-08-22T17:59:01Z)
- Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance [0.0]
We propose that much of the benefit from pre-training may be captured by geometric characteristics of the latent space representations.
We find that there is a strong linear relationship between a measure of quantized cell density and average GLUE performance.
arXiv Detail & Related papers (2024-06-18T00:17:30Z)
- Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning [78.39310274926535]
Adapting pre-trained foundation models to a wide range of downstream tasks has become prevalent in artificial intelligence, but fully fine-tuning all of their parameters is resource-intensive.
To mitigate this, several fine-tuning techniques have been developed to update the pre-trained model weights in a more resource-efficient manner.
This paper introduces a generalized parameter-efficient fine-tuning framework designed for parameter spaces of various dimensions.
arXiv Detail & Related papers (2024-05-23T16:04:42Z)
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that, in systematically controlled settings, these skew metrics are strongly predictive of generalization performance.
This work points to improving data diversity or balance, rather than simply scaling up the absolute dataset size, as an important direction.
arXiv Detail & Related papers (2024-03-25T03:18:39Z)
- GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering [83.19049705653072]
During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved.
We propose a novel approach called GeoGaussian to mitigate this issue.
Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction.
arXiv Detail & Related papers (2024-03-17T20:06:41Z)
- Alignment and Outer Shell Isotropy for Hyperbolic Graph Contrastive Learning [69.6810940330906]
We propose a novel contrastive learning framework to learn high-quality graph embedding.
Specifically, we design the alignment metric that effectively captures the hierarchical data-invariant information.
We show that in the hyperbolic space one has to address the leaf- and height-level uniformity which are related to properties of trees.
arXiv Detail & Related papers (2023-10-27T15:31:42Z)
- Analyzing the Latent Space of GAN through Local Dimension Estimation [4.688163910878411]
The success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research into the semantic properties of their latent spaces.
We propose a local dimension estimation algorithm for arbitrary intermediate layers in a pre-trained GAN model.
Our proposed metric, called Distortion, measures the inconsistency of the intrinsic space across the learned latent space.
arXiv Detail & Related papers (2022-05-26T06:36:06Z)
- A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space [18.490856440975996]
The representation degeneration problem in Contextual Word Representations (CWRs) hurts the expressiveness of the embedding space.
We propose a local cluster-based method to address the degeneration issue in contextual embedding spaces.
We show that removing dominant directions of verb representations can transform the space to better suit semantic applications.
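For intuition about how such a cluster-based isotropy enhancement typically operates (a hedged sketch in the spirit of the summary above, not the authors' exact implementation; the cluster and component counts and the function name are arbitrary), one can cluster the representations and then project out each cluster's mean and dominant principal directions:
```python
# Hedged sketch of a cluster-based isotropy enhancement (not the exact method
# from the paper): cluster the representations, then remove each cluster's mean
# and its dominant principal directions. Hyperparameter values are arbitrary.
import numpy as np
from sklearn.cluster import KMeans


def remove_cluster_dominant_directions(
    embeddings: np.ndarray, n_clusters: int = 8, n_dominant: int = 3
) -> np.ndarray:
    """Return embeddings with per-cluster means and top principal directions projected out."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    output = np.empty_like(embeddings)
    for cluster_id in range(n_clusters):
        mask = labels == cluster_id
        centered = embeddings[mask] - embeddings[mask].mean(axis=0, keepdims=True)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        dominant = vt[:n_dominant]                       # (n_dominant, dim) principal directions
        output[mask] = centered - (centered @ dominant.T) @ dominant
    return output
```
The distinguishing feature relative to global methods is that the mean and dominant directions are estimated and removed separately within each cluster, so local structures are handled individually.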
arXiv Detail & Related papers (2021-06-02T14:26:37Z)
- APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.
An Adversarial Poincare Variational Autoencoder (APo-VAE) is presented, where both the prior and the variational posterior of latent variables are defined over a Poincare ball via wrapped normal distributions.
Experiments on language modeling and dialog-response generation tasks demonstrate the effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
- What Happens To BERT Embeddings During Fine-tuning? [19.016185902256826]
We investigate how fine-tuning affects the representations of the BERT model.
We find that fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks.
In particular, dependency parsing reconfigures most of the model, whereas SQuAD and MNLI appear to involve much shallower processing.
arXiv Detail & Related papers (2020-04-29T19:46:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.