Measuring Intrinsic Dimension of Token Embeddings
- URL: http://arxiv.org/abs/2503.02142v1
- Date: Tue, 04 Mar 2025 00:19:01 GMT
- Title: Measuring Intrinsic Dimension of Token Embeddings
- Authors: Takuya Kataiwa, Cho Hakaze, Tetsushi Ohki
- Abstract summary: We estimate the ID of token embeddings in both small-scale language models and modern large language models. We observe an increase in redundancy rates as the model scale grows. When LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs.
- Score: 0.13108652488669734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we measure the Intrinsic Dimension (ID) of token embeddings to estimate the intrinsic dimension of the manifolds spanned by the representations, so as to quantitatively evaluate their redundancy relative to their extrinsic dimensionality. In detail, (1) we estimate the ID of token embeddings in both small-scale language models and modern large language models, finding that the embedding spaces often reside on manifolds of lower dimension than their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe an increase in redundancy rates as the model scale grows; (3) we measure the dynamics of IDs during the training process and find a rapid ID drop in the early stages of training. Moreover, (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs, suggesting that the ID can serve as a useful guideline for LoRA application.
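To make finding (1) concrete, the sketch below estimates the ID of an embedding matrix with the TwoNN estimator (Facco et al., 2017). This is a minimal, hedged illustration: the abstract does not name the estimator the authors use, so the choice of TwoNN, the function name `two_nn_id`, and the toy data are all assumptions for demonstration only.

```python
# Hypothetical sketch: TwoNN intrinsic-dimension estimate of a token
# embedding matrix. The estimator choice is an assumption; the paper's
# summary does not specify which ID estimator it uses.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_id(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate for points X of shape (n, dim)."""
    # Distances to the two nearest neighbors of each point
    # (column 0 is the point itself, at distance zero).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]         # under the model, mu ~ Pareto(d)
    mu = mu[np.isfinite(mu) & (mu > 1.0)]  # drop duplicate / degenerate points
    # Maximum-likelihood estimate: d = n / sum(log mu_i).
    return len(mu) / np.log(mu).sum()

# Toy check: an isotropic Gaussian cloud has ID close to its extrinsic
# dimension, so the redundancy (extrinsic dim minus ID) is near zero here;
# the paper's observation is that real embedding tables show a large gap.
emb = np.random.default_rng(0).normal(size=(5000, 64))
print(f"extrinsic dim = 64, estimated ID ~ {two_nn_id(emb):.1f}")
```

Per finding (4), such an estimate could serve as a starting point for the LoRA rank r when adapting an embedding layer, since the reported perplexity drop occurs around the estimated ID.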
Related papers
- Understanding Variational Autoencoders with Intrinsic Dimension and Information Imbalance [2.7446241148152257]
This work presents an analysis of the hidden representations of Variational Autoencoders (VAEs) using the Intrinsic Dimension (ID) and the Information Imbalance (II).
We show that VAEs undergo a transition in behaviour once the bottleneck size is larger than the ID of the data, manifesting in a double hunchback ID profile and a qualitative shift in information processing as captured by the II.
arXiv Detail & Related papers (2024-11-04T10:58:41Z) - A Geometric Framework for Understanding Memorization in Generative Models [11.263296715798374]
Recent work has shown that deep generative models can memorize and reproduce training datapoints when deployed.
These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization.
We propose the manifold memorization hypothesis (MMH), a geometric framework that builds on the manifold hypothesis to provide a clear language in which to reason about memorization.
arXiv Detail & Related papers (2024-10-31T18:09:01Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Beyond the noise: intrinsic dimension estimation with optimal neighbourhood identification [43.26660964074272]
The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection.
We introduce an automatic protocol to select the sweet spot, namely the correct range of scales in which the ID is meaningful and useful.
We derive theoretical guarantees and illustrate the usefulness and robustness of this procedure by benchmarks on artificial and real-world datasets.
arXiv Detail & Related papers (2024-05-24T01:08:05Z) - Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection [55.210991151015534]
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z) - Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach that adapts 2D networks, via an intermediate feature representation, to processing 3D images.
We show, on all 3D MedMNIST benchmark datasets and on two real-world datasets consisting of several hundred high-resolution CT or MRI scans, that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis [0.0]
We propose a local ID estimation strategy that is stable even for 'tight' localities consisting of as few as 20 sample points.
Our experimental results show that our proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.
arXiv Detail & Related papers (2022-09-29T00:00:11Z) - Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of the sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z) - Self-Attention Neural Bag-of-Features [103.70855797025689]
We build on the recently introduced 2D-Attention and reformulate the attention learning methodology.
We propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information.
arXiv Detail & Related papers (2022-01-26T17:54:14Z) - Local intrinsic dimensionality estimators based on concentration of measure [0.0]
Intrinsic dimensionality (ID) is one of the most fundamental characteristics of multi-dimensional data point clouds.
We introduce new local estimators of ID based on linear separability of multi-dimensional data point clouds.
arXiv Detail & Related papers (2020-01-31T09:49:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.