Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
- URL: http://arxiv.org/abs/2003.08197v4
- Date: Thu, 11 Mar 2021 06:11:05 GMT
- Title: Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
- Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed
- Abstract summary: We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
- Score: 60.285091454321055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning continuous representations of discrete objects such as text, users,
movies, and URLs lies at the heart of many applications including language and
user modeling. When using discrete objects as input to neural networks, we
often ignore the underlying structures (e.g., natural groupings and
similarities) and embed the objects independently into individual vectors. As a
result, existing methods do not scale to large vocabulary sizes. In this paper,
we design a simple and efficient embedding algorithm that learns a small set of
anchor embeddings and a sparse transformation matrix. We call our method Anchor
& Transform (ANT) as the embeddings of discrete objects are a sparse linear
combination of the anchors, weighted according to the transformation matrix.
ANT is scalable, flexible, and end-to-end trainable. We further provide a
statistical interpretation of our algorithm as a Bayesian nonparametric prior
for embeddings that encourages sparsity and leverages natural groupings among
objects. By deriving an approximate inference algorithm based on Small Variance
Asymptotics, we obtain a natural extension that automatically learns the
optimal number of anchors instead of having to tune it as a hyperparameter. On
text classification, language modeling, and movie recommendation benchmarks, we
show that ANT is particularly suitable for large vocabulary sizes and
demonstrates stronger performance with fewer parameters (up to 40x compression)
as compared to existing compression baselines.
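To make the construction concrete, here is a minimal sketch of the idea described in the abstract: each object's embedding is that object's (sparse) row of a transformation matrix T applied to a small anchor matrix A. The module, parameter names, and the simple L1 penalty below are illustrative assumptions for exposition, not the authors' reference implementation or their Bayesian nonparametric prior.

```python
import torch
import torch.nn as nn


class AnchorTransformEmbedding(nn.Module):
    """Sketch of the ANT idea: embeddings as sparse combinations of anchors."""

    def __init__(self, vocab_size: int, num_anchors: int, embed_dim: int):
        super().__init__()
        # A: small, dense set of anchor embeddings (num_anchors x embed_dim).
        self.anchors = nn.Parameter(0.01 * torch.randn(num_anchors, embed_dim))
        # T: transformation matrix (vocab_size x num_anchors), trained to be sparse.
        # Stored densely here only for brevity.
        self.transform = nn.Parameter(0.01 * torch.rand(vocab_size, num_anchors))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Each object's embedding is its (non-negative) row of T times the anchors.
        weights = torch.relu(self.transform[token_ids])   # (..., num_anchors)
        return weights @ self.anchors                      # (..., embed_dim)

    def sparsity_penalty(self) -> torch.Tensor:
        # A plain L1 term stands in for the sparsity-encouraging prior in the paper.
        return self.transform.abs().sum()


# Usage: embed a batch of token ids and add the penalty to the task loss.
emb = AnchorTransformEmbedding(vocab_size=10_000, num_anchors=100, embed_dim=64)
tokens = torch.randint(0, 10_000, (4, 16))
vectors = emb(tokens)                                      # shape (4, 16, 64)
loss = vectors.pow(2).mean() + 1e-4 * emb.sparsity_penalty()
loss.backward()
```

In the actual method, T would be stored and updated sparsely so that memory scales with the number of nonzero entries rather than with vocab_size x num_anchors; the dense tensor above is used only to keep the sketch short.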
Related papers
- Finetuning CLIP to Reason about Pairwise Differences [52.028073305958074]
We propose an approach to train vision-language models such as CLIP in a contrastive manner to reason about differences in embedding space.
We first demonstrate that our approach yields significantly improved capabilities in ranking images by a certain attribute.
We also illustrate that the resulting embeddings satisfy geometric properties in embedding space to a greater degree.
arXiv Detail & Related papers (2024-09-15T13:02:14Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and a +2% increase in inference time, decent performance gains are achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z)
- Efficient Transformers with Dynamic Token Pooling [11.28381882347617]
We equip language models with a dynamic-pooling mechanism, which predicts segment boundaries in an autoregressive fashion.
Results demonstrate that dynamic pooling, which jointly segments and models language, is both faster and more accurate than vanilla Transformers.
arXiv Detail & Related papers (2022-11-17T18:39:23Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- A Sparsity-promoting Dictionary Model for Variational Autoencoders [16.61511959679188]
Structuring the latent space in deep generative models is important to yield more expressive models and interpretable representations.
We propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model.
arXiv Detail & Related papers (2022-03-29T17:13:11Z)
- NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs [15.289356276538662]
We propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary.
In NodePiece, a vocabulary of subword/sub-entity units is constructed from anchor nodes in a graph with known relation types.
Experiments show that NodePiece performs competitively in node classification, link prediction, and relation prediction tasks.
arXiv Detail & Related papers (2021-06-23T03:51:03Z)
- All Word Embeddings from One Embedding [23.643059189673473]
In neural network-based models for natural language processing, the largest part of the parameters often consists of word embeddings.
In this study, to reduce the total number of parameters, the embeddings for all words are represented by transforming a shared embedding.
The proposed method, ALONE, constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable.
arXiv Detail & Related papers (2020-04-25T07:38:08Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z) - An Advance on Variable Elimination with Applications to Tensor-Based
Computation [11.358487655918676]
We present new results on the classical algorithm of variable elimination, which underlies many algorithms, including those for probabilistic inference.
The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth.
arXiv Detail & Related papers (2020-02-21T14:17:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.