Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
- URL: http://arxiv.org/abs/2003.08197v4
- Date: Thu, 11 Mar 2021 06:11:05 GMT
- Title: Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
- Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed
- Abstract summary: We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
- Score: 60.285091454321055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning continuous representations of discrete objects such as text, users,
movies, and URLs lies at the heart of many applications including language and
user modeling. When using discrete objects as input to neural networks, we
often ignore the underlying structures (e.g., natural groupings and
similarities) and embed the objects independently into individual vectors. As a
result, existing methods do not scale to large vocabulary sizes. In this paper,
we design a simple and efficient embedding algorithm that learns a small set of
anchor embeddings and a sparse transformation matrix. We call our method Anchor
& Transform (ANT) as the embeddings of discrete objects are a sparse linear
combination of the anchors, weighted according to the transformation matrix.
ANT is scalable, flexible, and end-to-end trainable. We further provide a
statistical interpretation of our algorithm as a Bayesian nonparametric prior
for embeddings that encourages sparsity and leverages natural groupings among
objects. By deriving an approximate inference algorithm based on Small Variance
Asymptotics, we obtain a natural extension that automatically learns the
optimal number of anchors instead of having to tune it as a hyperparameter. On
text classification, language modeling, and movie recommendation benchmarks, we
show that ANT is particularly suitable for large vocabulary sizes and
demonstrates stronger performance with fewer parameters (up to 40x compression)
as compared to existing compression baselines.
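To make the construction concrete, here is a minimal sketch of the idea described in the abstract: each object's embedding is that object's (sparse) row of a transformation matrix T applied to a small anchor matrix A. The module, parameter names, and the simple L1 penalty below are illustrative assumptions for exposition, not the authors' reference implementation or their Bayesian nonparametric prior.

```python
import torch
import torch.nn as nn


class AnchorTransformEmbedding(nn.Module):
    """Sketch of the ANT idea: embeddings as sparse combinations of anchors."""

    def __init__(self, vocab_size: int, num_anchors: int, embed_dim: int):
        super().__init__()
        # A: small, dense set of anchor embeddings (num_anchors x embed_dim).
        self.anchors = nn.Parameter(0.01 * torch.randn(num_anchors, embed_dim))
        # T: transformation matrix (vocab_size x num_anchors), trained to be sparse.
        # Stored densely here only for brevity.
        self.transform = nn.Parameter(0.01 * torch.rand(vocab_size, num_anchors))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Each object's embedding is its (non-negative) row of T times the anchors.
        weights = torch.relu(self.transform[token_ids])   # (..., num_anchors)
        return weights @ self.anchors                      # (..., embed_dim)

    def sparsity_penalty(self) -> torch.Tensor:
        # A plain L1 term stands in for the sparsity-encouraging prior in the paper.
        return self.transform.abs().sum()


# Usage: embed a batch of token ids and add the penalty to the task loss.
emb = AnchorTransformEmbedding(vocab_size=10_000, num_anchors=100, embed_dim=64)
tokens = torch.randint(0, 10_000, (4, 16))
vectors = emb(tokens)                                      # shape (4, 16, 64)
loss = vectors.pow(2).mean() + 1e-4 * emb.sparsity_penalty()
loss.backward()
```

In the actual method, T would be stored and updated sparsely so that memory scales with the number of nonzero entries rather than with vocab_size x num_anchors; the dense tensor above is used only to keep the sketch short.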
Related papers
- Finetuning CLIP to Reason about Pairwise Differences [52.028073305958074]
We propose an approach to train vision-language models such as CLIP in a contrastive manner to reason about differences in embedding space.
We first demonstrate that our approach yields significantly improved capabilities in ranking images by a certain attribute.
We also illustrate that the resulting embeddings satisfy geometric properties in embedding space to a greater degree.
arXiv Detail & Related papers (2024-09-15T13:02:14Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and a +2% increase in inference time, decent performance gains are achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z)
- Efficient Transformers with Dynamic Token Pooling [11.28381882347617]
We equip language models with a dynamic-pooling mechanism, which predicts segment boundaries in an autoregressive fashion.
Results demonstrate that dynamic pooling, which jointly segments and models language, is both faster and more accurate than vanilla Transformers.
arXiv Detail & Related papers (2022-11-17T18:39:23Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- A Sparsity-promoting Dictionary Model for Variational Autoencoders [16.61511959679188]
Structuring the latent space in deep generative models is important to yield more expressive models and interpretable representations.
We propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model.
arXiv Detail & Related papers (2022-03-29T17:13:11Z)
- NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs [15.289356276538662]
We propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary.
In NodePiece, a vocabulary of subword/sub-entity units is constructed from anchor nodes in a graph with known relation types.
Experiments show that NodePiece performs competitively in node classification, link prediction, and relation prediction tasks.
arXiv Detail & Related papers (2021-06-23T03:51:03Z)
- All Word Embeddings from One Embedding [23.643059189673473]
In neural network-based models for natural language processing, the largest part of the parameters often consists of word embeddings.
In this study, to reduce the total number of parameters, the embeddings for all words are represented by transforming a shared embedding.
The proposed method, ALONE, constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable.
arXiv Detail & Related papers (2020-04-25T07:38:08Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z) - An Advance on Variable Elimination with Applications to Tensor-Based
Computation [11.358487655918676]
We present new results on the classical algorithm of variable elimination, which underlies many algorithms, including those for probabilistic inference.
The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth.
arXiv Detail & Related papers (2020-02-21T14:17:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.