Estimation of embedding vectors in high dimensions
- URL: http://arxiv.org/abs/2312.07802v1
- Date: Tue, 12 Dec 2023 23:41:59 GMT
- Title: Estimation of embedding vectors in high dimensions
- Authors: Golara Ahmadi Azar, Melika Emami, Alyson Fletcher, Sundeep Rangan
- Abstract summary: We consider a simple probability model for discrete data where there is some "true" but unknown embedding.
Under this model, it is shown that the embeddings can be learned by a variant of the low-rank approximate message passing (AMP) method.
Our theoretical findings are validated by simulations on both synthetic data and real text data.
- Score: 10.55292041492388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embeddings are a basic initial feature extraction step in many machine
learning models, particularly in natural language processing. An embedding
attempts to map data tokens to a low-dimensional space where similar tokens are
mapped to vectors that are close to one another by some metric in the embedding
space. A basic question is how well such an embedding can be learned. To study
this problem, we consider a simple probability model for discrete data where
there is some "true" but unknown embedding where the correlation of random
variables is related to the similarity of the embeddings. Under this model, it
is shown that the embeddings can be learned by a variant of the low-rank
approximate message passing (AMP) method. The AMP approach enables precise
predictions of the accuracy of the estimation in certain high-dimensional
limits. In particular, the methodology provides insight into how key
parameters, such as the number of samples per value, the frequency of the
terms, and the strength of the embedding correlation, affect the probability distribution.
Our theoretical findings are validated by simulations on both synthetic data
and real text data.
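As a rough illustration of the setup described in the abstract, the sketch below simulates a toy version of such a model: tokens whose pairwise co-occurrence probability grows with the inner product of unknown "true" embedding vectors, followed by a plain spectral estimate of those embeddings from sampled counts. This is only a hypothetical sketch under simplified assumptions (softmax-style pair probabilities, a rank-`dim` eigendecomposition standing in for the paper's low-rank AMP estimator); names such as `U_true` and `n_samples` are illustrative and not taken from the paper.

```python
# Toy sketch (not the paper's AMP algorithm): tokens co-occur with probability
# increasing in the inner product of their "true" embeddings; a simple spectral
# estimator recovers the embeddings (up to rotation) from sampled counts.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, dim, n_samples = 50, 3, 200_000

# "True" but unknown embeddings (assumed Gaussian for illustration).
U_true = rng.normal(size=(n_tokens, dim)) / np.sqrt(dim)

# Pair probabilities: larger inner product -> more likely to co-occur.
logits = U_true @ U_true.T
P = np.exp(logits)
P /= P.sum()

# Sample token pairs from the model and tabulate co-occurrence counts.
flat_idx = rng.choice(n_tokens * n_tokens, size=n_samples, p=P.ravel())
counts = np.bincount(flat_idx, minlength=n_tokens * n_tokens)
counts = counts.reshape(n_tokens, n_tokens)

# Empirical log-probabilities approximate the logits up to a constant shift.
M = np.log(np.maximum(counts, 1) / n_samples)
M -= M.mean()  # remove (most of) the global shift

# Rank-`dim` spectral estimate of the embeddings (up to rotation/sign).
w, V = np.linalg.eigh((M + M.T) / 2)
top = np.argsort(np.abs(w))[-dim:]
U_hat = V[:, top] * np.sqrt(np.abs(w[top]))

# Rotation-invariant check: compare estimated and true Gram matrices.
err = np.linalg.norm(U_hat @ U_hat.T - U_true @ U_true.T)
err /= np.linalg.norm(U_true @ U_true.T)
print(f"relative Gram-matrix error: {err:.3f}")
```

In this toy setting the embeddings are identifiable only up to an orthogonal transformation, which is why the check compares Gram matrices rather than the vectors themselves; the paper's AMP analysis addresses the harder question of exactly how accurate such estimates can be in high-dimensional limits.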
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - An unfolding method based on conditional Invertible Neural Networks
(cINN) using iterative training [0.0]
Generative networks like invertible neural networks (INN) enable a probabilistic unfolding.
We introduce the iterative conditional INN (IcINN) for unfolding that adjusts for deviations between simulated training samples and data.
arXiv Detail & Related papers (2022-12-16T19:00:05Z) - VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data.
We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data.
We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z) - Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings.
Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z) - On a Variational Approximation based Empirical Likelihood ABC Method [1.5293427903448025]
We propose an easy-to-use empirical likelihood ABC method in this article.
We show that the target log-posterior can be approximated as a sum of an expected joint log-likelihood and the differential entropy of the data generating density.
arXiv Detail & Related papers (2020-11-12T21:24:26Z) - An Embedded Model Estimator for Non-Stationary Random Functions using
Multiple Secondary Variables [0.0]
This paper introduces the method and shows that it has consistency results that are similar in nature to those applying to geostatistical modelling and to Quantile Random Forests.
The algorithm works by estimating a conditional distribution for the target variable at each target location.
arXiv Detail & Related papers (2020-11-09T00:14:24Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - Normal-bundle Bootstrap [2.741266294612776]
We present a method that generates new data which preserve the geometric structure of a given data set.
Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure.
We apply our method to the inference of density ridge and related statistics, and data augmentation to reduce overfitting.
arXiv Detail & Related papers (2020-07-27T21:14:19Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)