Unsupervised Sentence-embeddings by Manifold Approximation and
Projection
- URL: http://arxiv.org/abs/2102.03795v1
- Date: Sun, 7 Feb 2021 13:27:58 GMT
- Title: Unsupervised Sentence-embeddings by Manifold Approximation and
Projection
- Authors: Subhradeep Kayal
- Abstract summary: We propose a novel technique to generate sentence-embeddings in an unsupervised fashion by projecting the sentences onto a fixed-dimensional manifold.
We test our approach, which we term EMAP or Embeddings by Manifold Approximation and Projection, on six publicly available text-classification datasets of varying size and complexity.
- Score: 3.04585143845864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The concept of unsupervised universal sentence encoders has gained traction
recently, wherein pre-trained models generate effective task-agnostic
fixed-dimensional representations for phrases, sentences and paragraphs. Such
methods are of varying complexity, from simple weighted-averages of word
vectors to complex language-models based on bidirectional transformers. In this
work we propose a novel technique to generate sentence-embeddings in an
unsupervised fashion by projecting the sentences onto a fixed-dimensional
manifold with the objective of preserving local neighbourhoods in the original
space. To delineate such neighbourhoods we experiment with several set-distance
metrics, including the recently proposed Word Mover's distance, while the
fixed-dimensional projection is achieved by employing a scalable and efficient
manifold approximation method rooted in topological data analysis. We test our
approach, which we term EMAP or Embeddings by Manifold Approximation and
Projection, on six publicly available text-classification datasets of varying
size and complexity. Empirical results show that our method consistently
performs similarly to or better than several alternative state-of-the-art
approaches.
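The first stage described in the abstract, delineating local neighbourhoods with a set-distance between sentences, can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the word vectors are made-up 2-d toy values, and the Hausdorff distance is used only as a simple stand-in set-distance (the abstract names Word Mover's distance, which needs an optimal-transport solver and is left out here). All function names are hypothetical.

```python
import math

# Toy 2-d word vectors (hypothetical values, for illustration only).
WORD_VECS = {
    "cat": (1.0, 0.0),
    "dog": (0.9, 0.1),
    "pet": (0.8, 0.2),
    "stock": (0.0, 1.0),
    "market": (0.1, 0.9),
    "price": (0.2, 0.8),
}


def euclid(u, v):
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hausdorff(sent_a, sent_b):
    """Symmetric Hausdorff distance between two sentences,
    each treated as a set of word vectors (stand-in set-distance)."""
    A = [WORD_VECS[w] for w in sent_a]
    B = [WORD_VECS[w] for w in sent_b]
    d_ab = max(min(euclid(a, b) for b in B) for a in A)
    d_ba = max(min(euclid(a, b) for a in A) for b in B)
    return max(d_ab, d_ba)


def pairwise(sentences, dist=hausdorff):
    """Symmetric matrix of set-distances between all sentence pairs."""
    n = len(sentences)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(sentences[i], sentences[j])
    return D


def knn(D, k):
    """Indices of the k nearest neighbours of each sentence (self excluded)."""
    n = len(D)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
        for i in range(n)
    ]


sentences = [
    ["cat", "pet"],
    ["dog", "pet"],
    ["stock", "market"],
    ["market", "price"],
]
D = pairwise(sentences)
neighbours = knn(D, k=1)
print(neighbours)
```

For the second stage, a distance matrix like `D` could be handed to a manifold learner that accepts precomputed distances to obtain fixed-dimensional embeddings; for example, the umap-learn package accepts `metric="precomputed"`. The abstract does not name a specific tool, so that choice is an assumption.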
Related papers
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Explaining text classifiers through progressive neighborhood approximation with realistic samples [19.26084350822197]
The importance of neighborhood construction in local explanation methods has been highlighted in the literature.
Several attempts have been made to improve neighborhood quality for high-dimensional data, for example, texts, by adopting generative models.
We propose a progressive approximation approach that refines the neighborhood of a to-be-explained decision with a careful two-stage approach.
arXiv Detail & Related papers (2023-02-11T11:42:39Z)
- Manifold learning-based polynomial chaos expansions for high-dimensional surrogate models [0.0]
We introduce a manifold learning-based method for uncertainty quantification (UQ) in describing systems.
The proposed method is able to achieve highly accurate approximations which ultimately lead to the significant acceleration of UQ tasks.
arXiv Detail & Related papers (2021-07-21T00:24:15Z)
- Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation [92.81218653234669]
We present a new approach to manifold hypothesis checking and underlying manifold dimension estimation.
Our geometrical method is a modification for sparse data of a well-known box-counting algorithm for Minkowski dimension calculation.
Experiments on real datasets show that the suggested approach, based on a combination of two methods, is powerful and effective.
arXiv Detail & Related papers (2021-07-08T15:35:54Z)
- Improving Metric Dimensionality Reduction with Distributed Topology [68.8204255655161]
DIPOLE is a dimensionality-reduction post-processing step that corrects an initial embedding by minimizing a loss functional with both a local, metric term and a global, topological term.
We observe that DIPOLE outperforms popular methods like UMAP, t-SNE, and Isomap on a number of popular datasets.
arXiv Detail & Related papers (2021-06-14T17:19:44Z)
- Rectangular Flows for Manifold Learning [38.63646804834534]
Normalizing flows are invertible neural networks with tractable change-of-volume terms.
Data of interest is typically assumed to live in some (often unknown) low-dimensional manifold embedded in high-dimensional ambient space.
We propose two methods to tractably calculate the gradient of this term with respect to the parameters of the model.
arXiv Detail & Related papers (2021-06-02T18:30:39Z)
- Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification [22.931314501371805]
We propose a new approach to finding and regularizing the remainder of the space, referred to as out-of-manifold.
We synthesize the out-of-manifold embeddings based on two embeddings obtained from actually-observed words.
A discriminator is trained to detect whether an input embedding is located inside the manifold or not, and simultaneously, a generator is optimized to produce new embeddings that can be easily identified as out-of-manifold.
arXiv Detail & Related papers (2021-05-14T10:17:59Z)
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method for a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- Learning Flat Latent Manifolds with VAEs [16.725880610265378]
We propose an extension to the framework of variational auto-encoders, where the Euclidean metric is a proxy for the similarity between data points.
We replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one.
We evaluate our method on a range of data-sets, including a video-tracking benchmark.
arXiv Detail & Related papers (2020-02-12T09:54:52Z)