Complex-valued embeddings of generic proximity data
- URL: http://arxiv.org/abs/2008.13454v1
- Date: Mon, 31 Aug 2020 09:40:30 GMT
- Title: Complex-valued embeddings of generic proximity data
- Authors: Maximilian Münch and Michiel Straat and Michael Biehl and
Frank-Michael Schleif
- Abstract summary: Proximities are at the heart of almost all machine learning methods.
We propose a complex-valued vector embedding of proximity data.
The complex-valued data can serve as an input to complex-valued machine learning algorithms.
- Score: 0.6117371161379209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proximities are at the heart of almost all machine learning methods. If the
input data are given as numerical vectors of equal length, the Euclidean distance
or a Hilbertian inner product is frequently used in modeling algorithms. In a
more generic view, objects are compared by a (symmetric) similarity or
dissimilarity measure, which may not obey particular mathematical properties.
This renders many machine learning methods invalid, leading to convergence
problems and the loss of guarantees, like generalization bounds. In many cases,
the preferred dissimilarity measure is not metric, like the earth mover's
distance, or the similarity measure may not be a simple inner product in a
Hilbert space but in its generalization, a Krein space. If the input data are
non-vectorial, like text sequences, proximity-based learning is used or n-gram
embedding techniques can be applied. Standard embeddings lead to the desired
fixed-length vector encoding, but are costly and have substantial limitations
in preserving the original data's full information. As an
information-preserving alternative, we propose a complex-valued vector embedding of
proximity data. This allows suitable machine learning algorithms to use these
fixed-length, complex-valued vectors for further processing. The complex-valued
data can serve as an input to complex-valued machine learning algorithms. In
particular, we address supervised learning and use extensions of
prototype-based learning. The proposed approach is evaluated on a variety of
standard benchmarks and shows strong performance compared to traditional
techniques in processing non-metric or non-psd proximity data.
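The idea of a complex-valued embedding for non-psd proximity data can be illustrated with a minimal sketch. It is not taken from the paper and may differ from the authors' construction: it assumes a symmetric similarity matrix, eigendecomposes it, and scales each eigenvector by the complex square root of its eigenvalue, so that negative eigenvalues yield imaginary coordinates. The function name `complex_embedding` and the toy matrix `S_demo` are hypothetical.

```python
import numpy as np

def complex_embedding(S):
    """Illustrative only: map a symmetric (possibly indefinite) similarity
    matrix S to complex vectors X such that X @ X.T reproduces S."""
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2.0                       # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(S)      # real eigenvalues, orthonormal eigenvectors
    roots = np.sqrt(eigvals.astype(complex))  # negative eigenvalues give imaginary roots
    return eigvecs * roots                    # scale column k by sqrt(eigvals[k])

# Toy similarity matrix (an assumption for illustration); it is symmetric
# but not positive semidefinite because its determinant is negative.
S_demo = np.array([[1.0, 0.9, 0.1],
                   [0.9, 1.0, 0.8],
                   [0.1, 0.8, 1.0]])
X_demo = complex_embedding(S_demo)

# Each row of X_demo is a fixed-length complex vector for one object.
# The plain (non-conjugated) product recovers the original similarities.
print(np.allclose(X_demo @ X_demo.T, S_demo))
```

Note that the reconstruction uses the bilinear product `X @ X.T` rather than the conjugated product `X @ X.conj().T`; the latter would flip the sign of the negative eigenvalues and no longer reproduce the indefinite matrix.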
Related papers
- Revisiting Evaluation Metrics for Semantic Segmentation: Optimization
and Evaluation of Fine-grained Intersection over Union [113.20223082664681]
We propose the use of fine-grained mIoUs along with corresponding worst-case metrics.
These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing.
Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects.
arXiv Detail & Related papers (2023-10-30T03:45:15Z) - CORE: Common Random Reconstruction for Distributed Optimization with
Provable Low Communication Complexity [110.50364486645852]
Communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers.
We propose Common Random Reconstruction (CORE), which can be used to compress information transmitted between machines.
arXiv Detail & Related papers (2023-09-23T08:45:27Z) - Contrastive Learning as Kernel Approximation [0.0]
This thesis provides an overview of the current theoretical understanding of contrastive learning.
We highlight popular contrastive loss functions whose minimizers implicitly approximate a positive semidefinite (PSD) kernel.
arXiv Detail & Related papers (2023-09-06T01:25:30Z) - Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Combining Varied Learners for Binary Classification using Stacked
Generalization [3.1871776847712523]
This paper performs binary classification using Stacked Generalization on a high-dimensional Polycystic Ovary Syndrome dataset.
The paper reports various metrics and also points out a subtle transgression found with the Receiver Operating Characteristic curve, which was proved to be incorrect.
arXiv Detail & Related papers (2022-02-17T21:47:52Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Unsupervised Ground Metric Learning using Wasserstein Eigenvectors [0.0]
A key bottleneck is the design of a "ground" cost, which should be adapted to the task under study.
In this paper, we propose for the first time a canonical answer by computing the ground cost as a positive eigenvector of the function mapping a cost to the pairwise OT distances between the inputs.
We also introduce a scalable computational method using entropic regularization, which operates a principal component analysis dimensionality reduction.
arXiv Detail & Related papers (2021-02-11T21:32:59Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies [60.285091454321055]
We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
arXiv Detail & Related papers (2020-03-18T13:07:51Z) - An Advance on Variable Elimination with Applications to Tensor-Based
Computation [11.358487655918676]
We present new results on the classical algorithm of variable elimination, which underlies many algorithms, including those for probabilistic inference.
The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth.
arXiv Detail & Related papers (2020-02-21T14:17:44Z) - Multiple Metric Learning for Structured Data [0.0]
We address the problem of merging graph and feature-space information while learning a metric from structured data.
We propose a new graph-based technique for optimizing under metric constraints.
arXiv Detail & Related papers (2020-02-13T19:11:32Z)