Fourier Sliced-Wasserstein Embedding for Multisets and Measures
- URL: http://arxiv.org/abs/2405.16519v2
- Date: Sun, 29 Sep 2024 22:36:41 GMT
- Title: Fourier Sliced-Wasserstein Embedding for Multisets and Measures
- Authors: Tal Amir, Nadav Dym
- Abstract summary: We present a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space.
We show that our method yields superior representations of input multisets and offers practical advantage for learning on multiset data.
- Score: 3.396731589928944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the $\textit{Fourier Sliced-Wasserstein (FSW) embedding}$, a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space. Our proposed embedding approximately preserves the sliced Wasserstein distance on distributions, thereby yielding geometrically meaningful representations that better capture the structure of the input. Moreover, it is injective on measures and $\textit{bi-Lipschitz}$ on multisets, a significant advantage over prevalent embedding methods based on sum- or max-pooling, which are provably not bi-Lipschitz, and in many cases, not even injective. The required output dimension for these guarantees is near optimal: roughly $2nd$, where $n$ is the maximal number of support points in the input. Conversely, we prove that it is $\textit{impossible}$ to embed distributions over $\mathbb{R}^d$ into Euclidean space in a bi-Lipschitz manner. Thus, the metric properties of our embedding are, in a sense, the best achievable. Through numerical experiments, we demonstrate that our method yields superior representations of input multisets and offers practical advantage for learning on multiset data. Specifically, we show that (a) the FSW embedding induces significantly lower distortion on the space of multisets, compared to the leading method for computing sliced-Wasserstein-preserving embeddings; and (b) a simple combination of the FSW embedding and an MLP achieves state-of-the-art performance in learning the (non-sliced) Wasserstein distance.
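For context, the sliced Wasserstein distance that the FSW embedding aims to preserve can be estimated by projecting both multisets onto random directions and comparing sorted 1D projections. The sketch below is a plain Monte-Carlo estimate of the sliced 2-Wasserstein distance between two equal-size multisets; it illustrates the target metric only, not the authors' embedding, and the number of slices and uniform weights are illustrative assumptions.

```python
# Minimal sketch: Monte-Carlo estimate of the sliced 2-Wasserstein distance
# between two equal-size multisets X, Y in R^d. This is the metric the FSW
# embedding approximately preserves; it is NOT the FSW embedding itself.
import numpy as np

def sliced_wasserstein2(X, Y, num_slices=200, rng=None):
    """X, Y: (n, d) arrays viewed as uniform multisets of n points in R^d."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    assert Y.shape == (n, d), "this sketch assumes equal-size multisets"
    total = 0.0
    for _ in range(num_slices):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)      # random direction on the unit sphere
        x_proj = np.sort(X @ theta)         # sorted 1D projections
        y_proj = np.sort(Y @ theta)
        # For equal-size uniform multisets, the 1D W2^2 is the mean squared
        # difference of the sorted projections (optimal monotone coupling).
        total += np.mean((x_proj - y_proj) ** 2)
    return np.sqrt(total / num_slices)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    Y = rng.normal(loc=1.0, size=(50, 3))
    print(sliced_wasserstein2(X, Y))
```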
Related papers
- Fourier Sliced-Wasserstein Embedding for Multisets and Measures [3.396731589928944]
We present a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space.
We prove that it is impossible to embed distributions over $\mathbb{R}^d$ into Euclidean space in a bi-Lipschitz manner.
arXiv Detail & Related papers (2025-04-03T12:51:40Z)
- Riemannian Optimization on Relaxed Indicator Matrix Manifold [83.13494760649874]
The indicator matrix plays an important role in machine learning, but optimizing it is an NP-hard problem.
We propose a new relaxation of the indicator matrix and prove that this relaxation forms a manifold, which we call the Relaxed Indicator Matrix Manifold (RIM manifold).
We provide several retraction methods, including a fast retraction method for obtaining geodesics.
arXiv Detail & Related papers (2025-03-26T12:45:52Z)
- Joint Metric Space Embedding by Unbalanced OT with Gromov-Wasserstein Marginal Penalization [3.7498611358320733]
We propose a new approach for unsupervised alignment of heterogeneous datasets.
Our method is based on an unbalanced optimal transport problem with Gromov-Wasserstein marginal penalization.
arXiv Detail & Related papers (2025-02-11T12:28:47Z)
- Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams [0.0]
We introduce new algorithms for Principal Component Analysis (PCA) with outliers.
We navigate to the optimal subspace for PCA even in the presence of outliers.
This approach achieves an optimal solution with a time complexity of $n^{d+\mathcal{O}(1)}\,\text{poly}(n,d)$.
arXiv Detail & Related papers (2024-08-13T13:05:36Z)
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies [34.7582575446942]
Multi-dimensional Scaling (MDS) is a family of methods for embedding an $n$-point metric space into low-dimensional Euclidean space.
We give the first approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$.
arXiv Detail & Related papers (2023-11-29T17:42:05Z)
- Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
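A rough, hedged reading of these two layer types (the embedding width, the power, and the sum-pooling are illustrative assumptions, not taken from the paper): each set element is passed through a linear map, then a power or exponential activation, and the results are pooled into a fixed-size vector.

```python
# Illustrative sketch of sum-pooled set embeddings with (a) linear + power
# and (b) linear + exponential element-wise layers. Hyperparameters (output
# width, power) are placeholder choices for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 16                      # input feature dim, embedding width (assumed)
W = rng.normal(size=(m, d))
b = rng.normal(size=m)

def embed_lp(X, p=2):
    """(a) Linear + power activation, summed over the set's elements."""
    return ((X @ W.T + b) ** p).sum(axis=0)

def embed_le(X):
    """(b) Linear + exponential activation, summed over the set's elements."""
    return np.exp(X @ W.T + b).sum(axis=0)

X = rng.normal(size=(10, d))                   # a multiset of 10 points in R^d
print(embed_lp(X).shape, embed_le(X).shape)    # both (16,)
```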
arXiv Detail & Related papers (2023-07-08T16:00:59Z)
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
- Small Transformers Compute Universal Metric Embeddings [25.004650816730543]
We derive embedding guarantees for feature maps implemented by small neural networks.
We prove that a transformer of depth about $n\log(n)$ and width about $n^2$ can embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion.
arXiv Detail & Related papers (2022-09-14T17:12:41Z)
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound of $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z)
- Nyström Kernel Mean Embeddings [92.10208929236826]
We propose an efficient approximation procedure based on the Nyström method.
It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate.
We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules.
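A generic sketch of the underlying idea, assuming a Gaussian kernel and a random landmark subsample (the paper's exact estimator and constants may differ): Nyström features approximate the kernel mean embeddings of two samples, and the distance between the approximate embeddings approximates the maximum mean discrepancy.

```python
# Sketch: Nystrom approximation of kernel mean embeddings and of the MMD
# between two samples, using a Gaussian kernel and m random landmarks.
# All constants here are illustrative choices, not taken from the paper.
import numpy as np

def gaussian_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_features(X, landmarks, gamma=0.5, eps=1e-10):
    """Finite-dimensional features whose inner products approximate the kernel."""
    K_mm = gaussian_kernel(landmarks, landmarks, gamma)
    evals, evecs = np.linalg.eigh(K_mm)
    whiten = evecs / np.sqrt(np.clip(evals, eps, None))   # acts as K_mm^{-1/2}
    return gaussian_kernel(X, landmarks, gamma) @ whiten

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Y = rng.normal(loc=0.5, size=(500, 2))
m = 50                                                   # landmark subsample size
landmarks = np.concatenate([X, Y])[rng.choice(1000, m, replace=False)]

mu_X = nystrom_features(X, landmarks).mean(axis=0)       # approx. mean embedding of X
mu_Y = nystrom_features(Y, landmarks).mean(axis=0)       # approx. mean embedding of Y
print("approx. MMD:", np.linalg.norm(mu_X - mu_Y))
```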
arXiv Detail & Related papers (2022-01-31T08:26:06Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- Robust Multi-view Registration of Point Sets with Laplacian Mixture Model [25.865100974015412]
We propose a novel probabilistic generative method to align multiple point sets based on the heavy-tailed Laplacian distribution.
We demonstrate the advantages of our method by comparing it with representative state-of-the-art approaches on challenging benchmark data sets.
arXiv Detail & Related papers (2021-10-26T14:49:09Z)
- Depth-based pseudo-metrics between probability distributions [1.1470070927586016]
We propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions.
In contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality.
The regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails.
arXiv Detail & Related papers (2021-03-23T17:33:18Z)
- Simple and Near-Optimal MAP Inference for Nonsymmetric DPPs [3.3504365823045044]
We study the problem of maximum a posteriori (MAP) inference for determinantal point processes defined by a nonsymmetric kernel matrix.
We obtain the first multiplicative approximation guarantee for this problem using local search.
Our approximation factor of $k^{O(k)}$ is nearly tight, and we show theoretically and experimentally that it compares favorably to the state-of-the-art methods for this problem.
arXiv Detail & Related papers (2021-02-10T09:34:44Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomials to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Universal consistency of Wasserstein $k$-NN classifier: Negative and Positive Results [0.0]
Wasserstein distance provides a notion of dissimilarities between probability measures.
We show that the $k$-NN classifier is not universally consistent on the space of measures supported in $(0,1)$.
arXiv Detail & Related papers (2020-09-10T03:05:05Z)
- On Projection Robust Optimal Transport: Sample Complexity and Model Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
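The averaging idea can be illustrated with a small Monte-Carlo sketch, under assumptions not taken from the paper (equal-size, uniformly weighted point sets, a $W_2$ cost, and a fixed number of random subspaces): project both sets onto random $k$-dimensional subspaces, compute each projected Wasserstein cost exactly by optimal assignment, and average over subspaces.

```python
# Sketch of an integral-projection-robust-style Wasserstein estimate: average
# the W2 cost of k-dimensional random projections of two equal-size point sets.
# Uniform weights and equal sizes let us use an exact assignment solver.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def w2_squared(A, B):
    """Exact squared W2 between equal-size, uniformly weighted point sets."""
    cost = cdist(A, B, metric="sqeuclidean")
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()

def averaged_projection_w2(X, Y, k=2, num_subspaces=100, rng=None):
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    vals = []
    for _ in range(num_subspaces):
        # Random k-dimensional orthonormal basis via QR of a Gaussian matrix.
        Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
        vals.append(w2_squared(X @ Q, Y @ Q))
    return np.sqrt(np.mean(vals))   # one common convention: (E[W2^2])^(1/2)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Y = rng.normal(loc=0.5, size=(60, 5))
print(averaged_projection_w2(X, Y))
```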
arXiv Detail & Related papers (2020-06-22T14:35:33Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture to within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.