Fourier Sliced-Wasserstein Embedding for Multisets and Measures
- URL: http://arxiv.org/abs/2405.16519v2
- Date: Sun, 29 Sep 2024 22:36:41 GMT
- Title: Fourier Sliced-Wasserstein Embedding for Multisets and Measures
- Authors: Tal Amir, Nadav Dym
- Abstract summary: We present a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space.
We show that our method yields superior representations of input multisets and offers practical advantages for learning on multiset data.
- Score: 3.396731589928944
- Abstract: We present the $\textit{Fourier Sliced-Wasserstein (FSW) embedding}$, a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space. Our proposed embedding approximately preserves the sliced Wasserstein distance on distributions, thereby yielding geometrically meaningful representations that better capture the structure of the input. Moreover, it is injective on measures and $\textit{bi-Lipschitz}$ on multisets, a significant advantage over prevalent embedding methods based on sum- or max-pooling, which are provably not bi-Lipschitz and, in many cases, not even injective. The required output dimension for these guarantees is near optimal: roughly $2nd$, where $n$ is the maximal number of support points in the input. Conversely, we prove that it is $\textit{impossible}$ to embed distributions over $\mathbb{R}^d$ into Euclidean space in a bi-Lipschitz manner. Thus, the metric properties of our embedding are, in a sense, the best achievable. Through numerical experiments, we demonstrate that our method yields superior representations of input multisets and offers practical advantages for learning on multiset data. Specifically, we show that (a) the FSW embedding induces significantly lower distortion on the space of multisets, compared to the leading method for computing sliced-Wasserstein-preserving embeddings; and (b) a simple combination of the FSW embedding and an MLP achieves state-of-the-art performance in learning the (non-sliced) Wasserstein distance.
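For intuition about the metric the FSW embedding approximately preserves: along any fixed direction, the sliced Wasserstein distance reduces to a one-dimensional Wasserstein distance, which for two multisets of equal cardinality is simply a comparison of sorted projections. Below is a minimal NumPy sketch of the standard Monte Carlo estimator of this distance, assuming equal-size inputs; it illustrates the target metric only, not the FSW embedding itself, and the function name and `n_projections` parameter are illustrative choices.

```python
# Minimal sketch: Monte Carlo estimate of the sliced p-Wasserstein distance
# between two multisets of equal cardinality. This is the standard estimator,
# NOT the paper's FSW embedding; names and parameters are illustrative.
import numpy as np

def sliced_wasserstein(X, Y, n_projections=128, p=2, seed=None):
    """Estimate SW_p between multisets X, Y of shape (n, d) with equal n.

    Each random unit direction theta projects both multisets onto a line;
    in 1D, the p-Wasserstein distance between equal-size multisets is the
    l_p comparison of their sorted values (order statistics).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Directions drawn uniformly from the unit sphere S^{d-1}.
    thetas = rng.standard_normal((n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    # Project and sort: each column holds one direction's sorted projections.
    X_proj = np.sort(X @ thetas.T, axis=0)
    Y_proj = np.sort(Y @ thetas.T, axis=0)
    # W_p^p along each slice, then average over slices and take the p-th root.
    per_slice = np.mean(np.abs(X_proj - Y_proj) ** p, axis=0)
    return float(np.mean(per_slice) ** (1.0 / p))

# Toy usage: translating a multiset by t shifts every 1D projection by <t, theta>,
# so the SW_2 estimate converges to ||t|| / sqrt(d), here 0.1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
print(sliced_wasserstein(X, X + 0.1, seed=1))  # approximately 0.1
```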
Related papers
- Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams [0.0]
We introduce new algorithms for Principal Component Analysis (PCA) with outliers.
We navigate to the optimal subspace for PCA even in the presence of outliers.
This approach achieves an optimal solution with a time complexity of $n^{d+\mathcal{O}(1)}\text{poly}(n,d)$.
arXiv Detail & Related papers (2024-08-13T13:05:36Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies [34.7582575446942]
Multi-dimensional Scaling (MDS) is a family of methods for embedding an $n$-point metric into low-dimensional Euclidean space.
We give the first approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$, the aspect ratio of the input metric.
arXiv Detail & Related papers (2023-11-29T17:42:05Z) - Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE).
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Small Transformers Compute Universal Metric Embeddings [25.004650816730543]
We derive embedding guarantees for feature maps implemented by small neural networks.
We prove that a transformer of depth about $n\log(n)$ and width about $n^2$ can embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion.
arXiv Detail & Related papers (2022-09-14T17:12:41Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this constant near-optimally, hence helping approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Universal consistency of Wasserstein $k$-NN classifier: Negative and Positive Results [0.0]
The Wasserstein distance provides a notion of dissimilarity between probability measures.
We show that the $k$-NN classifier is not universally consistent on the space of measures supported in $(0,1)$.
arXiv Detail & Related papers (2020-09-10T03:05:05Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.