Displacement-Sparse Neural Optimal Transport
- URL: http://arxiv.org/abs/2502.01889v1
- Date: Mon, 03 Feb 2025 23:44:17 GMT
- Title: Displacement-Sparse Neural Optimal Transport
- Authors: Peter Chen, Yue Xie, Qingpeng Zhang
- Abstract summary: Optimal Transport (OT) theory seeks to determine the map $T: X \to Y$ that transports a source measure $P$ to a target measure $Q$.
We introduce a sparsity penalty into the minimax Wasserstein formulation, promote sparsity in displacement vectors $\Delta(\mathbf{x}) := T(\mathbf{x}) - \mathbf{x}$, and enhance the interpretability of the resulting map.
- Score: 6.968698312185846
- Abstract: Optimal Transport (OT) theory seeks to determine the map $T:X \to Y$ that transports a source measure $P$ to a target measure $Q$, minimizing the cost $c(\mathbf{x}, T(\mathbf{x}))$ between $\mathbf{x}$ and its image $T(\mathbf{x})$. Building upon the Input Convex Neural Network OT solver and incorporating the concept of displacement-sparse maps, we introduce a sparsity penalty into the minimax Wasserstein formulation, promote sparsity in displacement vectors $\Delta(\mathbf{x}) := T(\mathbf{x}) - \mathbf{x}$, and enhance the interpretability of the resulting map. However, increasing sparsity often reduces feasibility, causing $T_{\#}(P)$ to deviate more significantly from the target measure. In low-dimensional settings, we propose a heuristic framework to balance the trade-off between sparsity and feasibility by dynamically adjusting the sparsity intensity parameter during training. For high-dimensional settings, we directly constrain the dimensionality of displacement vectors by enforcing $\dim(\Delta(\mathbf{x})) \leq l$, where $l < d$ for $X \subseteq \mathbb{R}^d$. Among maps satisfying this constraint, we aim to identify the most feasible one. This goal can be effectively achieved by adapting our low-dimensional heuristic framework without resorting to dimensionality reduction. We validate our method on both synthesized sc-RNA and real 4i cell perturbation datasets, demonstrating improvements over existing methods.
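The abstract combines three ingredients: a sparsity penalty on the displacement $\Delta(\mathbf{x}) = T(\mathbf{x}) - \mathbf{x}$ added to the neural OT objective, a heuristic that re-tunes the sparsity intensity during training to trade sparsity against feasibility, and, in high dimensions, a hard cap $\dim(\Delta(\mathbf{x})) \leq l$. The PyTorch sketch below only illustrates the general shape of these ideas; the fitting loss, the feasibility proxy, the schedule constants, and the top-$l$ projection are illustrative placeholders, not the authors' implementation.
```python
# Minimal PyTorch sketch (assumptions, not the paper's code): an l1 penalty on the
# displacement Delta(x) = T(x) - x added to a generic transport-map fitting loss,
# a heuristic schedule for the sparsity intensity, and a crude top-l cap on the
# number of active displacement coordinates.
import torch


def displacement_l1(x: torch.Tensor, t_x: torch.Tensor) -> torch.Tensor:
    """Average l1 norm of the displacement vectors Delta(x) = T(x) - x."""
    return (t_x - x).abs().sum(dim=-1).mean()


def sparse_ot_step(transport_map: torch.nn.Module,
                   fit_loss,            # callable (x, t_x, y) -> scalar fitting loss
                   optimizer: torch.optim.Optimizer,
                   x: torch.Tensor, y: torch.Tensor, lam: float) -> float:
    """One training step: fitting loss plus lam times the sparsity penalty."""
    optimizer.zero_grad()
    t_x = transport_map(x)
    loss = fit_loss(x, t_x, y) + lam * displacement_l1(x, t_x)
    loss.backward()
    optimizer.step()
    return float(loss.detach())


def adjust_intensity(lam: float, feasibility_gap: float,
                     tol: float = 0.1, up: float = 1.5, down: float = 0.5) -> float:
    """Heuristic schedule: shrink lam when the pushforward T_#(P) drifts too far
    from Q (large feasibility gap), grow it when the fit is comfortably feasible."""
    return lam * down if feasibility_gap > tol else lam * up


def cap_displacement_dim(x: torch.Tensor, t_x: torch.Tensor, l: int) -> torch.Tensor:
    """One simple way to enforce dim(Delta(x)) <= l: keep the l largest-magnitude
    displacement coordinates per sample and zero out the rest."""
    delta = t_x - x
    topl = delta.abs().topk(l, dim=-1).indices
    mask = torch.zeros_like(delta).scatter_(-1, topl, 1.0)
    return x + delta * mask
```
In this sketch, `feasibility_gap` would be any divergence estimate between $T_{\#}(P)$ and $Q$ computed on held-out batches; the paper's actual criterion and schedule may differ.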
Related papers
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) algorithms for solving structured nonconvex minimax optimization problems.
Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting; a generic two-timescale update sketch is given after this list.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - The Monge Gap: A Regularizer to Learn All Transport Maps [34.81915836064636]
Brenier's theorem states that when the ground cost is the squared-Euclidean distance, the "best" map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function.
Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges.
We propose a radically different approach to estimating OT maps.
arXiv Detail & Related papers (2023-02-09T21:56:11Z) - Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps [37.45959537338404]
We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam's razor to transport.
We showcase the ability of our method to estimate meaningful maps for high-dimensional single-cell transcription data.
arXiv Detail & Related papers (2023-02-08T14:02:34Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Minimax Optimal Regression over Sobolev Spaces via Laplacian
Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
arXiv Detail & Related papers (2021-06-03T01:20:41Z) - Nonparametric Learning of Two-Layer ReLU Residual Units [22.870658194212744]
We describe an algorithm that learns two-layer residual units with rectified linear unit (ReLU) activation.
We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities.
We prove the statistical strong consistency of our algorithm, and demonstrate the robustness and sample efficiency of our algorithm by experiments.
arXiv Detail & Related papers (2020-08-17T22:11:26Z) - A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been previously conjectured.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
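For the two-timescale gradient descent ascent entry above, the core idea is simply to run the descent and ascent updates with different step sizes on a minimax objective $\min_x \max_y f(x, y)$. The sketch below is a generic illustration under that reading, not the referenced paper's algorithm; the step sizes and the test problem are arbitrary choices.
```python
# Generic two-timescale gradient descent ascent (TTGDA) sketch for min_x max_y f(x, y):
# the ascent player uses a larger step size than the descent player.
import torch


def ttgda(f, x: torch.Tensor, y: torch.Tensor,
          eta_x: float = 1e-3, eta_y: float = 1e-1, steps: int = 1000):
    """Run TTGDA on f(x, y); eta_y >> eta_x gives the two timescales."""
    x = x.clone().requires_grad_(True)
    y = y.clone().requires_grad_(True)
    for _ in range(steps):
        loss = f(x, y)
        gx, gy = torch.autograd.grad(loss, (x, y))
        with torch.no_grad():
            x -= eta_x * gx   # slow descent on x
            y += eta_y * gy   # fast ascent on y
    return x.detach(), y.detach()


# Usage on a simple quadratic saddle problem with saddle point at the origin.
f = lambda x, y: 0.5 * (x ** 2).sum() + (x * y).sum() - 0.5 * (y ** 2).sum()
x_star, y_star = ttgda(f, torch.randn(2), torch.randn(2))
```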