Neural Estimation for Scaling Entropic Multimarginal Optimal Transport
- URL: http://arxiv.org/abs/2506.00573v1
- Date: Sat, 31 May 2025 14:10:27 GMT
- Title: Neural Estimation for Scaling Entropic Multimarginal Optimal Transport
- Authors: Dor Tsur, Ziv Goldfeld, Kristjan Greenewald, Haim Permuter
- Abstract summary: We propose a new computational framework for entropic MOT, dubbed Neural Entropic MOT (NEMOT). NEMOT employs neural networks trained using mini-batches, which transfers the computational complexity from the dataset size to the size of the mini-batch, leading to substantial gains. In particular, orders-of-magnitude speedups are observed relative to the state-of-the-art, with a notable increase in the feasible number of samples and marginals.
- Score: 14.389645696715599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimarginal optimal transport (MOT) is a powerful framework for modeling interactions between multiple distributions, yet its applicability is bottlenecked by a high computational overhead. Entropic regularization provides computational speedups via the multimarginal Sinkhorn algorithm, whose time complexity, for a dataset size $n$ and $k$ marginals, generally scales as $O(n^k)$. However, this dependence on the dataset size $n$ is computationally prohibitive for many machine learning problems. In this work, we propose a new computational framework for entropic MOT, dubbed Neural Entropic MOT (NEMOT), that enjoys significantly improved scalability. NEMOT employs neural networks trained using mini-batches, which transfers the computational complexity from the dataset size to the size of the mini-batch, leading to substantial gains. We provide formal guarantees on the accuracy of NEMOT via non-asymptotic error bounds. We supplement these with numerical results that demonstrate the performance gains of NEMOT over Sinkhorn's algorithm, as well as extensions to neural computation of multimarginal entropic Gromov-Wasserstein alignment. In particular, orders-of-magnitude speedups are observed relative to the state-of-the-art, with a notable increase in the feasible number of samples and marginals. NEMOT seamlessly integrates as a module in large-scale machine learning pipelines, and can serve to expand the practical applicability of entropic MOT for tasks involving multimarginal data.
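The $O(n^k)$ scaling mentioned in the abstract can be illustrated with a minimal NumPy sketch of multimarginal Sinkhorn for $k = 3$ marginals. This is not the NEMOT method and not the authors' code; it is only the classical baseline the abstract compares against. The cost (a sum of pairwise squared Euclidean distances), the uniform marginal weights, and the names `pairwise_sq_dists` and `multimarginal_sinkhorn_3` are assumptions made for illustration.

```python
import numpy as np


def pairwise_sq_dists(x, y):
    """Squared Euclidean distances between rows of x and rows of y."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def multimarginal_sinkhorn_3(xs, eps=1.0, n_iters=300):
    """Entropic MOT between three uniform empirical measures (illustrative).

    Assumed cost: c(x1, x2, x3) = |x1-x2|^2 + |x1-x3|^2 + |x2-x3|^2.
    The cost tensor has n1 * n2 * n3 entries, so memory and per-iteration
    time grow like n^k for k marginals of n samples each.
    """
    x1, x2, x3 = xs
    sizes = (len(x1), len(x2), len(x3))
    # Full cost tensor: this is the n^k object that limits scalability.
    C = (pairwise_sq_dists(x1, x2)[:, :, None]
         + pairwise_sq_dists(x1, x3)[:, None, :]
         + pairwise_sq_dists(x2, x3)[None, :, :])
    K = np.exp(-C / eps)  # Gibbs kernel
    a = [np.full(n, 1.0 / n) for n in sizes]  # uniform marginal weights
    u = [np.ones(n) for n in sizes]           # scaling vectors
    for _ in range(n_iters):
        # Each update rescales one axis so that its marginal matches a[i].
        u[0] = a[0] / np.einsum("ijl,j,l->i", K, u[1], u[2])
        u[1] = a[1] / np.einsum("ijl,i,l->j", K, u[0], u[2])
        u[2] = a[2] / np.einsum("ijl,i,j->l", K, u[0], u[1])
    # Entropic plan: kernel rescaled along each of the three axes.
    P = K * u[0][:, None, None] * u[1][None, :, None] * u[2][None, None, :]
    return P, C


rng = np.random.default_rng(0)
xs = [rng.random((6, 2)) for _ in range(3)]  # three samples of 6 points in 2D
P, C = multimarginal_sinkhorn_3(xs)
print("transport cost under the entropic plan:", float(np.sum(P * C)))
```

With $n$ samples per marginal, the tensors `C`, `K`, and `P` each hold $n^3$ entries here, and $n^k$ in general, which is the dependence on dataset size that a mini-batch approach is meant to avoid.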
Related papers
- Fast and close Shannon entropy approximation [0.0]
A non-singular rational approximation of Shannon entropy and its gradient achieves a mean absolute error of $10^{-3}$. FEA allows around $50\%$ faster computation, requiring only $5$ to $6$ elementary computational operations. On a set of common benchmarks for the feature selection problem in machine learning, we show that the combined effect of fewer elementary operations, low approximation error, and a non-singular gradient allows significantly better model quality.
arXiv Detail & Related papers (2025-05-20T11:41:26Z) - TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training [91.8932638236073]
We introduce TensorGRaD, a novel method that directly addresses the memory challenges associated with large-structured weights. We show that sparseGRaD reduces total memory usage by over $50\%$ while maintaining and sometimes even improving accuracy.
arXiv Detail & Related papers (2025-01-04T20:51:51Z) - Progressive Entropic Optimal Transport Solvers [33.821924561619895]
We propose a new class of EOT solvers (ProgOT) that can estimate both plans and transport maps.
We provide experimental evidence demonstrating that ProgOT is a faster and more robust alternative to standard solvers.
We also prove statistical consistency of our approach for estimating optimal transport maps.
arXiv Detail & Related papers (2024-06-07T16:33:08Z) - Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs [93.82811501035569]
We introduce a new data efficient and highly parallelizable operator learning approach with reduced memory requirement and better generalization.
MG-TFNO scales to large resolutions by leveraging local and global structures of full-scale, real-world phenomena.
We demonstrate superior performance on the turbulent Navier-Stokes equations where we achieve less than half the error with over 150x compression.
arXiv Detail & Related papers (2023-09-29T20:18:52Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Tensor Slicing and Optimization for Multicore NPUs [2.670309629218727]
This paper proposes a compiler optimization pass for multicore NPUs, called Tensor Slicing Optimization (TSO).
TSO identifies the best tensor slicing that minimizes execution time for a set of CNN models.
Results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models.
arXiv Detail & Related papers (2023-04-06T12:03:03Z) - Scaling up the self-optimization model by means of on-the-fly
computation of weights [0.8057006406834467]
This work introduces a novel implementation of the Self-Optimization (SO) model that scales as $\mathcal{O}\left(N^{2}\right)$ with respect to the number of nodes $N$.
Our on-the-fly computation paves the way for investigating substantially larger system sizes, allowing for more variety and complexity in future studies.
arXiv Detail & Related papers (2022-11-03T10:51:25Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - On the complexity of the optimal transport problem with graph-structured
cost [9.24979291231758]
Multi-marginal optimal transport (MOT) is a generalization of optimal transport to multiple marginals.
The usage of MOT has been largely impeded by its computational complexity which scales exponentially in the number of marginals.
arXiv Detail & Related papers (2021-10-01T19:29:59Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.