Fast Dual Subgradient Optimization of the Integrated Transportation
Distance Between Stochastic Kernels
- URL: http://arxiv.org/abs/2312.01432v1
- Date: Sun, 3 Dec 2023 15:44:17 GMT
- Title: Fast Dual Subgradient Optimization of the Integrated Transportation
Distance Between Stochastic Kernels
- Authors: Zhengqi Lin and Andrzej Ruszczynski
- Abstract summary: A generalization of the Wasserstein metric, the integrated transportation distance, establishes a novel distance between probability kernels of Markov systems.
This metric serves as the foundation for an efficient approximation technique, enabling the original system's kernel to be replaced by one with discrete support of limited cardinality.
We present a specialized dual algorithm capable of constructing these approximate kernels quickly and efficiently, without requiring computationally expensive matrix operations.
- Score: 1.5229257192293204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A generalization of the Wasserstein metric, the integrated transportation
distance, establishes a novel distance between probability kernels of Markov
systems. This metric serves as the foundation for an efficient approximation
technique, enabling the replacement of the original system's kernel with one
having discrete support of limited cardinality. To facilitate practical
implementation, we present a specialized dual algorithm capable of constructing
these approximate kernels quickly and efficiently, without requiring
computationally expensive matrix operations. Finally, we demonstrate the
efficacy of our method through several illustrative examples, showcasing its
utility in practical scenarios. This advancement offers new possibilities for
the streamlined analysis and manipulation of stochastic systems represented by
kernels.
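The paper's specialized dual algorithm operates on whole stochastic kernels and is given in the paper itself. Purely as a hedged illustration of the dual subgradient idea behind it, here is a minimal sketch of subgradient ascent on the semi-dual of a single discrete optimal transport problem; the function name, step-size rule, and toy data below are our own choices, not the authors':

```python
import numpy as np

def semidual_subgradient_ot(cost, p, q, steps=500, lr=1.0):
    """Subgradient ascent on the semi-dual of discrete optimal transport.

    cost : (m, n) array of transport costs between source and target atoms
    p, q : source / target probability weights

    Maximizes g(phi) = <phi, p> + sum_j q_j * min_i (cost[i, j] - phi[i]);
    at the optimum g equals the transport distance.  Each step needs only
    an argmin per target atom and a weighted count -- no expensive matrix
    operations, which is the flavor of algorithm the abstract describes.
    """
    m, _ = cost.shape
    phi = np.zeros(m)
    for t in range(1, steps + 1):
        winners = np.argmin(cost - phi[:, None], axis=0)  # c-transform argmins
        grad = p - np.bincount(winners, weights=q, minlength=m)
        phi += (lr / np.sqrt(t)) * grad                   # diminishing steps
    psi = np.min(cost - phi[:, None], axis=0)             # dual partner of phi
    return phi @ p + psi @ q                              # dual objective value

# Toy example: distance between two random discrete measures.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(7, 2))
cost = np.linalg.norm(x[:, None] - y[None, :], axis=-1)
print(semidual_subgradient_ot(cost, np.full(5, 0.2), np.full(7, 1 / 7)))
```

The integrated transportation distance aggregates such problems across the conditioning argument of a kernel; the paper's contribution is a dual scheme that handles that whole family efficiently.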
Related papers
- Fast and Scalable Multi-Kernel Encoder Classifier [4.178980693837599]
The proposed method facilitates fast and scalable kernel matrix embedding, and seamlessly integrates multiple kernels to enhance the learning process.
Our theoretical analysis offers a population-level characterization of this approach using random variables.
arXiv Detail & Related papers (2024-06-04T10:34:40Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as a core subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy (a Sinkhorn-style sketch of the regularized problem appears after this list).
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- Linear Self-Attention Approximation via Trainable Feedforward Kernel [77.34726150561087]
In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches.
We aim to expand the idea of trainable kernel methods to approximate the self-attention mechanism of the Transformer architecture (see the linear-attention sketch after this list).
arXiv Detail & Related papers (2022-11-08T08:14:11Z)
- Linear Time Kernel Matrix Approximation via Hyperspherical Harmonics [3.24890820102255]
We propose a new technique for constructing low-rank approximations of matrices that arise in kernel methods for machine learning.
Our approach pairs a novel automatically constructed analytic expansion of the underlying kernel function with a data-dependent compression step to further optimize the approximation.
Experimental results show our approach compares favorably to the commonly used Nyström method with respect to both accuracy for a given rank and computational time for a given accuracy, across a variety of kernels, dimensions, and datasets (a Nyström baseline sketch appears after this list).
arXiv Detail & Related papers (2022-02-08T05:19:39Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We consider adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to optimize a common objective.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Towards a Unified Quadrature Framework for Large-Scale Kernel Machines [35.32894170512829]
We develop a quadrature framework for large-scale kernel machines via a numerical integration representation.
We leverage fully symmetric interpolatory rules to efficiently compute quadrature nodes and associated weights for kernel approximation (a random-features sketch of the same numerical-integration view appears after this list).
arXiv Detail & Related papers (2020-11-03T12:48:07Z)
- Fast Learning in Reproducing Kernel Krein Spaces via Signed Measures [31.986482149142503]
We cast this question as a distribution view by introducing the signed measure.
A series of non-PD kernels can be associated with the linear combination of specific finite Borel measures.
Specifically, this solution is computationally implementable in practice, allowing non-PD kernels to scale to large-sample settings.
arXiv Detail & Related papers (2020-05-30T12:10:35Z)
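As referenced in the entropy-regularized optimal transport entry above, here is a standard Sinkhorn iteration for the regularized objective that extragradient methods also target. This is a textbook baseline, not the cited paper's algorithm; `reg` and the iteration count are illustrative choices:

```python
import numpy as np

def sinkhorn(cost, p, q, reg=0.05, iters=1000):
    """Sinkhorn iterations for entropy-regularized optimal transport:
    min_T <T, cost> - reg * H(T) subject to marginals p and q."""
    K = np.exp(-cost / reg)            # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(iters):
        v = q / (K.T @ u)              # rescale to match column marginals
        u = p / (K @ v)                # rescale to match row marginals
    T = u[:, None] * K * v[None, :]    # approximately optimal plan
    return np.sum(T * cost)
```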
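For the trainable-feedforward-kernel entry: generic kernelized linear attention replaces softmax(Q K^T) V with phi(Q) (phi(K)^T V), reducing the quadratic cost in sequence length to linear. The cited paper learns phi as a feedforward network; the fixed positive map below is only a stand-in:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear attention via a positive feature map phi (a stand-in for the
    trainable feedforward kernel of the cited paper)."""
    Qf, Kf = phi(Q), phi(K)            # (n, d) feature representations
    KV = Kf.T @ V                      # (d, d_v) summary, built once
    Z = Qf @ Kf.sum(axis=0)            # (n,) normalizing constants
    return (Qf @ KV) / Z[:, None]      # (n, d_v), no n x n matrix formed
```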
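For the hyperspherical-harmonics entry, the Nyström baseline it is compared against fits in a few lines; uniform landmark sampling is one simple choice among many:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom(X, kernel, rank, seed=0):
    """Nystrom approximation K ~= C W^+ C^T from `rank` sampled landmarks."""
    idx = np.random.default_rng(seed).choice(len(X), size=rank, replace=False)
    C = kernel(X, X[idx])              # (n, rank) kernel columns
    W = kernel(X[idx], X[idx])         # (rank, rank) landmark Gram matrix
    return C @ np.linalg.pinv(W) @ C.T
```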
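For the quadrature-framework entry, the simplest instance of the numerical-integration view is Monte Carlo quadrature of Bochner's integral, i.e. random Fourier features; the paper's fully symmetric interpolatory rules replace the random nodes below with deterministic ones:

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=0.5, seed=0):
    """Monte Carlo quadrature of k(x, y) = E_w[cos(w . (x - y))] for the
    Gaussian kernel exp(-gamma ||x - y||^2), where w ~ N(0, 2*gamma*I).
    Returns Z with Z @ Z.T approximating the kernel matrix."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```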