A Unified Framework for Implicit Sinkhorn Differentiation
- URL: http://arxiv.org/abs/2205.06688v1
- Date: Fri, 13 May 2022 14:45:31 GMT
- Title: A Unified Framework for Implicit Sinkhorn Differentiation
- Authors: Marvin Eisenberger, Aysim Toker, Laura Leal-Taixé, Florian Bernard, Daniel Cremers
- Abstract summary: We propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation.
We show that it is computationally more efficient, particularly when resources like GPU memory are scarce.
- Score: 58.56866763433335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Sinkhorn operator has recently experienced a surge of popularity in
computer vision and related fields. One major reason is its ease of integration
into deep learning frameworks. To allow for an efficient training of respective
neural networks, we propose an algorithm that obtains analytical gradients of a
Sinkhorn layer via implicit differentiation. In comparison to prior work, our
framework is based on the most general formulation of the Sinkhorn operator. It
allows for any type of loss function, while both the target capacities and cost
matrices are differentiated jointly. We further construct error bounds of the
resulting algorithm for approximate inputs. Finally, we demonstrate that for a
number of applications, simply replacing automatic differentiation with our
algorithm directly improves the stability and accuracy of the obtained
gradients. Moreover, we show that it is computationally more efficient,
particularly when resources like GPU memory are scarce.
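For readers who want to see what such a Sinkhorn layer looks like in practice, below is a minimal log-domain sketch in PyTorch. It computes the entropy-regularized transport plan and obtains gradients for both the cost matrix and the target capacities by backpropagating through the unrolled iterations, i.e. the automatic-differentiation baseline the abstract contrasts with; the paper replaces this backward pass with analytical gradients from implicit differentiation. The function name, regularization strength eps, and iteration count are illustrative assumptions, not the paper's code.

```python
# Minimal log-domain Sinkhorn layer (illustrative sketch, not the paper's code).
# Gradients w.r.t. the cost matrix C and the target capacities b are obtained
# here by unrolled automatic differentiation; the paper derives them
# analytically via implicit differentiation of the Sinkhorn fixed point.
import torch

def sinkhorn(C, a, b, eps=0.1, n_iters=100):
    """Entropy-regularized transport plan between histograms a and b for cost C."""
    log_a, log_b = a.log(), b.log()
    f = torch.zeros_like(a)  # dual potential for rows
    g = torch.zeros_like(b)  # dual potential for columns
    for _ in range(n_iters):
        # Alternating log-domain row/column normalizations.
        f = eps * (log_a - torch.logsumexp((g.unsqueeze(0) - C) / eps, dim=1))
        g = eps * (log_b - torch.logsumexp((f.unsqueeze(1) - C) / eps, dim=0))
    # Transport plan P_ij = exp((f_i + g_j - C_ij) / eps).
    return torch.exp((f.unsqueeze(1) + g.unsqueeze(0) - C) / eps)

# Toy usage: gradients of an arbitrary loss w.r.t. both C and the marginal b.
n, m = 5, 4
C = torch.rand(n, m, requires_grad=True)
a = torch.full((n,), 1.0 / n)
b_logits = torch.zeros(m, requires_grad=True)
b = torch.softmax(b_logits, dim=0)   # learnable target capacities
P = sinkhorn(C, a, b)
loss = (P * C).sum()                 # any differentiable loss of P works here
loss.backward()                      # unrolled-autodiff baseline
print(C.grad.shape, b_logits.grad.shape)
```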
Related papers
- Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms [80.37846867546517]
We show how to train neural networks with eight different custom objectives.
We exploit their second-order information via their empirical Fisher and Hessian matrices.
We apply Newton Losses to achieve significant improvements for less well-optimized differentiable algorithms.
arXiv Detail & Related papers (2024-10-24T18:02:11Z)
- Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search [0.0]
We present a tensor graph rewriting approach that uses Monte Carlo tree search to build a superior representation.
Our approach improves the inference speedup of neural networks by up to 11% compared to existing methods.
arXiv Detail & Related papers (2024-10-07T22:22:02Z)
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions [29.58728073957055]
We propose an algorithm for optimizing parameters of hidden layer networks.
Specifically, we derive a blockwise difference-of-convex (DC) decomposition of the objective function.
arXiv Detail & Related papers (2024-01-15T19:53:35Z)
- Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batching and further apply it to more complex neural networks (see the non-replacement SGD sketch after this list).
arXiv Detail & Related papers (2023-06-28T00:46:58Z)
- On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight pruning.
A holistic framework for model compression is proposed from a novel perspective of nonconvex optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z)
- Nonsmooth automatic differentiation: a cheap gradient principle and other complexity results [0.0]
We provide a model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs.
Prominent examples are the famous relu and convolutional neural networks together with their standard loss functions.
arXiv Detail & Related papers (2022-06-01T08:43:35Z)
- Multi-Robot Active Mapping via Neural Bipartite Graph Matching [49.72892929603187]
We study the problem of multi-robot active mapping, which aims for complete scene map construction in minimum time steps.
The key to this problem lies in the goal position estimation to enable more efficient robot movements.
We propose a novel algorithm, namely NeuralCoMapping, which takes advantage of both approaches.
arXiv Detail & Related papers (2022-03-30T14:03:17Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach for handling streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning (see the sketch after this list).
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
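As a rough illustration of the gradient sparsification idea in the entry above, here is a sketch of a "random-top-k"-style operator in NumPy: keep the k1 largest-magnitude coordinates and sample k2 of the remaining ones at random, rescaled so the sampled part stays unbiased. This is one plausible reading of the operator's name rather than that paper's exact definition; k1, k2, and the rescaling are assumptions.

```python
# Illustrative "random-top-k"-style gradient sparsifier (assumed definition).
import numpy as np

def random_top_k(g, k1, k2, rng):
    out = np.zeros_like(g)
    top = np.argsort(-np.abs(g))[:k1]             # deterministic top-k1 part
    out[top] = g[top]
    rest = np.setdiff1d(np.arange(g.size), top)   # remaining coordinates
    sampled = rng.choice(rest, size=k2, replace=False)
    out[sampled] = g[sampled] * (rest.size / k2)  # unbiased estimate of the tail
    return out

rng = np.random.default_rng(0)
g = rng.normal(size=20)
print(random_top_k(g, k1=3, k2=5, rng=rng))
```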
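The sketch referenced in the "Ordering for Non-Replacement SGD" entry above: non-replacement SGD visits every example exactly once per epoch, and the visiting order can be chosen rather than shuffled. The gradient-norm ordering, step size, and least-squares objective below are illustrative assumptions, not the optimal orderings derived in that paper.

```python
# Non-replacement SGD on a least-squares toy problem, with a heuristic ordering
# (largest per-example gradient norm first) versus plain random shuffling.
import numpy as np

def nonreplacement_sgd(X, y, epochs=20, step=0.05, ordered=True):
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grads = (X @ w - y)[:, None] * X           # per-example gradients
        if ordered:
            order = np.argsort(-np.linalg.norm(grads, axis=1))
        else:
            order = rng.permutation(len(y))        # plain random shuffling
        for i in order:                            # each example used exactly once
            w -= step * (X[i] @ w - y[i]) * X[i]
    return w

X = np.random.default_rng(1).normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
print(nonreplacement_sgd(X, y))
```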
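The sketch referenced in the pairwise-learning entry above: a minimal online gradient descent routine for pairwise learning, assuming a linear scorer and an AUC-style pairwise hinge loss, with each incoming example paired against a bounded buffer of past examples. The buffer strategy and loss are illustrative assumptions, not the algorithms proposed in that paper.

```python
# Online gradient descent for pairwise learning with a bounded example buffer.
import numpy as np

def ogd_pairwise(stream, dim, step=0.05, buffer_size=50):
    rng = np.random.default_rng(0)
    w = np.zeros(dim)
    buffer = []                                    # (x, label) pairs seen so far
    for x, label in stream:
        for xb, lb in buffer:
            if label == lb:
                continue                           # only cross-class pairs matter
            pos, neg = (x, xb) if label > lb else (xb, x)
            if w @ (pos - neg) < 1.0:              # hinge: score pos above neg
                w += step * (pos - neg)
        buffer.append((x, label))
        if len(buffer) > buffer_size:
            buffer.pop(rng.integers(len(buffer)))  # keep a bounded random buffer
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
labels = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)
print(ogd_pairwise(zip(X, labels), dim=4))
```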
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.