Re-basin via implicit Sinkhorn differentiation
- URL: http://arxiv.org/abs/2212.12042v1
- Date: Thu, 22 Dec 2022 21:25:06 GMT
- Title: Re-basin via implicit Sinkhorn differentiation
- Authors: Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Masih
Aminbeidokhti, Eric Granger, Marco Pedersoli
- Abstract summary: We propose a Sinkhorn re-basin network with the ability to obtain the transportation plan that best suits a given objective.
Unlike the current state of the art, our method is differentiable and, therefore, easy to adapt to any task within the deep learning domain.
- Score: 9.827002225566073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent emergence of new algorithms for permuting models into
functionally equivalent regions of the solution space has shed some light on
the complexity of error surfaces and on promising properties like mode
connectivity. However, finding the right permutation is challenging, and
current optimization techniques are not differentiable, which makes them
difficult to integrate into gradient-based optimization and often leads to
sub-optimal solutions. In this paper, we propose a Sinkhorn re-basin network
with the ability to obtain the transportation plan that best suits a given
objective. Unlike the current state of the art, our method is differentiable
and, therefore, easy to adapt to any task within the deep learning domain.
Furthermore, we show the advantage of our re-basin method by proposing a new
cost function that enables incremental learning by exploiting the linear mode
connectivity property. The benefit of our method is compared against similar
approaches from the literature, under several conditions for both optimal
transport finding and linear mode connectivity. The effectiveness of our
continual learning method based on re-basin is also shown for several common
benchmark datasets, providing experimental results that are competitive with
state-of-the-art results from the literature.
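The core mechanism is easy to sketch: the Sinkhorn operator turns a learnable score matrix into a (soft) doubly-stochastic matrix, so the permutation that aligns one model to another can be found by plain gradient descent. Below is a minimal PyTorch sketch of this idea for a toy two-layer alignment; the weight-matching cost, the temperature, and autograd through the unrolled Sinkhorn iterations (standing in for the implicit differentiation the paper derives) are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def sinkhorn(log_alpha, n_iters=20):
    # Alternate row/column normalization in log space; the result is a
    # (nearly) doubly-stochastic matrix, i.e. a soft permutation.
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)
    return log_alpha.exp()

# Toy two-layer models: permuting the hidden units of A (rows of w1, columns
# of w2) leaves its function unchanged, so we search for the permutation that
# best matches model B's weights.
torch.manual_seed(0)
h, d_in, d_out = 8, 4, 3
w1_a, w2_a = torch.randn(h, d_in), torch.randn(d_out, h)
w1_b, w2_b = torch.randn(h, d_in), torch.randn(d_out, h)

scores = torch.zeros(h, h, requires_grad=True)  # learnable permutation scores
opt = torch.optim.Adam([scores], lr=0.1)

for _ in range(200):
    p = sinkhorn(scores / 0.1)      # temperature 0.1 -> near-hard permutation
    loss = ((p @ w1_a - w1_b) ** 2).sum() + ((w2_a @ p.T - w2_b) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("weight-matching loss:", loss.item())
```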
Related papers
- Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement [1.1510009152620668]
Current methods for constructive neural combinatorial optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning.
We bridge the two by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning.
We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data.
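A minimal sketch of that sampling-then-imitation loop, on a toy TSP with a placeholder random policy; `sample_tour` and the distance matrix are illustrative stand-ins for the trained model and a real instance, not the paper's sampling-without-replacement machinery:

```python
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def sample_tour(n, rng):
    tour = list(range(n))
    rng.shuffle(tour)  # placeholder stochastic policy
    return tour

rng = random.Random(0)
n = 10
dist = [[abs(i - j) for j in range(n)] for i in range(n)]  # toy metric

# Sample many solutions, keep the best one as a pseudo-expert trajectory.
pseudo_expert = min(
    (sample_tour(n, rng) for _ in range(64)),
    key=lambda t: tour_length(t, dist),
)
# `pseudo_expert` would now serve as the target for a supervised
# (cross-entropy) imitation update of the sampling policy.
print("best sampled length:", tour_length(pseudo_expert, dist))
```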
arXiv Detail & Related papers (2024-03-22T13:09:10Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight Bipartite Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These challenges are exacerbated by the cold-start problem and data sparsity.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z)
- A Game of Bundle Adjustment -- Learning Efficient Convergence [11.19540223578237]
We show how to reduce the number of iterations required for bundle adjustment to converge.
We show that this reduction benefits the classic approach and can be integrated with other bundle adjustment acceleration methods.
arXiv Detail & Related papers (2023-08-25T09:44:45Z)
- Resource-Adaptive Newton's Method for Distributed Learning [16.588456212160928]
This paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method.
Unlike traditional first-order methods, RANL exhibits remarkable independence from the condition number of the problem.
arXiv Detail & Related papers (2023-08-20T04:01:30Z)
- Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable components separately, linearizing only the smooth parts.
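As an illustration of the linearization idea, here is a minimal conditional-gradient (Frank-Wolfe-style) sketch, not the paper's specific algorithm: the smooth term is linearized while the non-smooth term stays exact inside the subproblem. The quadratic f, the l1 term g, and the box constraint set are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.standard_normal((n, n))
Q = A.T @ A / n                 # smooth part f(x) = 0.5 x'Qx - b'x
b = rng.standard_normal(n)
lam = 0.1                       # non-smooth part g(x) = lam * ||x||_1

def grad_f(x):
    return Q @ x - b

def lmo(g):
    # argmin over s in [-1,1]^n of <g, s> + lam*||s||_1; separable, so each
    # coordinate's optimum is one of {-1, 0, +1}.
    cand = np.stack([-np.ones_like(g), np.zeros_like(g), np.ones_like(g)])
    vals = cand * g + lam * np.abs(cand)
    return cand[np.argmin(vals, axis=0), np.arange(len(g))]

x = np.zeros(n)
for t in range(200):
    s = lmo(grad_f(x))          # linearized subproblem keeps g exact
    gamma = 2.0 / (t + 2.0)     # standard Frank-Wolfe step size
    x = (1 - gamma) * x + gamma * s

print("objective:", 0.5 * x @ Q @ x - b @ x + lam * np.abs(x).sum())
```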
arXiv Detail & Related papers (2023-02-24T18:41:48Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
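A minimal PyTorch sketch of the unrolling strategy the paper analyzes: an inner ridge-regression solve is unrolled as gradient-descent iterations, and autograd differentiates through them to reach a hyperparameter. The toy data and the use of the training loss as the outer objective are illustrative assumptions; the paper's contribution is an analytical, more efficient model of exactly this backward pass.

```python
import torch

torch.manual_seed(0)
X, y = torch.randn(50, 5), torch.randn(50)
log_lam = torch.zeros((), requires_grad=True)  # hyperparameter to learn

def inner_solve(log_lam, n_steps=100, lr=0.05):
    # Unrolled gradient descent on the inner ridge objective; every
    # iteration stays in the autograd graph.
    w = torch.zeros(5)
    lam = log_lam.exp()
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) / len(y) + lam * w
        w = w - lr * grad
    return w

w_star = inner_solve(log_lam)
outer_loss = (X @ w_star - y).pow(2).mean()  # stand-in for a validation loss
outer_loss.backward()                        # backprop through the solver
print("d(outer loss)/d(log lambda):", log_lam.grad.item())
```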
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Batch Active Learning from the Perspective of Sparse Approximation [12.51958241746014]
Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators.
We study and propose a novel framework that formulates batch active learning from the perspective of sparse approximation.
Our active learning method aims to find an informative subset from the unlabeled data pool such that the corresponding training loss function approximates its full data pool counterpart.
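A minimal greedy sketch of that subset-selection idea, in the spirit of matching pursuit: pick examples whose combination best approximates a full-pool quantity. The synthetic per-example gradient embeddings stand in for the real loss information and are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
pool = rng.standard_normal((500, 16))   # per-example gradient embeddings
target = pool.mean(axis=0)              # full-pool quantity to approximate

selected, residual = [], target.copy()
for _ in range(10):                     # batch size 10
    scores = pool @ residual            # correlation with the residual
    scores[selected] = -np.inf          # select without replacement
    selected.append(int(np.argmax(scores)))
    # Re-fit the residual against the span of the chosen examples.
    B = pool[selected].T                # shape (16, k)
    coef, *_ = np.linalg.lstsq(B, target, rcond=None)
    residual = target - B @ coef

print("chosen batch:", selected)
print("approximation error:", np.linalg.norm(residual))
```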
arXiv Detail & Related papers (2022-11-01T03:20:28Z)
- Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems [2.6782615615913348]
This paper presents a new entropy-based reinforcement learning approach.
In addition, we design an off-policy reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various routing problems.
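A minimal sketch of entropy-regularized policy optimization on a toy bandit standing in for a routing environment; REINFORCE with an entropy bonus is a generic stand-in, not this paper's exact algorithm, and the bonus weight `beta` is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9, 0.1])
logits = np.zeros(4)
beta, lr = 0.05, 0.5

for _ in range(2000):
    p = np.exp(logits - logits.max())
    p /= p.sum()                                  # softmax policy
    a = rng.choice(4, p=p)
    r = true_rewards[a] + 0.1 * rng.standard_normal()
    grad_logp = np.eye(4)[a] - p                  # d log pi(a) / d logits
    grad_ent = -p * (np.log(p) - p @ np.log(p))   # d entropy / d logits
    logits += lr * (r * grad_logp + beta * grad_ent)

print("final policy:", np.round(p, 3))  # mass should concentrate on arm 2
```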
arXiv Detail & Related papers (2022-05-31T09:51:48Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear optimization problem.
We solve for one variable by considering its coupling relationship with the other, leading to synchronous gradient descent.
Our algorithm is applied to solve problems where one variable is under a sparsity constraint.
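A minimal sketch of bilinear optimization with a sparsity constraint on one variable, using a plain alternating projected-gradient scheme as a stand-in for the paper's coupled (cogradient) updates; the toy problem and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 30
A = rng.standard_normal((m, m))
x_true = np.zeros(m); x_true[:3] = 1.0          # sparse ground truth
h_true = rng.standard_normal(m)
y = (A @ x_true) * h_true                       # bilinear observation model

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

h, x = np.ones(m), np.zeros(m)
lr, lam = 1e-3, 0.05
for _ in range(3000):
    r = (A @ x) * h - y                         # shared residual couples the updates
    h -= lr * (r * (A @ x))                     # gradient step in h
    x = soft_threshold(x - lr * (A.T @ (r * h)), lr * lam)  # proximal step in x

print("residual:", np.linalg.norm((A @ x) * h - y))
```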
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
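A minimal sketch of the perturbation idea: the argmax map is piecewise constant, but its Gaussian-smoothed version is differentiable, and both the smoothed output and a Jacobian estimate can be computed by Monte Carlo. The estimator below follows the standard perturbed-optimizer construction; the toy scores and the value of sigma are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, 1.2, 0.8])   # toy scores to "optimize" over
sigma, n_samples = 0.5, 100_000

Z = rng.standard_normal((n_samples, 3))
idx = np.argmax(theta + sigma * Z, axis=1)
onehot = np.eye(3)[idx]             # perturbed argmax solutions

# Smoothed output y(theta) = E[argmax_onehot(theta + sigma*Z)] is
# differentiable; its Jacobian can be estimated as E[y Z^T] / sigma.
y_smooth = onehot.mean(axis=0)
jacobian = onehot.T @ Z / (n_samples * sigma)

print("smoothed argmax:", np.round(y_smooth, 3))
print("Jacobian estimate:\n", np.round(jacobian, 3))
```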
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.