Equivariant Deep Weight Space Alignment
- URL: http://arxiv.org/abs/2310.13397v4
- Date: Sat, 09 Nov 2024 22:00:16 GMT
- Title: Equivariant Deep Weight Space Alignment
- Authors: Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik, Nadav Dym, Haggai Maron
- Abstract summary: We propose a novel framework aimed at learning to solve the weight alignment problem.
We first prove that weight alignment adheres to two fundamental symmetries and then propose a deep architecture that respects these symmetries.
- Score: 54.65847470115314
- Abstract: Permutation symmetries of deep networks make basic operations like model merging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. To that end, we first prove that weight alignment adheres to two fundamental symmetries and then propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to those produced by current optimization algorithms. Additionally, our alignments can be used as an effective initialization for other methods, leading to improved solutions with a significant speedup in convergence.
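To make the permutation symmetry concrete, here is a minimal NumPy sketch (not the paper's Deep-Align architecture): it checks that permuting an MLP's hidden units leaves the function unchanged, then recovers the permutation with a classical linear-assignment baseline of the kind learned aligners aim to outperform.

```python
# Minimal sketch, assuming a two-layer ReLU MLP; the alignment step is a
# classical linear-assignment baseline, not the paper's Deep-Align network.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 16, 8, 3
W1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
W2 = rng.normal(size=(d_out, d_hid))

def mlp(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

# Permuting the hidden units leaves the network's function unchanged.
P = np.eye(d_hid)[rng.permutation(d_hid)]
x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2), mlp(x, P @ W1, P @ b1, W2 @ P.T))

# Weight alignment: recover the permutation between the two copies by
# matching rows of the first-layer weights via linear assignment.
cost = -(P @ W1) @ W1.T               # negative row-similarity matrix
rows, cols = linear_sum_assignment(cost)
P_hat = np.zeros_like(P)
P_hat[rows, cols] = 1.0
assert np.allclose(P_hat, P)          # exact recovery in this noiseless toy
```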
Related papers
- SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid Registration [76.40993825836222]
We propose SPARE, a novel formulation that utilizes a symmetrized point-to-plane distance for robust non-rigid registration.
The proposed method greatly improves the accuracy of non-rigid registration and maintains relatively high solution efficiency.
arXiv Detail & Related papers (2024-05-30T15:55:04Z)
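For intuition, a hedged sketch of the core residual (SPARE's full energy, correspondence search, and deformation model are more involved): the symmetrized point-to-plane distance projects each displacement onto both clouds' normals.

```python
# Illustrative symmetrized point-to-plane residual; the function name and
# exact weighting are assumptions, not SPARE's full formulation.
import numpy as np

def symmetrized_point_to_plane(p, q, n_p, n_q):
    d = p - q                          # per-correspondence displacement (N, 3)
    r_pq = np.sum(d * n_q, axis=1)     # projection onto target normals
    r_qp = np.sum(d * n_p, axis=1)     # projection onto source normals
    return r_pq**2 + r_qp**2           # symmetric squared residual per pair

rng = np.random.default_rng(1)
p = rng.normal(size=(100, 3))
q = p + 0.01 * rng.normal(size=(100, 3))   # slightly deformed copy
n = rng.normal(size=(100, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
print(symmetrized_point_to_plane(p, q, n, n).mean())
```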
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Matching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
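For a feel of what a solution-sampler is, here is a toy Bernoulli sampler for Max-Cut trained with vanilla REINFORCE; the entropy bonus is an illustrative stand-in for the paper's regularization process, whose actual form and guarantees are more involved.

```python
# Toy REINFORCE solution-sampler for Max-Cut; the entropy bonus is an
# assumed stand-in for the paper's regularization over vanilla descent.
import numpy as np

rng = np.random.default_rng(0)
n = 12
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1); A = A + A.T                 # random undirected graph

theta, best = np.zeros(n), 0.0                 # per-node Bernoulli logits
for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))
    x = (rng.random(n) < p).astype(float)      # sample a node partition
    cut = 0.5 * np.sum(A * (x[:, None] != x[None, :]))
    best = max(best, cut)
    grad_logp = x - p                          # d log pi(x) / d theta
    entropy_grad = -theta * p * (1 - p)        # pushes p back toward 1/2
    # A.sum()/4 is the expected cut of a uniform partition (a baseline).
    theta += 0.05 * (cut - A.sum() / 4) * grad_logp + 0.01 * entropy_grad
print("best sampled cut:", best)
```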
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower-level optimal variable.
Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL, showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
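A heavily simplified alternating sketch of the bilevel idea (illustrative only; PARL's formulation, parameterization, and guarantees differ): the lower level computes an entropy-regularized optimal policy for the current reward, and the upper level adjusts the reward so the induced policy matches preference data.

```python
# Toy bilevel alignment loop; the preference vector and update rule are
# assumptions for illustration, not PARL's algorithm.
import numpy as np

n_actions = 4
pref = np.array([0.1, 0.6, 0.2, 0.1])   # hypothetical human preference dist.
r = np.zeros(n_actions)                 # learnable reward (upper variable)

for _ in range(500):
    # Lower level: the entropy-regularized optimal policy is softmax(r).
    pi = np.exp(r); pi /= pi.sum()
    # Upper level: gradient of KL(pref || pi) w.r.t. r is (pi - pref).
    r -= 0.5 * (pi - pref)

print(np.round(pi, 3))                  # converges toward pref
```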
- Tile Networks: Learning Optimal Geometric Layout for Whole-page Recommendation [14.951408879079272]
We show it is possible to solve configuration optimization problems for whole-page recommendation using reinforcement learning.
The proposed Tile Networks is a neural architecture that optimizes 2D geometric configurations by arranging items in proper positions.
arXiv Detail & Related papers (2023-03-03T02:18:55Z)
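As a bare-bones illustration of layout-as-assignment (not the Tile Networks architecture): score item/slot pairs and pick the placement maximizing total score.

```python
# Layout as an assignment problem; the scores stand in for a learned
# model's predicted utility of each placement (assumed for illustration).
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_items, n_slots = 5, 6
scores = rng.random((n_items, n_slots))        # e.g. predicted engagement
items, slots = linear_sum_assignment(-scores)  # maximize total placement score
for i, s in zip(items, slots):
    print(f"item {i} -> slot {s} (score {scores[i, s]:.2f})")
```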
- A Study of Scalarisation Techniques for Multi-Objective QUBO Solving [0.0]
Quantum and quantum-inspired optimisation algorithms have shown promising performance when applied to academic benchmarks as well as real-world problems.
However, QUBO solvers are single-objective solvers. To make them more efficient at solving problems with multiple objectives, a decision on how to convert such multi-objective problems to single-objective problems needs to be made.
arXiv Detail & Related papers (2022-10-20T14:54:37Z)
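The simplest such conversion is a weighted sum of the objective matrices. A sketch (brute force replaces a real QUBO/annealing solver at this toy size):

```python
# Weighted-sum scalarisation of a two-objective QUBO.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 8
Q1 = rng.normal(size=(n, n)); Q1 = (Q1 + Q1.T) / 2   # objective 1
Q2 = rng.normal(size=(n, n)); Q2 = (Q2 + Q2.T) / 2   # objective 2

def solve_qubo(Q):
    best_x, best_val = None, np.inf
    for bits in itertools.product([0, 1], repeat=Q.shape[0]):
        x = np.array(bits)
        val = x @ Q @ x
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Sweep the weights to trace out candidate trade-offs between objectives.
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    x, _ = solve_qubo(w * Q1 + (1 - w) * Q2)
    print(f"w={w:.2f}  f1={x @ Q1 @ x:6.2f}  f2={x @ Q2 @ x:6.2f}")
```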
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
A deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between equilibrium inference and input optimization.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
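A tiny sketch of the idea, unrolling the fixed-point iteration instead of the implicit differentiation such models typically use: the layer's output is a fixed point, and the same machinery lets us optimize over the input (e.g., a latent code).

```python
# Deep-equilibrium-style layer as a fixed point of one nonlinear layer,
# with gradient-based optimization over the *input*; backprop here goes
# through the unrolled iteration (a simplification of implicit diff.).
import torch

torch.manual_seed(0)
W = 0.1 * torch.randn(8, 8)          # small norm so the iteration contracts
U = torch.randn(8, 4)
x = torch.randn(4, requires_grad=True)
target = torch.randn(8)

def deq_forward(x, n_iter=50):
    z = torch.zeros(8)
    for _ in range(n_iter):          # iterate z = tanh(W z + U x)
        z = torch.tanh(W @ z + U @ x)
    return z

opt = torch.optim.Adam([x], lr=0.1)  # optimize the input, not the weights
for _ in range(200):
    opt.zero_grad()
    loss = ((deq_forward(x) - target) ** 2).sum()
    loss.backward()
    opt.step()
print(f"final reconstruction loss: {loss.item():.4f}")
```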
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish lower complexity bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
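To fix ideas, a plain decentralized gradient baseline over a time-varying topology (the paper's optimal algorithms are accelerated and substantially more sophisticated): nodes alternate gossip averaging with local gradient steps on local least-squares objectives.

```python
# Decentralized gradient method with gossip averaging over a time-varying
# network (illustrative baseline, not the paper's optimal algorithms).
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 5, 3
A = [rng.normal(size=(10, dim)) for _ in range(n_nodes)]  # local data
b = [rng.normal(size=10) for _ in range(n_nodes)]
x = np.zeros((n_nodes, dim))                              # local iterates

def mixing_matrix(t):
    # Alternate ring and star topologies to mimic a time-varying network.
    if t % 2 == 0:
        edges = [(i, (i + 1) % n_nodes) for i in range(n_nodes)]
    else:
        edges = [(0, i) for i in range(1, n_nodes)]
    W = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / n_nodes
    W += np.diag(1.0 - W.sum(axis=1))                     # doubly stochastic
    return W

for t in range(300):
    grads = np.stack([A[i].T @ (A[i] @ x[i] - b[i]) for i in range(n_nodes)])
    x = mixing_matrix(t) @ x - 0.01 * grads               # gossip + local step

print("max disagreement across nodes:", np.abs(x - x.mean(axis=0)).max())
```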
- Towards Accelerating Training of Batch Normalization: A Manifold Perspective [19.55158964644964]
Batch normalization (BN) has become a crucial component across diverse deep neural networks.
We propose a quotient manifold, the PSI manifold, on which all equivalent weights of a network with BN are regarded as a single element.
We show that our algorithms consistently achieve better performance across various experimental settings.
arXiv Detail & Related papers (2021-01-08T08:53:07Z)
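The quotient view rests on BN's scale invariance: rescaling the incoming weights of a batch-normalized layer does not change its output, so all rescalings collapse to one point on the quotient manifold. A quick check:

```python
# Batch normalization cancels any positive rescaling of the pre-BN
# weights, which is the invariance the PSI (quotient) manifold removes.
import torch

torch.manual_seed(0)
x = torch.randn(32, 10)                        # a batch of inputs
W = torch.randn(6, 10)
bn = torch.nn.BatchNorm1d(6, affine=False)
bn.train()                                     # use batch statistics

out1 = bn(x @ W.T)
out2 = bn(x @ (3.7 * W).T)                     # arbitrary positive rescaling
print(torch.allclose(out1, out2, atol=1e-5))   # True: the scale cancels
```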
- Primal-Dual Sequential Subspace Optimization for Saddle-point Problems [3.9582154141918964]
We introduce a new sequential subspace optimization method for large-scale saddle-point problems.
It solves auxiliary saddle-point problems in low-dimensional subspaces, spanned by directions derived from first-order information over the primal and dual variables.
Experimental results demonstrate significantly better convergence relative to popular first-order methods.
arXiv Detail & Related papers (2020-08-20T18:19:19Z)
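One step of the idea in closed form (a sketch under simplifying assumptions, not the paper's exact scheme): restrict a convex-concave quadratic to the 2-D subspace spanned by the current primal and dual gradients and solve the resulting auxiliary saddle-point problem exactly.

```python
# One subspace step for f(x, y) = 0.5||x||^2 + x^T A y - 0.5||y||^2
# (saddle at the origin); the auxiliary problem over the gradient-spanned
# subspace reduces to a 2x2 linear system.
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.normal(size=(n, n)) / np.sqrt(n)
x, y = rng.normal(size=n), rng.normal(size=n)

gx, gy = x + A @ y, A.T @ x - y               # primal and dual gradients
px, py = gx / np.linalg.norm(gx), gy / np.linalg.norm(gy)

# Stationarity of f(x + u*px, y + v*py) in (u, v):
#   u + c*v = -||gx||  and  c*u - v = -||gy||,  with c = px^T A py.
c = px @ (A @ py)
v = (np.linalg.norm(gy) - c * np.linalg.norm(gx)) / (c * c + 1.0)
u = -np.linalg.norm(gx) - c * v
x, y = x + u * px, y + v * py

# The updated gradients are orthogonal to the subspace, as required.
print(px @ (x + A @ y), py @ (A.T @ x - y))   # both ~ 0
```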
- FedSplit: An algorithmic framework for fast federated optimization [40.42352500741025]
We introduce FedSplit, a class of algorithms for solving distributed convex minimization with additive structure.
Our theory shows that these methods are provably robust to inexact computation of intermediate local quantities.
arXiv Detail & Related papers (2020-05-11T16:30:09Z)
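A sketch of the splitting idea for least-squares clients, where each local prox step is a closed-form linear solve (the step size and exact procedure follow the paper only loosely):

```python
# FedSplit-style Peaceman-Rachford consensus updates for quadratic
# clients; the constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, dim, s = 5, 4, 0.1                          # clients, dimension, prox step
A = [rng.normal(size=(20, dim)) for _ in range(m)]
b = [rng.normal(size=20) for _ in range(m)]

def local_prox(i, v):
    # prox_{s f_i}(v) for f_i(x) = 0.5 * ||A_i x - b_i||^2
    return np.linalg.solve(A[i].T @ A[i] + np.eye(dim) / s,
                           A[i].T @ b[i] + v / s)

z = np.zeros((m, dim))
for _ in range(100):
    xbar = z.mean(axis=0)                      # server averaging
    half = np.stack([local_prox(i, 2 * xbar - z[i]) for i in range(m)])
    z = z + 2 * (half - xbar)                  # local reflection/centering

x_star = np.linalg.solve(sum(Ai.T @ Ai for Ai in A),
                         sum(Ai.T @ bi for Ai, bi in zip(A, b)))
print("error vs. centralized solution:",
      np.linalg.norm(z.mean(axis=0) - x_star))
```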
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.