Soft Merging: A Flexible and Robust Soft Model Merging Approach for
Enhanced Neural Network Performance
- URL: http://arxiv.org/abs/2309.12259v1
- Date: Thu, 21 Sep 2023 17:07:31 GMT
- Title: Soft Merging: A Flexible and Robust Soft Model Merging Approach for
Enhanced Neural Network Performance
- Authors: Hao Chen, Yusen Wu, Phuong Nguyen, Chao Liu, Yelena Yesha
- Abstract summary: Stochastic Gradient Descent (SGD) is often limited to converging to local optima, which constrains model performance.
Simple arithmetic averaging of the obtained local optima models yields undesirable results; the proposed {\em soft merging} method avoids this by learning gate parameters rather than modifying model weights.
Experiments underscore the effectiveness of the merged networks.
- Score: 6.599368083393398
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Stochastic Gradient Descent (SGD), a widely used optimization algorithm in
deep learning, is often limited to converging to local optima due to the
non-convex nature of the problem. Leveraging these local optima to improve
model performance remains a challenging task. Given the inherent complexity of
neural networks, simple arithmetic averaging of the obtained local optima
models often results in undesirable outcomes. This paper proposes a {\em soft merging} method
that facilitates rapid merging of multiple models, simplifies the merging of
specific parts of neural networks, and enhances robustness against malicious
models with extreme values. This is achieved by learning gate parameters
through a surrogate of the $l_0$ norm using the hard concrete distribution, without
modifying the model weights of the given local optima models. This merging
process not only enhances the model performance by converging to a better local
optimum, but also minimizes computational costs, offering an efficient and
explicit learning process integrated with stochastic gradient descent. Thorough
experiments underscore the effectiveness and superior performance of the merged
neural networks.
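The gating mechanism lends itself to a compact sketch. The following is a minimal PyTorch illustration, assuming gates applied at the whole-model level; the class names (`HardConcreteGate`, `SoftMerge`) and hyperparameters are illustrative rather than taken from the paper, and the hard concrete parameterization follows Louizos et al. (2018):

```python
# Minimal sketch of soft merging: frozen candidate models are combined
# through learnable hard concrete gates; only the gates receive gradients.
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    """Stochastic gate z in [0, 1] with a differentiable L0 surrogate
    (hard concrete distribution; hyperparameters are illustrative)."""
    def __init__(self, n_gates, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_gates))  # learned
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        # Training-time sample via the reparameterization trick.
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((u.log() - (-u).log1p() + self.log_alpha) / self.beta)
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self):
        # Expected number of non-zero gates (the L0-norm surrogate).
        bias = self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        return torch.sigmoid(self.log_alpha - bias).sum()

class SoftMerge(nn.Module):
    """Gated combination of K frozen local-optima models."""
    def __init__(self, models):
        super().__init__()
        self.models = nn.ModuleList(models)
        for p in self.models.parameters():
            p.requires_grad_(False)        # weights stay untouched
        self.gate = HardConcreteGate(len(models))

    def forward(self, x):
        z = self.gate()
        z = z / z.sum().clamp_min(1e-8)    # normalize open gates
        return sum(zi * m(x) for zi, m in zip(z, self.models))
```

Training then reduces to standard SGD over the gate parameters alone, minimizing the task loss plus a small multiple of `gate.l0_penalty()`, in the spirit of the abstract's "explicit learning process integrated with stochastic gradient descent".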
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) training objectives are highly non-convex, which makes their optima hard to characterize.
In this paper, we examine the use of convex Lasso-based recovery models for neural networks.
We show that the stationary points of the non-convex objective can be characterized as global optima of subsampled convex programs.
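For concreteness, the convex object in question is a Lasso-type program; a generic instance (notation mine, not the paper's) is
\[
\min_{w \in \mathbb{R}^d} \ \tfrac{1}{2}\lVert A w - y \rVert_2^2 + \lambda \lVert w \rVert_1 ,
\]
so the characterization identifies each stationary point of the non-convex training loss with a global optimum of a subsampled instance of such a program.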
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units [4.807347156077897]
Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive when used at scale.
This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications.
arXiv Detail & Related papers (2023-11-13T17:55:07Z)
- Distributed Pruning Towards Tiny Neural Networks in Federated Learning [12.63559789381064]
FedTiny is a distributed pruning framework for federated learning.
It generates specialized tiny models for memory- and computing-constrained devices.
It achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91%.
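FedTiny's adaptive, distributed procedure is beyond a few lines, but the pruning primitive such frameworks build on can be sketched as global magnitude pruning (a generic stand-in, not FedTiny's actual algorithm):

```python
# Generic global magnitude pruning: zero out the smallest-magnitude
# weights across all layers, in place. Illustrative only.
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float) -> None:
    weights = [p for p in model.parameters() if p.dim() > 1]
    all_scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * all_scores.numel())
    if k == 0:
        return
    threshold = all_scores.kthvalue(k).values          # global cutoff
    for w in weights:
        w.data.mul_((w.abs() > threshold).float())     # apply binary mask
```

Repeated over federated rounds with progressively higher sparsity, steps of this kind shrink models toward memory- and computing-constrained devices.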
arXiv Detail & Related papers (2022-12-05T01:58:45Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model (DEQ) is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
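A minimal sketch of the fixed-point computation (naive iteration; names and the toy layer are illustrative):

```python
# Deep-equilibrium-style forward pass: iterate one nonlinear layer
# f(z, x) until z is (approximately) a fixed point z* = f(z*, x).
import torch

def deq_forward(f, x, z0=None, max_iter=50, tol=1e-4):
    z = torch.zeros_like(x) if z0 is None else z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if (z_next - z).norm() < tol * (z.norm() + 1e-8):
            return z_next              # converged to the equilibrium
        z = z_next
    return z

# Example layer: a contractive affine map followed by tanh.
W = 0.5 * torch.randn(8, 8) / 8 ** 0.5
f = lambda z, x: torch.tanh(z @ W.T + x)
x = torch.randn(4, 8)
z_star = deq_forward(f, x)
```

In practice, DEQs replace the naive loop with quasi-Newton root-finding and backpropagate through the equilibrium implicitly rather than through the iterations.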
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- Revisit Geophysical Imaging in A New View of Physics-informed Generative Adversarial Learning [2.12121796606941]
Full waveform inversion produces high-resolution subsurface models.
FWI with a least-squares loss function suffers from drawbacks such as the local-minima problem.
Recent works relying on partial differential equations and neural networks show promising performance for two-dimensional FWI.
We propose an unsupervised learning paradigm that integrates the wave equation with a discriminative network to accurately estimate physically consistent models.
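The described coupling can be sketched as follows; `wave_forward` is a placeholder for a differentiable wave-equation solver, and every name and shape here is an assumption, not the paper's code:

```python
# Adversarial FWI sketch: a forward operator ties the estimated velocity
# model to simulated data; a discriminator compares simulated vs. observed.
import torch
import torch.nn as nn

velocity = nn.Parameter(torch.full((64, 64), 2.0))    # subsurface model (km/s)
disc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 64),
                     nn.ReLU(), nn.Linear(64, 1))     # toy discriminator

def wave_forward(v):
    # Placeholder for a differentiable wave-equation simulation d = F(v).
    return torch.sin(v)  # NOT physics; keeps the sketch runnable

opt_v = torch.optim.Adam([velocity], lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
observed = torch.randn(1, 64, 64)                     # stand-in field data
bce = nn.BCEWithLogitsLoss()

for _ in range(100):
    simulated = wave_forward(velocity).unsqueeze(0)
    # Discriminator: real (observed) vs. fake (simulated) records.
    d_loss = bce(disc(observed), torch.ones(1, 1)) + \
             bce(disc(simulated.detach()), torch.zeros(1, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Velocity update: make simulated data indistinguishable from observed.
    g_loss = bce(disc(simulated), torch.ones(1, 1))
    opt_v.zero_grad(); g_loss.backward(); opt_v.step()
```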
arXiv Detail & Related papers (2021-09-23T15:54:40Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Decentralized Statistical Inference with Unrolled Graph Neural Networks [26.025935320024665]
We propose a learning-based framework that unrolls decentralized optimization algorithms into graph neural networks (GNNs).
By minimizing the recovery error via end-to-end training, this learning-based framework resolves the model mismatch issue.
Our convergence analysis reveals that the learned model parameters may accelerate the convergence and reduce the recovery error to a large extent.
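As a hedged illustration of unrolling (simplified to learnable step sizes over a fixed mixing matrix; the paper's GNN parameterization is richer, and all names here are mine):

```python
# Unrolled decentralized gradient descent: T iterations of
# x_{t+1} = W x_t - alpha_t * grad(x_t) become T "layers" whose step
# sizes alpha_t are trained end-to-end.
import torch
import torch.nn as nn

class UnrolledDGD(nn.Module):
    def __init__(self, mixing_matrix, n_layers=10):
        super().__init__()
        self.W = mixing_matrix                          # graph mixing matrix
        self.alphas = nn.Parameter(torch.full((n_layers,), 0.1))

    def forward(self, x, grad_fn):
        # x: (n_nodes, dim) local estimates; grad_fn: local gradients.
        for alpha in self.alphas:
            x = self.W @ x - alpha * grad_fn(x)         # one unrolled layer
        return x
```

The step sizes are trained end-to-end to minimize the recovery error against ground-truth signals, which is how the framework resolves the model-mismatch issue the summary mentions.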
arXiv Detail & Related papers (2021-04-04T07:52:34Z)
- Improving Gradient Flow with Unrolled Highway Expectation Maximization [0.9539495585692008]
We propose Highway Expectation Maximization Networks (HEMNet), composed of unrolled iterations of the generalized EM (GEM) algorithm.
HEMNet features scaled skip connections, or highways, along the depths of the unrolled architecture, resulting in improved gradient flow during backpropagation.
We achieve significant improvement on several semantic segmentation benchmarks and empirically show that HEMNet effectively alleviates gradient decay.
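The highway idea across unrolled iterations can be sketched as a learned blend of the previous estimate and the GEM update (the gating form and names are mine, not necessarily HEMNet's exact formulation):

```python
# Scaled ("highway") skip connections across unrolled EM iterations:
# each step mixes the previous estimate with the GEM update, which
# shortens gradient paths through the unrolled depth.
import torch
import torch.nn as nn

class HighwayUnrolledEM(nn.Module):
    def __init__(self, gem_step, n_iters=8):
        super().__init__()
        self.gem_step = gem_step                          # one GEM update
        self.gates = nn.Parameter(torch.zeros(n_iters))   # skip gains (pre-sigmoid)

    def forward(self, mu, x):
        for g in torch.sigmoid(self.gates):
            mu = g * mu + (1 - g) * self.gem_step(mu, x)  # highway blend
        return mu
```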
arXiv Detail & Related papers (2020-12-09T09:11:45Z)
- Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) technology has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance optimization in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
- Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting [60.97465664419395]
We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the splitting stability that arises in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods on learning accurate and energy-efficient neural networks.
arXiv Detail & Related papers (2020-03-23T17:09:27Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
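A minimal sketch of the align-then-average idea for a single layer, using a hard assignment (Hungarian matching, a special case of optimal transport) rather than the paper's full OT formulation; all names are illustrative:

```python
# Layer-wise fusion by neuron alignment: match neurons of model B to
# model A, then average the aligned weight rows.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layers(w_a: np.ndarray, w_b: np.ndarray) -> np.ndarray:
    """w_a, w_b: (out_features, in_features) weights of the same layer."""
    # Cost of matching neuron i of A with neuron j of B.
    cost = np.linalg.norm(w_a[:, None, :] - w_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # optimal hard matching
    w_b_aligned = w_b[cols]                    # permute B's neurons to A's order
    return 0.5 * (w_a + w_b_aligned)           # average aligned weights
```

For multi-layer networks, the permutation applied to one layer's output neurons must also be applied to the next layer's input weights so the composed function stays consistent.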
arXiv Detail & Related papers (2019-10-12T22:07:15Z)