An alternative approach to train neural networks using monotone
variational inequality
- URL: http://arxiv.org/abs/2202.08876v4
- Date: Mon, 11 Mar 2024 18:38:23 GMT
- Title: An alternative approach to train neural networks using monotone
variational inequality
- Authors: Chen Xu, Xiuyuan Cheng, Yao Xie
- Abstract summary: We propose an alternative approach to neural network training using the monotone vector field.
Our approach can be used for more efficient fine-tuning of a pre-trained neural network.
- Score: 22.320632565424745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an alternative approach to neural network training using the
monotone vector field, an idea inspired by the seminal work of Juditsky and
Nemirovski [Juditsky & Nemirovski, 2019], developed originally to solve
parameter estimation problems for generalized linear models (GLM) by reducing
the original non-convex problem to a convex problem of solving a monotone
variational inequality (VI). Our approach leads to computationally efficient
procedures that converge fast and offer guarantees in some special cases, such
as training a single-layer neural network or fine-tuning the last layer of the
pre-trained model. Our approach can be used for more efficient fine-tuning of a
pre-trained model while freezing the bottom layers, an essential step for
deploying many machine learning models such as large language models (LLMs). We
demonstrate its applicability in training fully-connected (FC) neural networks,
graph neural networks (GNN), and convolutional neural networks (CNN) and show
the competitive or better performance of our approach compared to stochastic
gradient descent methods on both synthetic and real network data prediction
tasks across various performance metrics.
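To make the reduction concrete, here is a minimal sketch (not the authors' exact procedure) of the GLM case: for y = phi(Xw) with a monotone link phi, the vector field F(w) = (1/n) sum_i (phi(x_i' w) - y_i) x_i is monotone, and iterating w <- w - lr * F(w) solves the associated VI even though the squared loss in w is generally non-convex. Function names and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def vi_train_glm(X, y, phi, lr=0.1, steps=500):
    """Find a root of the monotone field F(w) = X.T @ (phi(X w) - y) / n
    by the simple iteration w <- w - lr * F(w)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        F = X.T @ (phi(X @ w) - y) / n   # monotone vector field, not a gradient
        w -= lr * F
    return w

# Toy use: fine-tune a last sigmoid layer on top of frozen features X.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = sigmoid(X @ w_true)
w_hat = vi_train_glm(X, y, sigmoid)   # w_hat approaches w_true
```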
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) training is a non-convex optimization problem.
In this paper, we examine convex reformulations of neural network training.
We show that the stationary points of the non-convex training objective can be characterized as global optima of subsampled convex (Lasso-type) programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Analyzing Populations of Neural Networks via Dynamical Model Embedding [10.455447557943463]
A core challenge in the interpretation of deep neural networks is identifying commonalities between the underlying algorithms implemented by distinct networks trained for the same task.
Motivated by this problem, we introduce DYNAMO, an algorithm that constructs a low-dimensional manifold where each point corresponds to a neural network model, and two points are nearby if the corresponding neural networks enact similar high-level computational processes.
DYNAMO takes as input a collection of pre-trained neural networks and outputs a meta-model that emulates the dynamics of the hidden states as well as the outputs of any model in the collection.
arXiv Detail & Related papers (2023-02-27T19:00:05Z) - Optimization-Informed Neural Networks [0.6853165736531939]
We propose optimization-informed neural networks (OINN) to solve constrained nonlinear optimization problems (CNLPs).
In a nutshell, OINN transforms a CNLP into a neural network training problem.
The effectiveness of the proposed approach is demonstrated through a collection of classical problems.
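A hypothetical toy example of this transformation (a quadratic penalty on the constraint; the problem, architecture, and penalty weight are assumptions, not the paper's scheme): minimize (x1-1)^2 + (x2-2)^2 subject to x1 + x2 <= 2 by letting a small network emit the decision variable and training it on objective plus penalty.

```python
import torch

# Decision variable x in R^2 is the output of a tiny network fed a dummy input.
net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
z = torch.ones(1, 1)                           # fixed dummy input

for _ in range(2000):
    x = net(z).squeeze(0)
    objective = (x[0] - 1) ** 2 + (x[1] - 2) ** 2
    violation = torch.relu(x[0] + x[1] - 2)    # amount of constraint violation
    loss = objective + 10.0 * violation ** 2   # quadratic penalty
    opt.zero_grad(); loss.backward(); opt.step()

print(x.detach())   # approaches the constrained optimum (0.5, 1.5)
```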
arXiv Detail & Related papers (2022-10-05T09:28:55Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs because performance prediction requires model training.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
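The "equivalence" invoked here is the familiar gradient-flow view: gradient descent is a forward-Euler discretization of dw/dt = -grad L(w), so weight (edge) trajectories can be recorded and analyzed as a dynamical system. A toy illustration of that view only; the paper's actual framework and predictive metric are not implemented here.

```python
import numpy as np

def gradient_flow_trajectory(grad, w0, lr=0.05, steps=100):
    """Record the weight (edge) trajectory of gradient descent, viewed as the
    Euler discretization of the ODE dw/dt = -grad L(w)."""
    traj = [np.asarray(w0, dtype=float)]
    for _ in range(steps):
        traj.append(traj[-1] - lr * grad(traj[-1]))
    return np.stack(traj)

# Toy quadratic loss L(w) = 0.5 * w^T A w, whose flow decays along A's eigenmodes.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
traj = gradient_flow_trajectory(lambda w: A @ w, w0=[1.0, 1.0])
```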
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Neuron-based Pruning of Deep Neural Networks with Better Generalization
using Kronecker Factored Curvature Approximation [18.224344440110862]
The proposed algorithm directs the parameters of the compressed model toward a flatter solution by exploring the spectral radius of the Hessian.
Our result shows that it improves the state-of-the-art results on neuron compression.
The method is able to achieve very small networks with only a small loss in accuracy across different neural network models.
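One ingredient such curvature-aware pruning needs is an estimate of the Hessian's top eigenvalue. Below is a generic sketch via power iteration on Hessian-vector products; note the paper uses a Kronecker-factored (K-FAC) curvature approximation, which this stand-in does not implement.

```python
import torch

def hessian_top_eigenvalue(loss, params, iters=20):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    by power iteration on Hessian-vector products (the double-grad trick)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv)) + 1e-12
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()  # Rayleigh quotient

# usage: params = list(model.parameters()); loss = criterion(model(x), y)
# lam_max = hessian_top_eigenvalue(loss, params)
```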
arXiv Detail & Related papers (2021-11-16T15:55:59Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
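A rough sketch of the simplest case, a line of models: keep two weight endpoints, sample a point on the segment each step, and backpropagate through the interpolation. The toy task, endpoints, and hyperparameters are made up; this assumes torch >= 2.0 for torch.func.functional_call.

```python
import torch
from torch.func import functional_call

model = torch.nn.Linear(10, 1)
base = {k: v.detach() for k, v in model.named_parameters()}
w1 = {k: torch.nn.Parameter(v.clone()) for k, v in base.items()}
w2 = {k: torch.nn.Parameter(v.clone() + 0.1 * torch.randn_like(v)) for k, v in base.items()}
opt = torch.optim.SGD(list(w1.values()) + list(w2.values()), lr=1e-2)
X, y = torch.randn(256, 10), torch.randn(256, 1)

for _ in range(500):
    t = torch.rand(())                                   # random point on the segment
    params = {k: (1 - t) * w1[k] + t * w2[k] for k in w1}
    loss = torch.nn.functional.mse_loss(functional_call(model, params, (X,)), y)
    opt.zero_grad(); loss.backward(); opt.step()         # pushes the whole line toward low loss
```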
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Certified Monotonic Neural Networks [15.537695725617576]
We propose to certify the monotonicity of general piecewise-linear neural networks by solving a mixed integer linear programming problem.
Our approach does not require human-designed constraints on the weight space and also yields more accurate approximations.
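A compact sketch of the MILP idea for a one-hidden-layer ReLU network f(x) = v . relu(Wx + b) on a box domain, using a big-M encoding of the activation pattern (PuLP/CBC). The big-M constant, single hidden layer, and tolerance are simplifying assumptions rather than the paper's exact program: if the minimum possible directional derivative in coordinate k is non-negative, the network is certified monotone in that input.

```python
import pulp

def certify_monotone(W, b, v, k, lo, hi, M=100.0, tol=1e-6):
    """Certify df/dx_k >= 0 for f(x) = v . relu(W x + b) over the box [lo, hi]
    by minimizing sum_j v_j * W[j,k] * a_j over feasible activation patterns a."""
    m, d = W.shape
    prob = pulp.LpProblem("monotone_cert", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", lowBound=lo[i], upBound=hi[i]) for i in range(d)]
    a = [pulp.LpVariable(f"a{j}", cat="Binary") for j in range(m)]
    # objective: worst-case directional derivative in coordinate k
    prob += pulp.lpSum(float(v[j] * W[j, k]) * a[j] for j in range(m))
    # big-M link between pre-activations and the binary activation pattern
    for j in range(m):
        pre = pulp.lpSum(float(W[j, i]) * x[i] for i in range(d)) + float(b[j])
        prob += pre <= M * a[j]          # a_j = 0  =>  pre <= 0 (unit inactive)
        prob += pre >= -M * (1 - a[j])   # a_j = 1  =>  pre >= 0 (unit active)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.value(prob.objective) >= -tol
```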
arXiv Detail & Related papers (2020-11-20T04:58:13Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
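For linear-Gaussian models the link is exact: the log marginal likelihood factorizes into one-step-ahead log predictive densities, so a model whose sequential predictions improve quickly ("trains fast") has high evidence. A minimal sketch for Bayesian linear regression, where the prior scale and noise variance are arbitrary choices:

```python
import numpy as np

def sequential_log_evidence(X, y, alpha=1.0, noise_var=0.1):
    """log p(y_1..n) = sum_i log p(y_i | y_1..i-1), computed with
    rank-1 conjugate (Kalman-style) posterior updates."""
    d = X.shape[1]
    mu = np.zeros(d)          # posterior mean, starts at the prior
    S = np.eye(d) / alpha     # posterior covariance, starts at the prior
    total = 0.0
    for x, t in zip(X, y):
        pred_mean = x @ mu
        pred_var = x @ S @ x + noise_var
        total += -0.5 * (np.log(2 * np.pi * pred_var)
                         + (t - pred_mean) ** 2 / pred_var)
        gain = (S @ x) / pred_var            # rank-1 posterior update
        mu = mu + gain * (t - pred_mean)
        S = S - np.outer(gain, S @ x)
    return total
```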
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds while retaining theoretical convergence guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of the algorithm and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
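A bare-bones sketch of the layer-wise idea for one fully-connected layer, with a hard assignment (the Hungarian solver) standing in for the paper's optimal-transport coupling; a full implementation would also permute adjacent layers consistently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_fc_layers(Wa, Wb):
    """Align the neurons (rows) of Wb to those of Wa by solving an assignment
    problem, then average -- a special case of OT with uniform marginals."""
    cost = ((Wa[:, None, :] - Wb[None, :, :]) ** 2).sum(axis=-1)
    _, col = linear_sum_assignment(cost)   # col[i]: neuron of B matched to neuron i of A
    return 0.5 * (Wa + Wb[col])            # fused layer; downstream layers must be
                                           # permuted with the same `col` to stay consistent
```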
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.