Block Layer Decomposition schemes for training Deep Neural Networks
- URL: http://arxiv.org/abs/2003.08123v1
- Date: Wed, 18 Mar 2020 09:53:40 GMT
- Title: Block Layer Decomposition schemes for training Deep Neural Networks
- Authors: Laura Palagi, Ruggiero Seccia
- Abstract summary: Deep Feedforward Neural Networks' (DFNNs) weights estimation relies on a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points and large plateaus.
As a consequence, optimization algorithms can be attracted toward local minimizers which can lead to bad solutions or can slow down the optimization process.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Feedforward Neural Networks' (DFNNs) weights estimation relies on the
solution of a very large nonconvex optimization problem that may have many
local (non-global) minimizers, saddle points and large plateaus. As a
consequence, optimization algorithms can be attracted toward local minimizers
which can lead to bad solutions or can slow down the optimization process.
Furthermore, the time needed to find good solutions to the training problem
depends on both the number of samples and the number of variables. In this
work, we show how Block Coordinate Descent (BCD) methods can be applied to
improve performance of state-of-the-art algorithms by avoiding bad stationary
points and flat regions. We first describe a batch BCD method able to
effectively tackle the network's depth and then we further extend the algorithm
proposing a minibatch BCD framework able to scale with respect to both
the number of variables and the number of samples by embedding a BCD approach
into a minibatch framework. Through extensive numerical results on standard
datasets and several network architectures, we show that applying BCD methods
to the training phase of DFNNs outperforms standard batch and minibatch
algorithms, improving both the training phase and the generalization
performance of the networks.
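The layer-wise idea behind the abstract can be sketched in a few lines of NumPy: split the weights into one block per layer and, on each minibatch, update one block at a time while the others stay frozen. This is a minimal illustrative sketch on toy data, not the paper's exact algorithm; the network shape, learning rate, and update rule per block are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with bounded targets
X = rng.normal(size=(256, 4))
y = np.tanh(X @ rng.normal(size=(4, 1)))

# Two-layer network; each layer's weights form one coordinate block
W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))

def forward(W1, W2, X):
    H = np.tanh(X @ W1)
    return H, H @ W2

def loss(W1, W2, X, y):
    return float(np.mean((forward(W1, W2, X)[1] - y) ** 2))

lr, batch = 0.1, 32
loss_start = loss(W1, W2, X, y)

for epoch in range(100):
    idx = rng.permutation(len(X))
    for s in range(0, len(X), batch):
        Xb, yb = X[idx[s:s + batch]], y[idx[s:s + batch]]
        # Block 1: update the output layer W2 while W1 stays frozen
        H, out = forward(W1, W2, Xb)
        g_out = 2.0 * (out - yb) / len(yb)
        W2 -= lr * (H.T @ g_out)
        # Block 2: update the hidden layer W1 against the refreshed W2
        H, out = forward(W1, W2, Xb)
        g_out = 2.0 * (out - yb) / len(yb)
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)  # tanh derivative
        W1 -= lr * (Xb.T @ g_H)

print(loss_start, loss(W1, W2, X, y))
```

Each inner step solves (approximately, by one gradient step) a smaller subproblem in a single block of variables, which is the mechanism the minibatch BCD framework exploits to scale in both the number of variables and the number of samples.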
Related papers
- Optimization Proxies using Limited Labeled Data and Training Time -- A Semi-Supervised Bayesian Neural Network Approach [2.943640991628177]
Constrained optimization problems arise in various engineering system operations such as inventory management and electric power grids.
This work introduces a learning scheme using Bayesian Neural Networks (BNNs) to solve constrained optimization problems under limited data and restricted training times.
We show that the proposed learning method outperforms conventional BNN and deep neural network (DNN) architectures.
arXiv Detail & Related papers (2024-10-04T02:10:20Z)
- A lifted Bregman strategy for training unfolded proximal neural network Gaussian denoisers [8.343594411714934]
Unfolded proximal neural networks (PNNs) form a family of methods that combines deep learning and proximal optimization approaches.
We propose a lifted training formulation based on Bregman distances for unfolded PNNs.
We assess the behaviour of the proposed training approach for PNNs through numerical simulations on image denoising.
arXiv Detail & Related papers (2024-08-16T13:41:34Z)
- Training Artificial Neural Networks by Coordinate Search Algorithm [0.20971479389679332]
We propose an efficient version of the gradient-free Coordinate Search (CS) algorithm for training neural networks.
The proposed algorithm can be used with non-differentiable activation functions and tailored to multi-objective/multi-loss problems.
Finding the optimal values for weights of ANNs is a large-scale optimization problem.
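The gradient-free coordinate search idea can be sketched as follows: probe one weight at a time in both directions, keep only improving moves, and shrink the probe step between sweeps. This is a minimal sketch on a toy model with a non-differentiable step activation (where gradient methods stall); the sweep count and step schedule are illustrative assumptions, not the paper's tuned algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary targets and a single-layer model with a step activation,
# whose gradient is zero almost everywhere
X = rng.normal(size=(64, 3))
y = (X[:, 0] > 0).astype(float)
w = rng.normal(size=3)

def loss(w):
    out = (X @ w > 0).astype(float)
    return float(np.mean((out - y) ** 2))

step = 0.5
best = loss_start = loss(w)
for sweep in range(30):
    for i in range(len(w)):            # visit one coordinate at a time
        for delta in (step, -step):
            w[i] += delta
            if loss(w) < best:         # keep only improving moves
                best = loss(w)
                break
            w[i] -= delta              # otherwise revert the probe
    step *= 0.9                        # shrink the step between sweeps

print(loss_start, best)
```

Because moves are accepted purely by loss comparison, the scheme never needs derivatives, which is what makes it applicable to non-differentiable activations and easy to adapt to multi-objective criteria.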
arXiv Detail & Related papers (2024-02-20T01:47:25Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-05-26T15:13:09Z)
- Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets [86.43523688236077]
Combinatorial optimization (CO) problems are often NP-hard and out of reach for exact algorithms.
GFlowNets have emerged as a powerful machinery to efficiently sample from composite unnormalized densities sequentially.
In this paper, we design Markov decision processes (MDPs) for different problems and propose to train conditional GFlowNets to sample from the solution space.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Deep learning via message passing algorithms based on belief propagation [2.931240348160871]
We present a family of BP-based message-passing algorithms with a reinforcement field that biases towards locally entropic distributions.
These algorithms are capable of training multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired solutions.
arXiv Detail & Related papers (2021-10-27T16:52:26Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish lower bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training [0.951828574518325]
A limited memory version of the BFGS algorithm has been receiving increasing attention in recent years for large neural network training problems.
We propose a multi-batch L-BFGS algorithm, namely MB-AM, that gradually increases its trust in the curvature information.
arXiv Detail & Related papers (2020-12-14T11:40:41Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.