Multilevel Objective-Function-Free Optimization with an Application to
Neural Networks Training
- URL: http://arxiv.org/abs/2302.07049v1
- Date: Tue, 14 Feb 2023 14:03:22 GMT
- Authors: S. Gratton, A. Kopanicakova, Ph. L. Toint
- Abstract summary: A class of multi-level algorithms for unconstrained nonlinear optimization is presented.
The choice of avoiding the evaluation of the objective function is intended to make the algorithms less sensitive to noise.
The evaluation complexity of these algorithms is analyzed and their behaviour in the presence of noise is illustrated.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A class of multi-level algorithms for unconstrained nonlinear optimization is
presented which does not require the evaluation of the objective function. The
class contains the momentum-less AdaGrad method as a particular (single-level)
instance. The choice of avoiding the evaluation of the objective function is
intended to make the algorithms of the class less sensitive to noise, while the
multi-level feature aims at reducing their computational cost. The evaluation
complexity of these algorithms is analyzed and their behaviour in the presence
of noise is then illustrated in the context of training deep neural networks
for supervised learning applications.
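Since the abstract singles out momentum-less AdaGrad as the single-level instance of the class, a minimal sketch of that update may help illustrate the "objective-function-free" property: the iteration consumes only gradients and never evaluates the objective itself. The step size, iteration count, and test problem below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adagrad(grad, x0, lr=1.0, eps=1e-8, iters=500):
    """Momentum-less AdaGrad: only gradient values are ever requested."""
    x = np.asarray(x0, dtype=float)
    accum = np.zeros_like(x)                    # running sum of squared gradients
    for _ in range(iters):
        g = grad(x)
        accum += g * g                          # per-coordinate scaling statistic
        x -= lr * g / (np.sqrt(accum) + eps)    # scaled gradient step, no f(x) needed
    return x

# Usage: minimize f(x) = 0.5 * x^T diag(1, 10) x using gradients only.
D = np.array([1.0, 10.0])
x_star = adagrad(lambda x: D * x, x0=[3.0, -2.0])
```

Because no objective values enter the recursion, noise in f itself cannot corrupt the step acceptance, which is the robustness property the paper builds on at multiple levels.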
Related papers
- Training Artificial Neural Networks by Coordinate Search Algorithm [0.20971479389679332]
We propose an efficient version of the gradient-free Coordinate Search (CS) algorithm for training neural networks.
The proposed algorithm can be used with non-differentiable activation functions and tailored to multi-objective/multi-loss problems.
Finding the optimal values for weights of ANNs is a large-scale optimization problem.
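A generic derivative-free coordinate search of the kind this entry describes can be sketched as follows: probe each coordinate with a +/- step, keep any improvement, and shrink the step when no move helps. This is the textbook CS pattern under illustrative parameters, not the paper's tailored multi-objective version.

```python
import numpy as np

def coordinate_search(f, x0, step=1.0, shrink=0.5, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(x.size):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                ft = f(trial)                 # function values only, no gradients
                if ft < fx:                   # works for non-differentiable f
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= shrink                    # refine the probe mesh
            if step < tol:
                break
    return x, fx

# Usage on a non-smooth objective where gradients are unavailable:
x_best, f_best = coordinate_search(lambda x: np.abs(x - 1.5).sum(), np.zeros(3))
```

Because only function comparisons are used, the scheme tolerates non-differentiable activations, which is the setting the entry highlights.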
arXiv Detail & Related papers (2024-02-20T01:47:25Z) - The limitation of neural nets for approximation and optimization [0.0]
We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems.
Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems.
arXiv Detail & Related papers (2023-11-21T00:21:15Z) - Score-Based Methods for Discrete Optimization in Deep Learning [30.446056972242616]
We investigate a score-based approximation framework to solve such problems.
We experimentally demonstrate, in adversarial set classification tasks, that our method achieves a superior trade-off between speed and solution quality compared to existing methods.
arXiv Detail & Related papers (2023-10-15T17:14:17Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Self-adaptive algorithms for quasiconvex programming and applications to
machine learning [0.0]
We provide a self-adaptive step-size strategy that avoids convex line-search techniques, together with a generic convergence analysis under mild assumptions.
The proposed method is verified by preliminary results from some computational examples.
To demonstrate the effectiveness of the proposed technique for large-scale problems, we apply it to some experiments on machine learning.
arXiv Detail & Related papers (2022-12-13T05:30:29Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $p$-normed finite-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
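The extragradient scheme that the last entry generalizes can be illustrated on a toy bilinear game f(x, y) = x*y, where plain gradient descent-ascent spirals away from the saddle point while the look-ahead step converges to (0, 0). The step size and iteration count are assumptions chosen for this toy problem.

```python
import numpy as np

def extragradient(F, z0, eta=0.2, iters=500):
    """Extragradient iteration for the operator F: extrapolate, then update."""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        z_half = z - eta * F(z)        # extrapolation (look-ahead) step
        z = z - eta * F(z_half)        # update using the look-ahead operator value
    return z

# For f(x, y) = x*y the min-max operator is F(z) = (df/dx, -df/dy) = (y, -x).
F = lambda z: np.array([z[1], -z[0]])
z_star = extragradient(F, z0=[1.0, 1.0])
```

The look-ahead evaluation is what damps the rotation of the bilinear vector field; replacing `F(z_half)` by `F(z)` recovers gradient descent-ascent, which diverges on this problem for any fixed step size.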
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.