Explicit and Implicit Graduated Optimization in Deep Neural Networks
- URL: http://arxiv.org/abs/2412.11501v1
- Date: Mon, 16 Dec 2024 07:23:22 GMT
- Title: Explicit and Implicit Graduated Optimization in Deep Neural Networks
- Authors: Naoki Sato, Hideaki Iiduka,
- Abstract summary: This paper experimentally evaluates the performance of an explicit graduated optimization algorithm with an optimal noise scheduling.
In addition, it demonstrates its effectiveness through experiments on image classification tasks with ResNet architectures.
- Score: 0.6906005491572401
- License:
- Abstract: Graduated optimization is a global optimization technique that is used to minimize a multimodal nonconvex function by smoothing the objective function with noise and gradually refining the solution. This paper experimentally evaluates the performance of the explicit graduated optimization algorithm with an optimal noise scheduling derived from a previous study and discusses its limitations. It uses traditional benchmark functions and empirical loss functions for modern neural network architectures for evaluating. In addition, this paper extends the implicit graduated optimization algorithm, which is based on the fact that stochastic noise in the optimization process of SGD implicitly smooths the objective function, to SGD with momentum, analyzes its convergence, and demonstrates its effectiveness through experiments on image classification tasks with ResNet architectures.
Related papers
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization [0.0]
We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization.
The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum.
arXiv Detail & Related papers (2024-05-02T21:04:20Z) - Beyond Single-Model Views for Deep Learning: Optimization versus
Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on gradient descent (SGD) and its variants.
We show that SGD and its variants demonstrate performance on par with flat-minimas like SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z) - Operator SVD with Neural Networks via Nested Low-Rank Approximation [19.562492156734653]
This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition.
New techniques called emphnesting for learning the top-$L$ singular values and singular functions in the correct order.
We demonstrate the effectiveness of the proposed framework for use cases in computational physics and machine learning.
arXiv Detail & Related papers (2024-02-06T03:06:06Z) - Analyzing and Enhancing the Backward-Pass Convergence of Unrolled
Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
arXiv Detail & Related papers (2023-12-28T23:15:18Z) - Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that noise in batch descent (SGD) has the effect of smoothing objective function.
We show that there is an interesting relationship between the degree of smoothing by SGDs noise, and the well-studied sharpness' indicator.
arXiv Detail & Related papers (2023-11-15T07:27:40Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional optimization has found in a wide range of machine learning tasks, such as in-variant learning tasks, AUPRC, andAML.
This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD)
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network theory.
arXiv Detail & Related papers (2020-10-25T17:10:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.