NOVAS: Non-convex Optimization via Adaptive Stochastic Search for
End-to-End Learning and Control
- URL: http://arxiv.org/abs/2006.11992v3
- Date: Thu, 1 Apr 2021 19:43:46 GMT
- Title: NOVAS: Non-convex Optimization via Adaptive Stochastic Search for
End-to-End Learning and Control
- Authors: Ioannis Exarchos and Marcus A. Pereira and Ziyi Wang and Evangelos A.
Theodorou
- Abstract summary: We propose the use of adaptive search as a building block for general, non- neural optimization operations.
We benchmark it against two existing alternatives on a synthetic energy-based structured prediction task, and showcase its use in stochastic optimal control applications.
- Score: 22.120942106939122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we propose the use of adaptive stochastic search as a building
block for general, non-convex optimization operations within deep neural
network architectures. Specifically, for an objective function located at some
layer in the network and parameterized by some network parameters, we employ
adaptive stochastic search to perform optimization over its output. This
operation is differentiable and does not obstruct the passing of gradients
during backpropagation, thus enabling us to incorporate it as a component in
end-to-end learning. We study the proposed optimization module's properties and
benchmark it against two existing alternatives on a synthetic energy-based
structured prediction task, and further showcase its use in stochastic optimal
control applications.
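The abstract's central mechanism, a differentiable inner optimizer based on adaptive stochastic search, can be illustrated with a minimal cross-entropy-style sketch in PyTorch. The shape function, the smoothed update, and all hyperparameters below are illustrative assumptions, not the paper's exact scheme:

```python
import torch

def adaptive_stochastic_search(objective, mu0, sigma=0.5, iters=10,
                               n_samples=64, step=1.0):
    # Cross-entropy-style inner loop: candidates are sampled around mu,
    # reweighted by a softmax shape function, and mu is smoothly updated.
    # Everything stays on the autodiff tape, so gradients pass through.
    mu = mu0
    for _ in range(iters):
        eps = torch.randn(n_samples, mu.shape[-1])
        x = mu + sigma * eps                     # candidate solutions
        f = objective(x)                         # per-candidate objective
        w = torch.softmax(-f, dim=0)             # lower cost -> higher weight
        mu = (1 - step) * mu + step * (w.unsqueeze(-1) * x).sum(dim=0)
    return mu

# Toy check: the inner minimizer depends on theta, and backprop reaches it.
theta = torch.tensor([2.0], requires_grad=True)
objective = lambda x: ((x - theta) ** 2).sum(dim=-1)
x_star = adaptive_stochastic_search(objective, torch.zeros(1))
x_star.sum().backward()          # theta.grad is populated
```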
Related papers
- Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
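A minimal sketch of what backpropagation through unrolled optimization means in practice, assuming a toy quadratic inner problem (the names, constants, and inner loss are illustrative):

```python
import torch

# Differentiate through T steps of an inner gradient-descent solver.
# The folded-optimization analysis shows this backward pass is equivalent
# to solving a linear system with a particular iterative method.
def unrolled_inner_solver(theta, T=50, eta=0.1):
    x = torch.zeros_like(theta)
    for _ in range(T):
        grad = x - theta     # gradient of the inner loss 0.5*||x - theta||^2
        x = x - eta * grad   # one solver step, kept on the autodiff tape
    return x

theta = torch.randn(3, requires_grad=True)
outer_loss = unrolled_inner_solver(theta).pow(2).sum()
outer_loss.backward()        # gradient flows through all T iterations
```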
arXiv Detail & Related papers (2023-12-28T23:15:18Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes conditional stochastic optimization algorithms for the federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE To Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
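As a rough illustration of the double exponential moving average underlying this framework, here is a hedged sketch; the coefficients and the final combination are assumptions, not Admeta's exact update:

```python
import torch

# Double exponential moving average (DEMA) of the gradient: an EMA of an
# EMA, combined so the result tracks recent gradients with less lag than
# a single EMA. beta and the 2*m1 - m2 combination are illustrative.
def dema_step(grad, m1, m2, beta=0.9):
    m1 = beta * m1 + (1 - beta) * grad   # first EMA of the gradient
    m2 = beta * m2 + (1 - beta) * m1     # EMA of that EMA
    return 2 * m1 - m2, m1, m2           # lag-corrected momentum estimate
```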
arXiv Detail & Related papers (2023-07-02T18:16:06Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Transformer-Based Learned Optimization [37.84626515073609]
We propose a new approach to learned optimization where we represent the computation of an optimizer's update step using a neural network.
Our innovation is a new neural network architecture inspired by the classic BFGS algorithm.
We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms.
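A hedged sketch of the learned-optimizer interface this describes, with a plain MLP standing in for the paper's BFGS-inspired transformer (the class name and input features are assumptions):

```python
import torch
import torch.nn as nn

# A network maps per-parameter gradient features to a predicted update.
# The paper's architecture is a transformer mirroring BFGS-style
# preconditioning; this MLP only illustrates the interface.
class LearnedStep(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, params, grads):
        feats = torch.stack([grads, grads.abs().log1p()], dim=-1)
        return params + self.net(feats).squeeze(-1)  # updated parameters
```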
arXiv Detail & Related papers (2022-12-02T09:47:08Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the search spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Bayesian Optimization for auto-tuning GPU kernels [0.0]
Finding optimal parameter configurations for GPU kernels is a non-trivial exercise for large search spaces, even when automated.
We introduce a novel contextual exploration factor as well as new acquisition functions with improved scalability, combined with an informed function selection mechanism.
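As a rough illustration, a confidence-bound acquisition with a static exploration factor is sketched below; the paper's contribution makes this factor contextual and adds new acquisition functions, and all names here are assumptions:

```python
import numpy as np

# Lower-confidence-bound acquisition for kernel auto-tuning (minimizing
# runtime). kappa trades off exploitation of the GP mean against
# exploration of uncertain configurations; here it is a fixed constant.
def lcb(mu, sigma, kappa=2.0):
    return mu - kappa * sigma            # lower is more promising

def next_kernel_config(candidates, gp_predict):
    mu, sigma = gp_predict(candidates)   # GP posterior mean/std per config
    return candidates[np.argmin(lcb(mu, sigma))]
```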
arXiv Detail & Related papers (2021-11-26T11:26:26Z)
- Additive Tree-Structured Conditional Parameter Spaces in Bayesian Optimization: A Novel Covariance Function and a Fast Implementation [34.89735938765757]
We generalize the additive assumption to tree-structured functions, showing improved sample-efficiency, wider applicability and greater flexibility.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
We demonstrate our method on an optimization benchmark function, on pruning pre-trained VGG16 and ResNet50 models, as well as on searching activation functions of ResNet20.
arXiv Detail & Related papers (2020-10-06T16:08:58Z)
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum devices.
We propose an adaptive pruning strategy for the ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
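A hedged sketch of the sequential, block-wise optimization idea, assuming a toy finite-difference gradient and an illustrative subset schedule (not the paper's exact PECT procedure):

```python
import numpy as np

# Central finite-difference gradient of a scalar cost function.
def finite_diff_grad(cost, theta, eps=1e-4):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta); d[i] = eps
        g[i] = (cost(theta + d) - cost(theta - d)) / (2 * eps)
    return g

# Rather than optimizing every ansatz parameter at once, run a sequence
# of smaller variational problems over growing active parameter subsets.
def sequential_vqa(cost, theta, stages, steps=100, lr=0.05):
    for active in stages:                  # e.g. [[0, 1], [0, 1, 2, 3], ...]
        for _ in range(steps):
            g = finite_diff_grad(cost, theta)
            theta[active] -= lr * g[active]   # update only the active block
    return theta
```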
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
- Additive Tree-Structured Covariance Function for Conditional Parameter Spaces in Bayesian Optimization [34.89735938765757]
We generalize the additive assumption to tree-structured functions.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
arXiv Detail & Related papers (2020-06-21T11:21:55Z)
- Adaptive Stochastic Optimization [1.7945141391585486]
Adaptive optimization methods have the potential to offer significant computational savings when training large-scale systems.
Modern approaches based on the stochastic gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application.
arXiv Detail & Related papers (2020-01-18T16:30:19Z)