RTDK-BO: High Dimensional Bayesian Optimization with Reinforced
Transformer Deep kernels
- URL: http://arxiv.org/abs/2310.03912v5
- Date: Wed, 8 Nov 2023 13:42:27 GMT
- Title: RTDK-BO: High Dimensional Bayesian Optimization with Reinforced
Transformer Deep kernels
- Authors: Alexander Shmakov, Avisek Naug, Vineet Gundecha, Sahand Ghorbanpour,
Ricardo Luna Gutierrez, Ashwin Ramesh Babu, Antonio Guillen and Soumyendu
Sarkar
- Abstract summary: We combine recent developments in Deep Kernel Learning (DKL) and attention-based Transformer models to improve the modeling power of GP surrogates through meta-learning.
We propose a novel method for improving meta-learning BO surrogates by incorporating attention mechanisms into DKL.
We combine this Transformer Deep Kernel with a learned acquisition function trained with continuous Soft Actor-Critic Reinforcement Learning to aid in exploration.
- Score: 39.53062980223013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian Optimization (BO), guided by Gaussian process (GP) surrogates, has
proven to be an invaluable technique for efficient, high-dimensional, black-box
optimization, a critical problem inherent to many applications such as
industrial design and scientific computing. Recent contributions have
introduced reinforcement learning (RL) to improve the optimization performance
on both single-function optimization and few-shot multi-objective
optimization. However, even few-shot techniques fail to exploit similarities
shared between closely related objectives. In this paper, we combine recent
developments in Deep Kernel Learning (DKL) and attention-based Transformer
models to improve the modeling power of GP surrogates with meta-learning. We
propose a novel method for improving meta-learning BO surrogates by
incorporating attention mechanisms into DKL, empowering the surrogates to adapt
to contextual information gathered during the BO process. We combine this
Transformer Deep Kernel with a learned acquisition function trained with
continuous Soft Actor-Critic Reinforcement Learning to aid in exploration. This
Reinforced Transformer Deep Kernel (RTDK-BO) approach yields state-of-the-art
results in continuous high-dimensional optimization problems.
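As a rough illustration of the surrogate described above (a sketch under assumptions, not the authors' implementation), the deep kernel idea can be pictured as an RBF kernel applied to features produced by a Transformer encoder, so that self-attention over the set of observed points lets the surrogate adapt to context gathered during BO. All class and parameter names below are illustrative; the paper's actual architecture, meta-training procedure, and hyperparameters are not reproduced.

```python
# Illustrative sketch of a Transformer deep kernel GP surrogate (not the paper's code).
import torch
import torch.nn as nn


class TransformerDeepKernelGP(nn.Module):
    """GP surrogate with an RBF kernel applied to Transformer-encoded features."""

    def __init__(self, input_dim: int, feature_dim: int = 32, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(input_dim, feature_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=feature_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.log_lengthscale = nn.Parameter(torch.zeros(()))   # RBF lengthscale
        self.log_noise = nn.Parameter(torch.tensor(-2.0))      # observation noise

    def features(self, x: torch.Tensor) -> torch.Tensor:
        # Treat all points as one "sequence" so self-attention can share
        # contextual information across observations gathered during BO.
        return self.encoder(self.embed(x).unsqueeze(0)).squeeze(0)

    def kernel(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Deep kernel: RBF on learned features.
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-0.5 * d2 / torch.exp(2.0 * self.log_lengthscale))

    def posterior(self, x_train, y_train, x_test):
        # Exact GP posterior mean/variance under the deep kernel.
        n = x_train.shape[0]
        z = self.features(torch.cat([x_train, x_test], dim=0))  # joint encoding
        z_tr, z_te = z[:n], z[n:]
        k_tt = self.kernel(z_tr, z_tr) + torch.exp(self.log_noise) * torch.eye(n)
        k_ts = self.kernel(z_tr, z_te)
        k_ss = self.kernel(z_te, z_te)
        chol = torch.linalg.cholesky(k_tt)
        alpha = torch.cholesky_solve(y_train.unsqueeze(-1), chol)
        mean = (k_ts.T @ alpha).squeeze(-1)
        v = torch.cholesky_solve(k_ts, chol)
        var = torch.diag(k_ss - k_ts.T @ v).clamp_min(1e-9)
        return mean, var
```

Meta-training would presumably fit such a kernel by maximizing the GP marginal likelihood across related tasks; that loop, and the acquisition side, are omitted here.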
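The acquisition side can likewise be sketched, again only as an assumption-laden illustration: a Soft Actor-Critic style stochastic actor that maps summary statistics of the GP posterior (the "state") to the next query point (the "action") through a squashed Gaussian policy. The critics, replay buffer, entropy temperature, and reward design that SAC requires are omitted, and the state/action encoding is a guess, not the paper's.

```python
# Sketch of a SAC-style stochastic actor that proposes the next BO query point.
# The state is assumed to be a vector of GP-posterior summary statistics; the
# action is a point in the normalized search space [-1, 1]^d.
import torch
import torch.nn as nn


class AcquisitionActor(nn.Module):
    def __init__(self, state_dim: int, search_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, search_dim)
        self.log_std = nn.Linear(hidden, search_dim)

    def forward(self, state: torch.Tensor):
        h = self.net(state)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5.0, 2.0)
        dist = torch.distributions.Normal(mu, log_std.exp())
        raw = dist.rsample()                      # reparameterized sample
        action = torch.tanh(raw)                  # squash into [-1, 1]^d
        # log-prob with the tanh change-of-variables correction used in SAC
        log_prob = dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(-1)
```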
Related papers
- Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis [0.7428236410246183]
We investigate optimized convolutional neural networks (CNNs) developed for automatic modulation classification (AMC) of wireless signals.
We propose optimized models that combine these techniques to fuse their complementary optimization benefits.
The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly lower complexity.
arXiv Detail & Related papers (2024-04-11T06:08:23Z)
- Efficient Bayesian Optimization with Deep Kernel Learning and Transformer Pre-trained on Multiple Heterogeneous Datasets [9.510327380529892]
We propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder.
Experiments on both synthetic and real benchmark problems demonstrate the effectiveness of our proposed pre-training and transfer BO strategy.
arXiv Detail & Related papers (2023-08-09T01:56:10Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while yielding more robust models than the traditionally used additive update rule.
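As a rough, generic illustration of the additive-versus-multiplicative distinction (the summary above does not spell out the paper's actual rules, so the exponentiated-gradient-style step below is an assumption, not their method):

```python
# Generic contrast between an additive (SGD) update and a simple multiplicative
# (exponentiated-gradient-style) update for a weight vector w.
import torch

lr = 0.01
w = torch.ones(5, requires_grad=True)
x, y = torch.randn(5), torch.tensor(0.5)

loss = (w @ x - y) ** 2
loss.backward()

with torch.no_grad():
    w_additive = w - lr * w.grad                     # additive: w <- w - lr * g
    w_multiplicative = w * torch.exp(-lr * w.grad)   # multiplicative: w <- w * exp(-lr * g)
```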
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant [18.592094066642364]
This article provides a comprehensive understanding of optimization in deep learning.
We focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively.
To help understand the current optimization methodologies, we categorize them into two classes: explicit optimization and implicit optimization.
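As one common example of an explicit technique for the exploding-gradient problem noted above (a generic illustration, not the article's own taxonomy or recommendation), gradient-norm clipping caps the gradient before each parameter update:

```python
# Generic example of mitigating exploding gradients with gradient-norm clipping.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import clip_grad_norm_

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = F.mse_loss(model(x), y)

opt.zero_grad()
loss.backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the gradient norm
opt.step()
```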
arXiv Detail & Related papers (2023-06-15T17:59:27Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
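A minimal sketch of ZO sign-based gradient descent in its generic form: estimate the gradient from function evaluations along random directions, then step using only the sign of the estimate. The quadratic objective below is a placeholder; the molecular objectives and Guacamol tasks are not reproduced.

```python
# Minimal sketch of ZO-signGD: zeroth-order gradient estimate from random
# directions, followed by a sign-based update. The objective is a placeholder.
import numpy as np

def objective(x):                      # stand-in for a black-box score
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
lr, mu, n_dirs = 0.05, 1e-2, 20        # step size, smoothing radius, #directions

for _ in range(100):
    grad_est = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        grad_est += (objective(x + mu * u) - objective(x)) / mu * u
    grad_est /= n_dirs
    x -= lr * np.sign(grad_est)        # sign-based step (ZO-signGD)
```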
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
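One simplified way to picture the analytic adaptation (an assumption-level sketch, not the paper's algorithm): under an NTK linearization, adapting to a task's support set reduces to kernel ridge regression with the meta-model's NTK Gram matrices, which has a closed form.

```python
# Simplified picture of analytic task adaptation under an NTK linearization:
# kernel ridge regression with (here: precomputed, assumed) NTK Gram matrices.
import numpy as np

def adapt_and_predict(K_ss, K_qs, y_support, ridge=1e-3):
    """K_ss: support-support Gram matrix, K_qs: query-support, y_support: labels."""
    alpha = np.linalg.solve(K_ss + ridge * np.eye(len(K_ss)), y_support)
    return K_qs @ alpha                # predictions on the query set

# toy check with an RBF kernel standing in for the NTK
Xs, Xq = np.random.randn(20, 5), np.random.randn(8, 5)
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
preds = adapt_and_predict(rbf(Xs, Xs), rbf(Xq, Xs), np.sin(Xs[:, 0]))
```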
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Few-Shot Bayesian Optimization with Deep Kernel Surrogates [7.208515071018781]
We formulate a few-shot learning problem in which a shared deep surrogate model is trained to quickly adapt to the response function of a new task.
We propose the use of a deep kernel network for a Gaussian process surrogate that is meta-learned in an end-to-end fashion.
As a result, the novel few-shot optimization of our deep kernel surrogate leads to new state-of-the-art results in hyperparameter optimization (HPO).
arXiv Detail & Related papers (2021-01-19T15:00:39Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.