Fugu-MT Paper Translation (Summary): A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning

Paper summary: A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning

arxiv url: http://arxiv.org/abs/2007.11022v1
Date: Tue, 21 Jul 2020 18:15:08 GMT
Status: Translation complete
Last updated in system: 2022-11-08 04:11:18.288379
Title: A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning
Authors: Ankur Sinha, Tanmay Khandait, Raja Mohanty
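Abstract summary: This paper proposes a bilevel solution method for the hyperparameter optimization problem. The proposed method is general and can be easily applied to any class of machine learning algorithms. The theory behind the proposed algorithm is discussed, and an extensive computational study is performed on two datasets.
Reference score (attention score computed by the site): 0.0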
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hyperparameter tuning is an active area of research in machine learning, where the aim is to identify the optimal hyperparameters that provide the best performance on the validation set. Hyperparameter tuning is often achieved using naive techniques, such as random search and grid search. However, these methods rarely lead to an optimal set of hyperparameters and often become very expensive. In this paper, we propose a bilevel solution method for solving the hyperparameter optimization problem that does not suffer from the drawbacks of earlier studies. The proposed method is general and can be easily applied to any class of machine learning algorithms. The idea is based on approximating the lower-level optimal value function mapping, which is an important mapping in bilevel optimization and helps reduce the bilevel problem to a single-level constrained optimization task. The single-level constrained optimization problem is solved using the augmented Lagrangian method. We discuss the theory behind the proposed algorithm and perform an extensive computational study on two datasets that confirms the efficiency of the proposed method. A comparative study against grid search, random search and Bayesian optimization techniques shows that the proposed algorithm is several times faster on problems with one or two hyperparameters. The computational gain is expected to be significantly higher as the number of hyperparameters increases. For a given set of hyperparameters, most techniques in the literature assume a unique optimal parameter set that minimizes the loss on the training set. Such an assumption is often violated by deep learning architectures, and the proposed method does not require it.
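The value-function idea in the abstract can be illustrated on a toy problem. Writing the lower-level optimal value function as phi(lam) = min_w f(lam, w), the bilevel problem "minimize F(lam, w) subject to w being a minimizer of f(lam, .)" can be rewritten as the single-level problem "minimize F(lam, w) over (lam, w) subject to f(lam, w) - phi(lam) <= 0", because f(lam, w) >= phi(lam) always holds. The Python sketch below is not the authors' implementation: the scalar ridge toy problem, the degree-6 polynomial surrogate for phi, the slack `eps`, and all function names are hypothetical choices made for illustration. It approximates phi from a grid of lower-level solves and then solves the single-level problem with a basic augmented Lagrangian loop, using SciPy's `minimize` (L-BFGS-B) for each inner subproblem.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy problem (not from the paper): scalar model parameter w,
# scalar ridge hyperparameter lam restricted to [0, 3].
A_TRAIN = 2.0  # assumed training target
B_VAL = 1.0    # assumed validation target
LAM_BOUNDS = (0.0, 3.0)


def train_loss(lam, w):
    """Lower-level objective f(lam, w): squared error plus a ridge penalty."""
    return (w - A_TRAIN) ** 2 + lam * w ** 2


def val_loss(w):
    """Upper-level objective F(w): squared error against the validation target."""
    return (w - B_VAL) ** 2


def lower_level_value(lam):
    """phi(lam) = min_w f(lam, w); closed form here because f is quadratic in w."""
    w_star = A_TRAIN / (1.0 + lam)
    return train_loss(lam, w_star)


# Approximate the value function from a grid of lower-level solves.
# The degree-6 polynomial surrogate is an assumption of this sketch.
lam_samples = np.linspace(*LAM_BOUNDS, 31)
phi_samples = np.array([lower_level_value(lam) for lam in lam_samples])
phi_hat = np.poly1d(np.polyfit(lam_samples, phi_samples, deg=6))


def constraint(x, eps=0.02):
    """g(x) <= 0 encodes f(lam, w) <= phi_hat(lam) + eps; eps absorbs surrogate error."""
    lam, w = x
    return train_loss(lam, w) - phi_hat(lam) - eps


def aug_lagrangian(x, mu, rho):
    """PHR augmented Lagrangian for: minimize F(w) subject to g(x) <= 0."""
    shifted = max(0.0, mu + rho * constraint(x))
    return val_loss(x[1]) + (shifted ** 2 - mu ** 2) / (2.0 * rho)


def solve_single_level(x0=(0.5, 0.0), rho=10.0, outer_iters=20):
    """Outer loop: minimize the augmented Lagrangian, then update the multiplier."""
    x = np.asarray(x0, dtype=float)
    mu = 0.0  # multiplier estimate for the inequality constraint
    for _ in range(outer_iters):
        res = minimize(aug_lagrangian, x, args=(mu, rho), method="L-BFGS-B",
                       bounds=[LAM_BOUNDS, (None, None)])
        x = res.x
        mu = max(0.0, mu + rho * constraint(x))
    return x, mu


x_hat, mu_hat = solve_single_level()
lam_hat, w_hat = x_hat
print(f"lambda = {lam_hat:.3f}, w = {w_hat:.3f}, "
      f"refit w*(lambda) = {A_TRAIN / (1.0 + lam_hat):.3f}")
```

In this toy problem the exact lower-level solution is w*(lam) = 2 / (1 + lam), so the true optimum is lam = 1 with w = 1. Because the slack `eps` widens the feasible set, the loop may return any lam in a narrow band around 1 with w close to 1; refitting w at the returned lam, as the final print does, recovers an exact lower-level solution. The slack is only one simple way to cope with surrogate error, and the paper's own handling of the approximation may differ.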

