Fugu-MT 論文翻訳(概要): The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

論文の概要: The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

arxiv url: http://arxiv.org/abs/2312.12657v1
Date: Tue, 19 Dec 2023 23:04:56 GMT
ステータス: 翻訳完了
システム内更新日: 2023-12-21 17:25:20.980584
Title: The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models
Title（参考訳）: ニューラルネットワークの凸景観:ラッソモデルによるグローバル最適点と静止点の特徴付け
Authors: Tolga Ergen, Mert Pilanci
Abstract要約: ディープニューラルネットワーク(DNN)モデルは、プログラミング目的に使用される。本稿では,凸型神経回復モデルについて検討する。定常的非次元目的物はすべて,グローバルサブサンプリング型凸解法プログラムとして特徴付けられることを示す。また, 静止非次元目的物はすべて, グローバルサブサンプリング型凸解法プログラムとして特徴付けられることを示す。
参考スコア（独自算出の注目度）: 75.33431791218302
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on the use of non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical methods to produce successful models and do not have a clear theoretical foundation. In this study, we examine the use of convex optimization theory and sparse recovery models to refine the training process of neural networks and provide a better interpretation of their optimal weights. We focus on training two-layer neural networks with piecewise linear activations and demonstrate that they can be formulated as a finite-dimensional convex program. These programs include a regularization term that promotes sparsity, which constitutes a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite width neural networks and then we express these architectures equivalently as high dimensional convex sparse recovery models. Remarkably, the worst-case complexity to solve the convex program is polynomial in the number of samples and number of neurons when the rank of the data matrix is bounded, which is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. We also show that all the stationary of the nonconvex training objective can be characterized as the global optimum of a subsampled convex program. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyper-parameter tuning unlike non-convex methods. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters.
Abstract（参考訳）: ディープニューラルネットワーク(DNN)モデルの非凸性のため、その有効性は非凸最適化ヒューリスティックの使用に依存する。従来のDNNの訓練方法は、成功したモデルを作るのに費用がかかる経験的な方法を必要とすることが多く、明確な理論的基盤を持っていない。本研究では,畳み込み最適化理論とスパースリカバリモデルを用いてニューラルネットワークのトレーニングプロセスを洗練し,それらの最適重量のより良い解釈を行う。分割線形活性化を用いた2層ニューラルネットワークの訓練に焦点をあて,有限次元凸プログラムとして定式化できることを実証する。これらのプログラムには、sparsityを促進する正規化用語が含まれており、これはグループlassoの変種である。まず,有限幅ニューラルネットワークの強双対性を証明するために半無限計画理論を用い,これらのアーキテクチャを高次元凸スパース回復モデルとして表現する。注目すべきことに、凸プログラムを解くための最悪の複雑さは、データ行列のランクが境界付けられたときのサンプル数とニューロン数の多項式であり、畳み込みネットワークではそうである。本手法を任意のランクのデータトレーニングに拡張するために,zonotope部分サンプリングに基づく新しい多項式時間近似スキームを開発し,近似比を保証した。また,非凸学習目標の定常性はすべて,サブサンプリング凸プログラムの大域的最適性として特徴付けられることを示す。我々の凸モデルは、非凸法とは異なり、ヒューリスティックスや広範なハイパーパラメータチューニングに頼ることなく、標準凸解法を用いて訓練することができる。大規模な数値実験により、凸モデルは従来の非凸法よりも優れ、最適パラメータに敏感でないことを示す。

論文の概要: The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

関連論文リスト