Fugu-MT paper translation (abstract): A lift for input-convex neural network training

Paper overview: A lift for input-convex neural network training

arxiv url: http://arxiv.org/abs/2605.24274v1
Date: Fri, 22 May 2026 22:59:11 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:17.840182
Title: A lift for input-convex neural network training
Title (reference translation): A lift for input-convex neural network training
Authors: Ali Siahkoohi, Anirudh Thatipelli
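Abstract summary: Input-convex neural networks (ICNNs) are used for log-concave density estimation, convex-potential normalizing flows, optimal transport, and transport-map inversion for posteriors. Standard projected gradient descent (PGD) onto the non-negative cone applies a hard, non-smooth projection. The differentiable alternative, softplus reparametrization, attenuates the gradient exponentially in the weight magnitude, stalling training with dead inter-layer weights and a plateaued loss. The lift reaches a lower test loss than both PGD and direct softplus, and turns a plateau-bounded training trajectory into a valley-descending one.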
Reference score (independently computed attention score): 3.142113135607563
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Input-convex neural networks (ICNNs) are widely used for log-concave density estimation, convex-potential normalizing flows, optimal transport, and transport-map inversion for high-dimensional Bayesian posteriors. These tasks share a structural constraint: the inter-layer weights of the ICNN must remain non-negative. The standard recipe, projected gradient descent (PGD) onto the non-negative cone, applies a hard, non-smooth projection -- the stiff-penalty limit of an ADMM-style constraint splitting -- and its classical convergence guarantees do not transfer to the non-smooth ICNN training landscape; the differentiable alternative, softplus reparametrization, attenuates the gradient exponentially in the weight magnitude, stalling training with dead inter-layer weights and plateaued loss. Inspired by parameter-extension lifts of PDE-constrained inverse problems, we propose the lift: instead of constraining the inter-layer weights directly, we train an unconstrained hypernetwork that emits them from a permutation-invariant summary of the input batch. This adds stochasticity to the training dynamics that softens the loss landscape, letting the iterates escape the gradient-attenuated region where direct softplus stalls. We trace this softening to three structural ingredients -- a learnable bias acting as slack, a hypernetwork body that conditions on the target batch, and a cross-covariance coupling the two through batch stochasticity -- and prove each one necessary: deleting any single ingredient collapses the cross-covariance that carries the softening. On log-concave energy-based modeling from one-dimensional toy targets to image-flavored latents, and convex-potential normalizing flows on a 21-dimensional tabular benchmark, we show that the lift reaches a lower test loss than both PGD and direct softplus, and turns a plateau-bounded training trajectory into a valley-descending one.
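Abstract (reference translation): Input-convex neural networks (ICNNs) are widely used for log-concave density estimation, convex-potential normalizing flows, optimal transport, and transport-map inversion for high-dimensional Bayesian posteriors. These tasks share a structural constraint: the inter-layer weights of the ICNN must be non-negative. The standard recipe, projected gradient descent (PGD) onto the non-negative cone, applies a hard, non-smooth projection -- the stiff-penalty limit of an ADMM-style constraint splitting -- and its classical convergence guarantees do not transfer to the non-smooth ICNN training landscape. Inspired by parameter-extension lifts of PDE-constrained inverse problems, the lift does not constrain the inter-layer weights directly; instead it trains an unconstrained hypernetwork that emits them from a permutation-invariant summary of the input batch. This adds stochasticity to the training dynamics that softens the loss landscape, letting the iterates escape the gradient-attenuated region where direct softplus stalls. The softening is traced to three structural ingredients: a learnable bias acting as slack, a hypernetwork body that conditions on the target batch, and a cross-covariance coupling the two through batch stochasticity. On log-concave energy-based modeling, from one-dimensional toy targets to image-flavored latents, and on convex-potential normalizing flows on a 21-dimensional tabular benchmark, the lift reaches a lower test loss than both PGD and direct softplus, and turns a plateau-bounded training trajectory into a valley-descending one.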
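The three ways of handling the non-negativity constraint named in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the mean-pooled batch summary, the single linear hypernetwork layer `A`, and the softplus applied to the hypernetwork output are all hypothetical choices made here. The only facts relied on are that projection onto the non-negative cone is an elementwise clamp at zero and that the derivative of softplus is the sigmoid, which behaves like exp(theta) for very negative theta.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), always positive.
    return np.logaddexp(0.0, x)

def sigmoid(x):
    # Derivative of softplus, written to stay accurate for very negative x.
    return np.exp(-np.logaddexp(0.0, -x))

def project_nonneg(w):
    # PGD projection onto the non-negative cone: elementwise clamp at zero.
    return np.maximum(w, 0.0)

def softplus_grad_factor(theta):
    # With W = softplus(theta), the chain rule gives dL/dtheta = dL/dW * sigmoid(theta).
    # For theta << 0 this factor is approximately exp(theta), so the gradient
    # reaching theta shrinks exponentially.
    return sigmoid(theta)

def lifted_weights(batch, A, b):
    # Hypothetical lift (an assumption, not the paper's architecture):
    # mean-pool the batch as a permutation-invariant summary, apply an
    # unconstrained linear map A plus a learnable bias b, then softplus.
    summary = batch.mean(axis=0)
    return softplus(A @ summary + b)

if __name__ == "__main__":
    for theta in (0.0, -5.0, -10.0, -20.0):
        print(theta, softplus_grad_factor(theta))
```

In this sketch the weights returned by `lifted_weights` change from batch to batch because the summary does, which is one simple way to picture the batch stochasticity the abstract refers to; how the paper actually builds its hypernetwork is not stated in the abstract.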
