Fugu-MT paper translation (abstract): Hard labels sampled from sparse targets mislead rotation invariant algorithms

Paper summary: Hard labels sampled from sparse targets mislead rotation invariant algorithms

arxiv url: http://arxiv.org/abs/2603.20967v1
Date: Sat, 21 Mar 2026 22:38:24 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.171784
Title: Hard labels sampled from sparse targets mislead rotation invariant algorithms
Title (reference translation): Hard labels sampled from sparse targets mislead rotation-invariant algorithms
Authors: Avrajit Ghosh, Bin Yu, Manfred Warmuth, Peter Bartlett
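Abstract summary: In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data, or sampled hard labels. The paper shows that when hard labels are sampled from the conditional distribution $σ(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$ and $\mathbf{w}^{\star}$ is $s$-sparse, rotation-invariant algorithms are provably suboptimal.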
Reference score (independently computed attention score): 6.565070116874382
License: http://creativecommons.org/licenses/by/4.0/
Abstract: One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values $\pm 1$). We point out a fundamental problem that arises even in a particularly favorable setting, where the goal is to learn a noise-free soft target of the form $σ(\mathbf{x}^{\top}\mathbf{w}^{\star})$. In the over-constrained case (i.e. the number of samples $n$ exceeds the input dimension $d$) with examples $(\mathbf{x}_i,σ(\mathbf{x}_i^{\top}\mathbf{w}^{\star}))$, it is sufficient to recover $\mathbf{w}^{\star}$ and hence achieve the Bayes risk. However, we prove that when the examples are labeled by hard labels $y_i$ sampled from the same conditional distribution $σ(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$ and $\mathbf{w}^{\star}$ is $s$-sparse, then rotation-invariant algorithms are provably suboptimal: they incur an excess risk $Ω\!\left(\frac{d-1}{n}\right)$, while there are simple non-rotation invariant algorithms with excess risk $O(\frac{s\log d}{n})$. The simplest rotation invariant algorithm is gradient descent on the logistic loss (with early stopping). A simple non-rotation-invariant algorithm for sparse targets that achieves the above upper bounds uses gradient descent on the weights $u_i,v_i$, where now the linear weight $w_i$ is reparameterized as $u_iv_i$.
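Abstract (reference translation): One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback is either soft labels, which correspond to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values $\pm 1$). A fundamental problem arises even in a particularly favorable setting, where the goal is to learn a noise-free soft target of the form $σ(\mathbf{x}^{\top}\mathbf{w}^{\star})$. In the over-constrained case (the number of samples $n$ exceeds the input dimension $d$), the examples $(\mathbf{x}_i,σ(\mathbf{x}_i^{\top}\mathbf{w}^{\star}))$ suffice to recover $\mathbf{w}^{\star}$ and hence achieve the Bayes risk. However, when the examples are instead labeled by hard labels $y_i$ sampled from the same conditional distribution $σ(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$ and $\mathbf{w}^{\star}$ is $s$-sparse, rotation-invariant algorithms are provably suboptimal: they incur an excess risk $Ω\!\left(\frac{d-1}{n}\right)$, whereas simple non-rotation-invariant algorithms exist with excess risk $O(\frac{s\log d}{n})$. The simplest rotation-invariant algorithm is gradient descent on the logistic loss (with early stopping). A simple non-rotation-invariant algorithm for sparse targets that achieves the above upper bound uses gradient descent on the weights $u_i,v_i$, with the linear weight $w_i$ reparameterized as $u_iv_i$.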
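The two algorithms named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' experimental code: the Gaussian inputs, the problem sizes, the target entries, the step sizes, the fixed step count standing in for early stopping, and the initialization $u_i = \alpha$, $v_i = 0$ are all assumptions made here, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))

def logistic_loss(w, X, y):
    # Mean logistic loss for hard labels y in {-1, +1}.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_grad(w, X, y):
    # Gradient of the mean logistic loss with respect to w.
    return -(X.T @ (y * sigmoid(-y * (X @ w)))) / len(y)

def make_data(n, d, s, rng):
    # Assumed setup: standard Gaussian inputs and an s-sparse target w_star.
    X = rng.standard_normal((n, d))
    w_star = np.zeros(d)
    w_star[:s] = 2.0
    soft = sigmoid(X @ w_star)                         # soft labels: P(y = +1 | x)
    hard = np.where(rng.random(n) < soft, 1.0, -1.0)   # hard labels sampled from soft
    return X, w_star, soft, hard

def gd_linear(X, y, lr=0.5, steps=300):
    # Rotation-invariant baseline: plain gradient descent on w from zero,
    # stopped after a fixed number of steps.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - lr * logistic_grad(w, X, y)
    return w

def gd_uv(X, y, lr=0.1, steps=300, alpha=0.1):
    # Non-rotation-invariant variant: w_i = u_i * v_i, gradient descent on (u, v).
    # The initialization u = alpha, v = 0 is an assumption of this sketch.
    d = X.shape[1]
    u = np.full(d, alpha)
    v = np.zeros(d)
    for _ in range(steps):
        g = logistic_grad(u * v, X, y)           # gradient with respect to w
        u, v = u - lr * g * v, v - lr * g * u    # chain rule through w = u * v
    return u * v

rng = np.random.default_rng(0)
X, w_star, soft, hard = make_data(n=200, d=50, s=3, rng=rng)
w_gd = gd_linear(X, hard)
w_uv = gd_uv(X, hard)
print("parameter error, GD on w:     ", np.linalg.norm(w_gd - w_star))
print("parameter error, GD on (u, v):", np.linalg.norm(w_uv - w_star))
```

The printed parameter errors are only a rough proxy for the excess risk discussed in the abstract, and a single random draw at these sizes should not be read as confirming or refuting the stated bounds.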


Related papers