Fugu-MT Paper Translation (Abstract): Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications

Paper overview: Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications

arxiv url: http://arxiv.org/abs/2501.08563v1
Date: Wed, 15 Jan 2025 04:09:21 GMT
Status: Translation complete
Last updated in system: 2025-01-16 16:46:28.34762
Title: Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications
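Authors: Jin Chen, Jin Zhang, Xu Huang, Yi Yang, Defu Lian, Enhong Chen
Abstract summary: MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach. The method is backed by rigorous theoretical analysis that addresses key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
Reference score (independently computed attention score): 79.53938312089308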
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The softmax function is a cornerstone of multi-class classification and is integral to a wide range of machine learning applications, from large-scale retrieval and ranking models to advanced large language models. However, its computational cost grows linearly with the number of classes, which becomes prohibitively expensive in scenarios with millions or even billions of classes. The sampled softmax, which relies on self-normalized importance sampling, has emerged as a powerful alternative that significantly reduces computational complexity. Yet its estimator remains unbiased only when the sampling distribution matches the true softmax distribution. To improve both approximation accuracy and sampling efficiency, we propose the MIDX-Sampler, a novel adaptive sampling strategy based on an inverted multi-index approach. Concretely, we decompose the softmax probability into several multinomial probabilities, each associated with a specific set of codewords and the last associated with the residual score of queries, so that the time complexity depends on the number of codewords rather than the number of classes. To further boost efficiency, we replace the query-specific residual probability with a simple uniform distribution, simplifying the computation while retaining high performance. Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds. The results demonstrate that a smaller divergence from the ideal softmax distribution leads to faster convergence and improved generalization. Extensive experiments on large-scale language models, sequential recommenders, and extreme multi-class classification tasks confirm that the MIDX-Sampler delivers superior effectiveness and efficiency compared to existing approaches.
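
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes two codebooks obtained by splitting the embedding in half (a product-quantization-style setup), and it uses the uniform-residual variant: sample a first codeword, then a second codeword conditioned on the first, then a class uniformly from the selected bucket. All names (`midx_uniform_proposal`, `sample_classes`, `c1`, `c2`, `k1`, `k2`) are hypothetical and the codebooks are random rather than learned.

```python
import numpy as np

def midx_uniform_proposal(z, c1, c2, k1, k2):
    """Two-codebook, uniform-residual proposal for one query (illustrative).

    z      : (d,) query vector
    c1, c2 : (K, d/2) codebooks for the first / second half of the embedding
    k1, k2 : (N,) codeword index assigned to each class in each codebook
    Returns P(k1), P(k2 | k1), and the proposal probability q of every class.
    """
    K, h = c1.shape[0], z.shape[0] // 2
    s1, s2 = c1 @ z[:h], c2 @ z[h:]          # codeword scores, shape (K,)
    w1, w2 = np.exp(s1 - s1.max()), np.exp(s2 - s2.max())
    # counts[a, b] = number of classes in bucket (a, b); query-independent,
    # so a real system would precompute it once.
    counts = np.zeros((K, K))
    np.add.at(counts, (k1, k2), 1.0)
    inner = counts @ w2                       # sum_b counts[a, b] * w2[b]
    p1 = w1 * inner
    p1 = p1 / p1.sum()                        # first multinomial
    p2_given_1 = np.divide(counts * w2, inner[:, None],
                           out=np.zeros_like(counts),
                           where=inner[:, None] > 0)  # second multinomial
    # Uniform within the bucket replaces the query-specific residual term.
    q = p1[k1] * p2_given_1[k1, k2] / counts[k1, k2]
    return p1, p2_given_1, q

def sample_classes(rng, p1, p2_given_1, k1, k2, n):
    """Draw n class ids: codeword 1, then codeword 2, then uniform in bucket."""
    K = p1.shape[0]
    out = np.empty(n, dtype=np.int64)
    for t in range(n):
        a = rng.choice(K, p=p1)
        b = rng.choice(K, p=p2_given_1[a])
        # Linear scan for clarity only; an inverted list per bucket avoids it.
        members = np.flatnonzero((k1 == a) & (k2 == b))
        out[t] = rng.choice(members)
    return out

rng = np.random.default_rng(0)
N, d, K = 1000, 8, 4                          # toy sizes, chosen arbitrarily
c1 = rng.normal(size=(K, d // 2))
c2 = rng.normal(size=(K, d // 2))
k1 = rng.integers(0, K, size=N)
k2 = rng.integers(0, K, size=N)
z = rng.normal(size=d)

p1, p2_given_1, q = midx_uniform_proposal(z, c1, c2, k1, k2)
samples = sample_classes(rng, p1, p2_given_1, k1, k2, n=16)
```

Under these assumptions the per-query work scales with the number of codewords (here `K * K` bucket entries) rather than with the number of classes `N`, and the resulting `q` equals a softmax over the codeword-approximated scores. In a sampled softmax, `q` would typically supply the `log q` correction applied to the sampled logits.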


Related papers