Fugu-MT paper translation (abstract): Solving Regularized Exp, Cosh and Sinh Regression Problems

Paper overview: Solving Regularized Exp, Cosh and Sinh Regression Problems

arxiv url: http://arxiv.org/abs/2303.15725v2
Date: Thu, 11 May 2023 00:22:34 GMT
Status: Translation complete
Last updated in system: 2023-05-12 17:53:04.761618
Title: Solving Regularized Exp, Cosh and Sinh Regression Problems
Title (reference translation): Solving regularized exp, cosh, and sinh regression problems
Authors: Zhihang Li, Zhao Song, Tianyi Zhou
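Abstract summary: Attention computation is a fundamental task for large language models such as Transformer, GPT-4, and ChatGPT. The straightforward method is to use Newton's method.
Reference score (independently computed attention score): 40.47799094316649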
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4 and ChatGPT. In this work, we study the exponential regression problem, which is inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression is non-convex. We study the regularized version of the exponential regression problem, which is a convex problem. We use an approximate Newton method to solve it in input-sparsity time. Formally, in this problem, one is given a matrix $A \in \mathbb{R}^{n \times d}$, vectors $b \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$, and any of the functions $\exp$, $\cosh$ and $\sinh$, denoted $f$. The goal is to find the optimal $x$ that minimizes $0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \mathrm{diag}(w) A x \|_2^2$. The straightforward method is to use the naive Newton's method. Let $\mathrm{nnz}(A)$ denote the number of nonzero entries in matrix $A$. Let $\omega$ denote the exponent of matrix multiplication. Currently, $\omega \approx 2.373$. Let $\epsilon$ denote the accuracy error. In this paper, we make use of the input sparsity and propose an algorithm that uses $\log ( \|x_0 - x^*\|_2 / \epsilon)$ iterations and $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega} )$ time per iteration to solve the problem.
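Abstract (reference translation): In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4, and ChatGPT. This work studies the exponential regression problem, inspired by the softmax/exp unit of the attention mechanism in large language models. Standard exponential regression is non-convex. We study a regularized version of the exponential regression problem, which is convex, and solve it with an approximate Newton method in input-sparsity time. Formally, the problem gives a matrix $A \in \mathbb{R}^{n \times d}$, vectors $b \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$, and any of the functions $\exp$, $\cosh$, $\sinh$, denoted $f$. The goal is to find the optimal $x$ that minimizes $0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \mathrm{diag}(w) Ax \|_2^2$. The straightforward approach is the naive Newton's method. Let $\mathrm{nnz}(A)$ denote the number of nonzero entries of $A$, let $\omega$ denote the exponent of matrix multiplication (currently $\omega \approx 2.373$), and let $\epsilon$ denote the accuracy error. The paper exploits input sparsity and proposes an algorithm that solves the problem in $\log ( \|x_0 - x^*\|_2 / \epsilon)$ iterations with $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega})$ time per iteration.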
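
As a rough illustration of the "naive Newton's method" baseline named in the abstract (not the paper's input-sparsity-time algorithm), the following NumPy sketch minimizes the regularized objective for $f = \exp$. The function names, the backtracking line search, the tolerance, and the synthetic data are all assumptions made for this example. The per-coordinate second derivative is $2e^{2z} - b\,e^{z} + w^2$, which stays positive whenever $w_i^2 > b_i^2/8$; the example data is chosen to satisfy that.

```python
import numpy as np


def objective(A, b, w, x):
    """0.5 * ||exp(Ax) - b||_2^2 + 0.5 * ||diag(w) A x||_2^2."""
    z = A @ x
    return 0.5 * np.sum((np.exp(z) - b) ** 2) + 0.5 * np.sum((w * z) ** 2)


def gradient(A, b, w, x):
    z = A @ x
    u = np.exp(z)
    # Chain rule: d/dz of 0.5*(e^z - b)^2 is e^z * (e^z - b).
    return A.T @ (u * (u - b) + w**2 * z)


def hessian(A, b, w, x):
    u = np.exp(A @ x)
    # Per-coordinate second derivative: 2*e^{2z} - b*e^z + w^2.
    h = 2.0 * u**2 - b * u + w**2
    # Dense A^T diag(h) A; this costs O(n d^2) per iteration.
    return A.T @ (h[:, None] * A)


def newton_solve(A, b, w, x0, tol=1e-8, max_iter=50):
    """Damped Newton iteration (hypothetical helper, for illustration only)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = gradient(A, b, w, x)
        if np.linalg.norm(g) <= tol:
            break
        step = np.linalg.solve(hessian(A, b, w, x), g)
        # Backtracking (Armijo) safeguard: an assumption added here for
        # robustness, not something stated in the abstract.
        t, f0, slope = 1.0, objective(A, b, w, x), g @ step
        for _ in range(30):
            if objective(A, b, w, x - t * step) <= f0 - 0.25 * t * slope:
                break
            t *= 0.5
        x = x - t * step
    return x


# Synthetic example data (assumed, not from the paper).
rng = np.random.default_rng(0)
n, d = 30, 4
A = 0.3 * rng.normal(size=(n, d))
b = rng.uniform(0.5, 1.5, size=n)
w = np.ones(n)  # w_i^2 = 1 > b_i^2 / 8, so every Hessian weight is positive
x_hat = newton_solve(A, b, w, np.zeros(d))
print("gradient norm at solution:", np.linalg.norm(gradient(A, b, w, x_hat)))
```

Forming the dense $d \times d$ Hessian in each iteration is the cost that the abstract's $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega})$ per-iteration bound is meant to improve on.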

Related papers

Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
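We study the problem of residual error estimation for matrix and vector norms using linear sketches. Empirically, this shows a considerable advantage at roughly the same sketch size and accuracy as prior work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
Paper reference translation (metadata) (2024-08-16T02:33:07Z)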
Provably learning a multi-head attention layer [55.2904547651831]
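The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We show that, in the worst case, an exponential dependence on $m$ is unavoidable.
Paper reference translation (metadata) (2024-02-06T15:39:09Z)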
Solving Attention Kernel Regression Problem via Pre-conditioner [9.131385887605935]
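We design algorithms for two types of regression problems: $\min_{x \in \mathbb{R}^d} \|(A^\top A)^j x - b\|$ for any positive integer $j$. The second proxy applies the exponential entrywise to the Gram matrix, denoted $\exp(AA^\top)$, and gives the regression $\min_{x \in \mathbb{R}^n} \|\exp(AA^\top) x - b\|$.
Paper reference translation (metadata) (2023-08-28T04:37:38Z)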
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
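This paper proposes efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem. The goal is to approximate $\mathbf{A}$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
Paper reference translation (metadata) (2023-06-02T18:55:27Z)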
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
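We consider sparsification of the attention problem. For a very large feature dimension, the dimension can be reduced to a size nearly linear in the sentence length.
Paper reference translation (metadata) (2023-04-10T05:52:38Z)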
An Over-parameterized Exponential Regression [18.57735939471469]
Recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions. Neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^{d}$ …
Paper reference translation (metadata) (2023-03-29T07:29:07Z)
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
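Given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees. To obtain such an $\ell_\infty$ guarantee for regression, one must use dense sketching matrices. We also develop a new analysis framework for $\ell_\infty$-guarantee regression that exploits the Oblivious Coordinate-wise Embedding (OCE) property.
Paper reference translation (metadata) (2023-02-01T05:22:40Z)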
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
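We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations. The learner's goal is to output, with high probability, a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$.
Paper reference translation (metadata) (2022-06-17T17:55:43Z)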
Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
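We study active sampling algorithms for linear regression, which aim to query only a small number of entries of the target vector. We show that this dependence on $d$ is optimal up to logarithmic factors. We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}} \log^2 n)$ for loss functions with at most degree-$p$ polynomial growth.
Paper reference translation (metadata) (2021-11-09T00:20:01Z)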

The related papers list is generated automatically from the titles and abstracts of papers on this site.
