Fugu-MT Paper Translation (Summary): Kernel Density Estimators in Large Dimensions

Paper overview: Kernel Density Estimators in Large Dimensions

arxiv url: http://arxiv.org/abs/2408.05807v2
Date: Fri, 16 Aug 2024 13:03:02 GMT
Status: Translation complete
Last updated in system: 2024-08-19 17:39:31.747719
Title: Kernel Density Estimators in Large Dimensions
Authors: Giulio Biroli, Marc Mézard
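Abstract summary: We analyze the kernel-based estimate of the density, $\hat\rho_h^{\mathcal{D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, as a function of the bandwidth $h$. We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper.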
Reference score (attention score computed by this site): 9.299356601085586
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies Kernel density estimation for a high-dimensional distribution $\rho(x)$. Traditional approaches have focused on the limit of large number of data points $n$ and fixed dimension $d$. We analyze instead the regime where both the number $n$ of data points $y_i$ and their dimensionality $d$ grow with a fixed ratio $\alpha=(\log n)/d$. Our study reveals three distinct statistical regimes for the kernel-based estimate of the density $\hat \rho_h^{\mathcal {D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, depending on the bandwidth $h$: a classical regime for large bandwidth where the Central Limit Theorem (CLT) holds, which is akin to the one found in traditional approaches. Below a certain value of the bandwidth, $h_{CLT}(\alpha)$, we find that the CLT breaks down. The statistics of $\hat \rho_h^{\mathcal {D}}(x)$ for a fixed $x$ drawn from $\rho(x)$ is given by a heavy-tailed distribution (an alpha-stable distribution). In particular below a value $h_G(\alpha)$, we find that $\hat \rho_h^{\mathcal {D}}(x)$ is governed by extreme value statistics: only a few points in the database matter and give the dominant contribution to the density estimator. We provide a detailed analysis for high-dimensional multivariate Gaussian data. We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper. Our findings reveal limitations of classical approaches, show the relevance of these new statistical regimes, and offer new insights for Kernel density estimation in high-dimensional settings.
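The estimator $\hat\rho_h^{\mathcal{D}}(x)$ and the KL-based choice of bandwidth can be made concrete with a short sketch. It assumes, beyond what the abstract states, a standard Gaussian kernel $K$ and standard Gaussian data $\rho = \mathcal{N}(0, I_d)$; for that $\rho$ the term $\mathbb{E}_\rho[\log\rho]$ is known exactly, so $\mathrm{KL}(\rho\,\|\,\hat\rho_h)$ can be estimated by Monte Carlo on fresh test points. The helper names `log_kde` and `kl_estimate` are hypothetical, and scanning $h$ on a grid is only a generic empirical way to locate a KL-optimal bandwidth, not the authors' analysis.

```python
import math
import random


def log_kde(x, data, h):
    """Log of rho_hat_h(x) = 1/(n h^d) * sum_i K((x - y_i) / h), assuming a
    standard Gaussian kernel K(u) = (2 pi)^(-d/2) * exp(-|u|^2 / 2).

    Works in log space: for large d, h^d and K(.) under- or overflow floats."""
    n, d = len(data), len(x)
    log_k = [
        -0.5 * d * math.log(2 * math.pi)
        - 0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) / h**2
        for y in data
    ]
    m = max(log_k)  # log-sum-exp shift for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in log_k))
    return log_sum - math.log(n) - d * math.log(h)


def kl_estimate(data, h, n_test=100, seed=1):
    """Monte Carlo estimate of KL(rho || rho_hat_h), assuming rho = N(0, I_d).

    KL = E_rho[log rho(x)] - E_rho[log rho_hat_h(x)]; the first term is exact
    for the standard Gaussian: -(d/2) log(2 pi) - d/2."""
    d = len(data[0])
    rng = random.Random(seed)
    exact_term = -0.5 * d * math.log(2 * math.pi) - 0.5 * d
    total = 0.0
    for _ in range(n_test):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # fresh test point x ~ rho
        total += log_kde(x, data, h)
    return exact_term - total / n_test


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 20, 500  # alpha = log(n) / d is about 0.31 here
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    for h in (0.05, 0.3, 1.0, 20.0):
        print(f"h = {h:5.2f}   KL estimate = {kl_estimate(data, h):10.3f}")
```

Working in log space is the main design choice: with $d = 20$ and $h = 0.05$ the factor $h^{-d}$ alone is about $10^{26}$, while the individual kernel values fall far below the smallest positive double, so a direct sum would return 0 or overflow. The estimated KL is large at both very small and very large $h$, with a minimum in between.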
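The crossover from the CLT regime to the extreme-value regime can be observed by counting how many data points actually contribute to $\hat\rho_h^{\mathcal{D}}(x)$ at a fixed $x$ drawn from $\rho$. The sketch below uses the inverse participation ratio $1/\sum_i w_i^2$ of the normalized kernel weights as that count. This diagnostic, the Gaussian kernel and data, and the helper names `kernel_weights` and `effective_contributors` are assumptions of this example, not quantities defined in the paper.

```python
import math
import random


def kernel_weights(x, data, h):
    """Normalized weights w_i = K_i / sum_j K_j of each data point in
    rho_hat_h(x), assuming a Gaussian kernel. The prefactor (2 pi)^(-d/2) / (n h^d)
    cancels in the ratio, so only exp(-|x - y_i|^2 / (2 h^2)) matters."""
    log_k = [-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) / h**2 for y in data]
    m = max(log_k)  # shift before exponentiating to avoid underflow
    k = [math.exp(v - m) for v in log_k]
    total = sum(k)
    return [v / total for v in k]


def effective_contributors(x, data, h):
    """Inverse participation ratio 1 / sum_i w_i^2: about n when all points
    contribute comparably, about 1 when a single point dominates the sum."""
    return 1.0 / sum(w * w for w in kernel_weights(x, data, h))


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 50, 1000  # alpha = log(n) / d is about 0.14 here
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # one fixed x drawn from rho
    for h in (0.2, 1.0, 3.0, 10.0):
        n_eff = effective_contributors(x, data, h)
        print(f"h = {h:5.1f}   effective contributors = {n_eff:8.1f} of {n}")
```

As $h$ shrinks, the count decreases monotonically, from close to $n$ (many comparable terms, as in the CLT regime) toward 1 (the nearest data point dominating, as in the extreme-value regime below $h_G(\alpha)$). Where the thresholds $h_{CLT}(\alpha)$ and $h_G(\alpha)$ actually fall is not computed here.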

Related papers

Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data [32.72306410557258]
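We study the statistical convergence of score-based diffusion models for learning an unknown distribution from a finite number of samples. Our results suggest that diffusion models naturally adapt to the intrinsic geometry of the data. Our theory bridges the analysis of diffusion models with the sharp minimax rates established for GANs and optimal transport.
Paper reference translation (metadata) (2026-03-04T03:59:02Z)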
On the Intrinsic Dimensions of Data in Kernel Learning [1.675218291152252]
We show that, for kernels such as the Laplace kernel, the effective dimension $d_K$ is provably and significantly smaller than the Minkowski dimension on regular domains.
Paper reference translation (metadata) (2026-01-22T17:32:24Z)
Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction [57.93371273485736]
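We consider the centralized distributed learning setting in which workers aim to improve on the bound $L\Delta/\varepsilon^2$, and show that, even in the homogeneous (i.e., i.i.d.) case where all workers access the same distribution, at most a poly-logarithmic improvement is possible.
Paper reference translation (metadata) (2025-06-30T13:27:39Z)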
Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior [28.71736321665378]
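We study the nonparametric maximum likelihood estimator $\widehat{\pi}$ for one-dimensional Gaussian location mixtures. We provide an algorithm that, for small enough $\varepsilon>0$, computes an $\varepsilon$-approximation of $\widehat{\pi}$ in Wasserstein distance. We also show that the distribution of $\widehat{\pi}$, conditioned on being $k$-atomic, admits a density on the associated $(2k-1)$-dimensional parameter space.
Paper reference translation (metadata) (2025-03-26T03:36:36Z)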
A Statistical Analysis for Supervised Deep Learning with Exponential Families for Intrinsically Low-dimensional Data [32.98264375121064]
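We study supervised deep learning when the response, given the explanatory variables, is distributed according to an exponential family. Assuming an upper-bounded density of the explanatory variables, we characterize the convergence rate as $\tilde{\mathcal{O}}\left(d^{\frac{2\lfloor\beta\rfloor(\beta+d)}{2\beta+d}}\, n^{-\frac{2}{2\beta+d}}\right)$.
Paper reference translation (metadata) (2024-12-13T01:15:17Z)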
Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
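Previous private estimators for distributions over $\mathbb{R}^d$ suffer from the curse of dimensionality. We present algorithms whose sample complexity has improved dependence on the dimension.
Paper reference translation (metadata) (2024-11-01T17:59:53Z)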
Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
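We show that, for a broad class of data structures, these bounds cannot be significantly improved. This gives a new statistical-computational trade-off for density estimation.
Paper reference translation (metadata) (2024-10-30T15:03:33Z)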
Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
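We study the convergence properties of deterministic samplers based on probability flow ODEs, both theoretically and numerically. At the continuous-time level, the total variation distance between the target and generated data distributions is bounded by $\mathcal{O}(d^{3/4}\delta^{1/2})$.
Paper reference translation (metadata) (2024-04-15T12:29:28Z)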
Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products [12.047053875716506]
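We consider the problem of sufficient dimension reduction for multi-index models. We show a fast parametric convergence rate of $C_d \cdot n^{-1/2}$.
Paper reference translation (metadata) (2023-12-24T12:28:07Z)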
Optimal Rate of Kernel Regression in Large Dimensions [13.641780902673792]
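We first build a general tool for characterizing upper bounds and minimax lower bounds of kernel regression for large-dimensional data. Using this tool, we show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$.
Paper reference translation (metadata) (2023-09-08T11:29:05Z)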
Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic Localization [40.808942894229325]
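We provide the first convergence bounds that are linear in the data dimension. We show that diffusion models require at most $\tilde{O}\left(\frac{d\log^2(1/\delta)}{\varepsilon^2}\right)$ steps to approximate an arbitrary distribution.
Paper reference translation (metadata) (2023-08-07T16:01:14Z)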
Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
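Existing theory on deep nonparametric regression has shown that, when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structure. In this paper, we introduce the relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$.
Paper reference translation (metadata) (2023-06-26T17:13:31Z)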
Data Structures for Density Estimation [66.36971978162461]
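Given a number of samples from $p$ that is sublinear in $n$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al.
Paper reference translation (metadata) (2023-06-20T06:13:56Z)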
Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
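We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$. We propose a novel random technique called ternary random features (TRF). Computing the proposed random features requires no multiplication and reduces storage cost by a factor of $b$ compared to classical random features.
Paper reference translation (metadata) (2021-10-05T09:33:49Z)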
Convergence of Graph Laplacian with kNN Self-tuned Kernels [14.645468999921961]
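Self-tuned kernels adaptively set the bandwidth $\sigma_i$ at each point to its $k$-nearest neighbor (kNN) distance. In this paper, we prove that the graph Laplacian operator $L_N$ converges to the (weighted) manifold Laplacian for a new family of kNN self-tuned kernels.
Paper reference translation (metadata) (2020-11-03T04:55:33Z)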
Analysis of KNN Density Estimation [56.29748742084386]
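kNN density estimation is minimax optimal under both the $\ell_1$ and $\ell_\infty$ criteria when the support set is known. When the support set is unknown, the $\ell_\infty$ error does not reach the minimax lower bound, but it is still better than that of kernel density estimation.
Paper reference translation (metadata) (2020-09-30T03:33:17Z)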

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

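This is information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).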