Fugu-MT 論文翻訳(概要): Testing Dependency of Unlabeled Databases

論文の概要: Testing Dependency of Unlabeled Databases

arxiv url: http://arxiv.org/abs/2311.05874v1
Date: Fri, 10 Nov 2023 05:17:03 GMT
ステータス: 翻訳完了
システム内更新日: 2023-11-13 15:57:02.798990
Title: Testing Dependency of Unlabeled Databases
Title（参考訳）: ラベルなしデータベースの依存性テスト
Authors: Vered Paslev and Wasim Huleihel
Abstract要約: 2つのランダムデータベース $mathsfXinmathcalXntimes d$ と $mathsfYinmathcalYntimes d$ は統計的に依存するかどうかによって異なる。最適テストが情報理論上不可能かつ可能なしきい値の特徴付けを行う。
参考スコア（独自算出の注目度）: 5.384630221560811
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we investigate the problem of deciding whether two random databases $\mathsf{X}\in\mathcal{X}^{n\times d}$ and $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ are statistically dependent or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these two databases are statistically independent, while under the alternative, there exists an unknown row permutation $\sigma$, such that $\mathsf{X}$ and $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$, are statistically dependent with some known joint distribution, but have the same marginal distributions as the null. We characterize the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$, $d$, and some spectral properties of the generative distributions of the datasets. For example, we prove that if a certain function of the eigenvalues of the likelihood function and $d$, is below a certain threshold, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, no matter what the value of $n$ is. This mimics the performance of an efficient test that thresholds a centered version of the log-likelihood function of the observed matrices. We also analyze the case where $d$ is fixed, for which we derive strong (vanishing error) and weak detection lower and upper bounds.
Abstract（参考訳）: 本稿では、2つのランダムデータベース $\mathsf{X}\in\mathcal{X}^{n\times d}$ と $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ が統計的に依存するか否かを決定する問題について検討する。これは仮説テスト問題として定式化されており、ヌル仮説の下では、これらの2つのデータベースは統計的に独立であるが、別の方法では、$\mathsf{x}$ と $\mathsf{y}^\sigma$($\mathsf{y}$ の置換版)が既知のジョイント分布に統計的に依存するが、ヌルと同じ限界分布を持つような、未知の行置換 $\sigma$ が存在する。最適なテストが情報理論上不可能で可能な閾値を、n$、$d$、およびデータセットの生成分布のスペクトル特性の関数として特徴付ける。例えば、確率関数の固有値と$d$の特定の関数が、$d\to\infty$のようにあるしきい値以下であれば、$n$の値が何であれ、弱い検出(ランダムな推測よりもわずかに優れている)は統計的に不可能である。これは、観測された行列のログ様関数の中心バージョンを閾値付けする効率的なテストのパフォーマンスを模倣する。また、$d$が固定された場合の分析を行い、その場合、強い(消滅エラー)と弱い検出の下限と上限を導出する。

関連論文リスト

Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
幅広い種類のデータ構造に対して、それらの境界は著しく改善されないことを示す。これは密度推定のための新しい統計計算トレードオフである。
論文参考訳（メタデータ） (2024-10-30T15:03:33Z)
Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples [24.45016514352055]
我々は、関数クラス$mathcal F times Mathcal G$から、T+1$関数$f_star(t) circ g_star$を学習する際のサンプル複雑度について研究する。タスク数が$T$になるにつれて、サンプル要件とリスクバウンドの両方が$r$次元回帰に収束することを示す。
論文参考訳（メタデータ） (2024-10-15T03:20:19Z)
A computational transition for detecting correlated stochastic block models by low-degree polynomials [13.396246336911842]
一対のランダムグラフにおける相関性の検出は、近年広く研究されている基本的な統計的および計算上の問題である。一対の相関ブロックモデル $mathcalS(n,tfraclambdan;k,epsilon;s)$ を共通の親ブロックモデル $mathcalS(n,tfraclambdan;k,epsilon;s)$ からサブサンプリングする。隣接部のエントリーのエンスロー度に基づくテストに焦点をあてる
論文参考訳（メタデータ） (2024-09-02T06:14:05Z)
Transfer Operators from Batches of Unpaired Points via Entropic Transport Kernels [3.099885205621181]
そこで我々は,最大形推論関数を導出し,計算可能な近似を提案し,それらの特性を解析する。我々は、ブロック数$N$が無限に近づくと、経験的近似から真の密度を回復できることを示す$Gamma$-convergenceの結果を証明する。
論文参考訳（メタデータ） (2024-02-13T12:52:41Z)
Detection of Correlated Random Vectors [7.320365821066746]
2つのランダムベクトル $mathsfXinmathbbRn$ と $mathsfYinmathbbRn$ は相関するか否かは問わない。最適テストが情報理論的に不可能で可能なしきい値を分析する。
論文参考訳（メタデータ） (2024-01-24T12:58:08Z)
Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
ground-truth predictor is additive $mathbbE[mathbfz mid mathbfx,mathbfy] = f_star(mathbfx) +g_star(mathbfy)$.
論文参考訳（メタデータ） (2023-02-27T16:34:21Z)
Phase Transitions in the Detection of Correlated Databases [12.010807505655238]
本稿では,2つのガウスデータベースの相関関係を$mathsfXinmathbbRntimes d$と$mathsfYntimes d$で検出する問題について検討する。この問題は、ソーシャルメディア、計算生物学などの分析に関係している。
論文参考訳（メタデータ） (2023-02-07T10:39:44Z)
Complexity of High-Dimensional Identity Testing with Coordinate Conditional Sampling [5.3098033683382155]
本研究では,高次元分布におけるアイデンティティテスト問題について検討する。私たちは、$mathsfCoordinate Oracle$という、非常に弱い条件付きサンプリングオラクルを考えています。エントロピーの近似テンソル化(英語版)として知られる解析的性質が$n$-次元可視分布$mu$に対して成り立つならば、$tildeO(n/varepsilon)$クエリを使用して、隠れ分布$pi$に対して効率的なアイデンティティテストアルゴリズムが存在することを証明している。
論文参考訳（メタデータ） (2022-07-19T06:49:24Z)
Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
我々は、$bf K$ の固有スペクトルが$bf w$ の i.d. 成分の分布とは独立であることを示す。 3次ランダム特徴(TRF)と呼ばれる新しいランダム手法を提案する。提案したランダムな特徴の計算には乗算が不要であり、古典的なランダムな特徴に比べてストレージに$b$のコストがかかる。
論文参考訳（メタデータ） (2021-10-05T09:33:49Z)
Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
関数 $texttr(AR(z))$, for $R(z) = (frac1nXXT- zI_p)-1$ and $Ain mathcal M_p$ deterministic, have a standard deviation of order $O(|A|_* / sqrt n)$. ここでは、$|mathbb E[R(z)] - tilde R(z)|_F を示す。
論文参考訳（メタデータ） (2021-09-06T14:21:43Z)
The Sample Complexity of Robust Covariance Testing [56.98280399449707]
i. i. d. 形式 $Z = (1-epsilon) X + epsilon B$ の分布からのサンプル。ここで $X$ はゼロ平均で未知の共分散である Gaussian $mathcalN(0, Sigma)$ である。汚染がない場合、事前の研究は、$O(d)$サンプルを使用するこの仮説テストタスクの単純なテスターを与えた。サンプル複雑性の上限が $omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma であることを証明します。
論文参考訳（メタデータ） (2020-12-31T18:24:41Z)
Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
高確率状態に着目して離散分布を試験する問題について検討する。一定の要素でサンプル最適である近接性および独立性テストのための最初のアルゴリズムを提供する。
論文参考訳（メタデータ） (2020-09-14T16:09:17Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。