On Correlation Detection and Alignment Recovery of Gaussian Databases
- URL: http://arxiv.org/abs/2211.01069v2
- Date: Thu, 25 May 2023 10:02:58 GMT
- Title: On Correlation Detection and Alignment Recovery of Gaussian Databases
- Authors: Ran Tamir
- Abstract summary: Correlation detection is a hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternate hypothesis, they are correlated.
We develop bounds on the type-I and type-II error probabilities, and show that the analyzed detector performs better than a recently proposed detector.
When the databases are accepted as correlated, the algorithm also recovers some partial alignment between the given databases.
- Score: 5.33024001730262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose an efficient two-stage algorithm solving a joint
problem of correlation detection and partial alignment recovery between two
Gaussian databases. Correlation detection is a hypothesis testing problem;
under the null hypothesis, the databases are independent, and under the
alternate hypothesis, they are correlated, under an unknown row permutation. We
develop bounds on the type-I and type-II error probabilities, and show that the
analyzed detector performs better than a recently proposed detector, at least
for some specific parameter choices. Since the proposed detector relies on a
statistic, which is a sum of dependent indicator random variables, then in
order to bound the type-I probability of error, we develop a novel
graph-theoretic technique for bounding the $k$-th order moments of such
statistics. When the databases are accepted as correlated, the algorithm also
recovers some partial alignment between the given databases. We also propose
two more algorithms: (i) One more algorithm for partial alignment recovery,
whose reliability and computational complexity are both higher than those of
the first proposed algorithm. (ii) An algorithm for full alignment recovery,
which has a reduced amount of calculations and a not much lower error
probability, when compared to the optimal recovery procedure.
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Bagged Regularized $k$-Distances for Anomaly Detection [9.899763598214122]
We propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD)
Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite sample bound of the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE)
On the theoretical side, we establish fast convergence rates of the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity.
arXiv Detail & Related papers (2023-12-02T07:00:46Z) - New metrics and search algorithms for weighted causal DAGs [7.424262881242935]
We study causal graph discovery via adaptive interventions with node-dependent costs.
We define a new benchmark that captures the worst-case interventional cost for any search algorithm.
We provide adaptive search algorithms that achieve logarithmic approximations under various settings.
arXiv Detail & Related papers (2023-05-08T03:48:37Z) - Stability is Stable: Connections between Replicability, Privacy, and
Adaptive Generalization [26.4468964378511]
A replicable algorithm gives the same output with high probability when its randomness is fixed.
Using replicable algorithms for data analysis can facilitate the verification of published results.
We establish new connections and separations between replicability and standard notions of algorithmic stability.
arXiv Detail & Related papers (2023-03-22T21:35:50Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - An Application of a Multivariate Estimation of Distribution Algorithm to
Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
We show that the more sophisticated algorithm would yield better performance on a complex problem like this.
We hypothesise that this is caused by the more sophisticated algorithm being impeded by the large number of interactions in the problem.
arXiv Detail & Related papers (2022-05-17T15:28:46Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z) - A Robust Matching Pursuit Algorithm Using Information Theoretic Learning [37.968665739578185]
A new OMP algorithm is developed based on the information theoretic learning (ITL)
The experimental results on both simulated and real-world data consistently demonstrate the superiority of the proposed OMP algorithm in data recovery, image reconstruction, and classification.
arXiv Detail & Related papers (2020-05-10T01:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.