Efficient inference of interventional distributions
- URL: http://arxiv.org/abs/2107.11712v2
- Date: Tue, 27 Jul 2021 15:14:09 GMT
- Title: Efficient inference of interventional distributions
- Authors: Arnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy, Vedant Raval,
N. V. Vinodchandran
- Abstract summary: We consider the problem of efficiently inferring interventional distributions in a causal Bayesian network from a finite number of observations.
We show that when $\mathbf{Y}$ is an arbitrary set, there is no efficient algorithm that outputs an evaluator of a distribution that is $\varepsilon$-close to $P_{\bf x}(\mathbf{Y})$ unless all problems that have statistical zero-knowledge proofs, including the Graph Isomorphism problem, have efficient randomized algorithms.
- Score: 13.31079561447385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of efficiently inferring interventional distributions
in a causal Bayesian network from a finite number of observations. Let
$\mathcal{P}$ be a causal model on a set $\mathbf{V}$ of observable variables
on a given causal graph $G$. For sets $\mathbf{X},\mathbf{Y}\subseteq
\mathbf{V}$ and a setting ${\bf x}$ of $\mathbf{X}$, let $P_{\bf x}(\mathbf{Y})$
denote the interventional distribution on $\mathbf{Y}$ with respect to the
intervention that sets $\mathbf{X}$ to ${\bf x}$. Shpitser and Pearl (AAAI 2006),
building on the work of Tian and Pearl (AAAI 2001), gave an exact
characterization of the class of causal graphs for which the interventional
distribution $P_{\bf x}({\mathbf{Y}})$ can be uniquely determined. We give the
first efficient version of the Shpitser-Pearl algorithm. In particular, under
natural assumptions, we give a polynomial-time algorithm that, on input a causal
graph $G$ on observable variables $\mathbf{V}$ and a setting ${\bf x}$ of a set
$\mathbf{X} \subseteq \mathbf{V}$ of bounded size, outputs succinct
descriptions of both an evaluator and a generator for a distribution $\hat{P}$
that is $\varepsilon$-close (in total variation distance) to $P_{\bf
x}({\mathbf{Y}})$ where $\mathbf{Y}=\mathbf{V}\setminus \mathbf{X}$, if $P_{\bf
x}(\mathbf{Y})$ is identifiable. We also show that when $\mathbf{Y}$ is an
arbitrary set, there is no efficient algorithm that outputs an evaluator of a
distribution that is $\varepsilon$-close to $P_{\bf x}({\mathbf{Y}})$ unless
all problems that have statistical zero-knowledge proofs, including the Graph
Isomorphism problem, have efficient randomized algorithms.
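To make the objects in the abstract concrete, here is a minimal sketch of an evaluator for the easiest special case, a fully observed causal Bayesian network, where $P_{\bf x}(\mathbf{Y})$ is always identifiable via the truncated factorization. This is not the paper's algorithm, and the identifiers (`interventional_prob`, `parents`, `cpds`) are illustrative assumptions.

```python
# Minimal sketch, NOT the paper's algorithm: the fully observed special case,
# where the interventional distribution is given by the truncated factorization
#   P_x(V \ X) = prod_{V_i not in X} P(V_i | pa(V_i)),  with X clamped to x.
# All identifiers below are illustrative.

def interventional_prob(assignment, intervention, parents, cpds):
    """Evaluator for P_x(Y) with Y = V \\ X, assuming no latent confounders.

    assignment   -- dict: a value for every variable in V \\ X
    intervention -- dict: the setting x of the intervened set X
    parents      -- dict: variable -> tuple of its parents in the causal graph G
    cpds         -- dict: variable -> function(value, parent_values) -> conditional prob.
    """
    full = {**assignment, **intervention}        # joint setting of all of V
    prob = 1.0
    for v, val in assignment.items():            # product ranges only over V \ X
        pa_vals = tuple(full[u] for u in parents[v])
        prob *= cpds[v](val, pa_vals)
    return prob

# Tiny example: binary chain X -> Y, intervene do(X = 1).
parents = {"X": (), "Y": ("X",)}
cpds = {
    "X": lambda x, pa: 0.3 if x == 1 else 0.7,
    "Y": lambda y, pa: (0.9 if y == 1 else 0.1) if pa[0] == 1 else (0.2 if y == 1 else 0.8),
}
print(interventional_prob({"Y": 1}, {"X": 1}, parents, cpds))   # P_{x=1}(Y=1) = 0.9
```

An evaluator in the paper's sense answers such point queries; a generator would instead sample the non-intervened variables in topological order with $\mathbf{X}$ clamped to ${\bf x}$. The paper treats the general identifiable case characterized by Shpitser and Pearl, from finitely many observations.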
Related papers
- Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting.
We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k$.
Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- Testing Closeness of Multivariate Distributions via Ramsey Theory [40.926523210945064]
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions.
Specifically, given sample access to two unknown distributions $\mathbf{p}, \mathbf{q}$ on $\mathbb{R}^d$, we want to distinguish between the case that $\mathbf{p}=\mathbf{q}$ versus $\|\mathbf{p}-\mathbf{q}\|_{A_k} > \epsilon$.
Our main result is the first closeness tester for this problem with sub-learning sample complexity in any fixed dimension.
arXiv Detail & Related papers (2023-11-22T04:34:09Z)
- SQ Lower Bounds for Learning Mixtures of Linear Classifiers [43.63696593768504]
We show that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures.
The key technical ingredient is a new construction of spherical designs that may be of independent interest.
arXiv Detail & Related papers (2023-10-18T10:56:57Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n, d, \delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
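The ReLU guarantee quoted in the last entry above can be read formally as follows; this is a hedged reconstruction for illustration only, with the distribution $\mathcal{D}$, the risk $F$, and the comparator $\mathsf{OPT}$ named by assumption (any norm constraint on the comparator is omitted), not taken from the abstract.

```latex
% Hedged formal reading of the "Agnostic Learning of a Single Neuron" entry;
% \mathcal{D}, F, and OPT are assumed notation for illustration.
\[
  F(\mathbf{w}) \;=\; \mathbb{E}_{(\mathbf{x},y)\sim\mathcal{D}}
    \!\left[\bigl(\sigma(\mathbf{w}\cdot\mathbf{x})-y\bigr)^{2}\right],
  \qquad
  \mathsf{OPT} \;=\; \min_{\mathbf{w}} F(\mathbf{w}),
\]
\[
  \text{and the learner outputs } \widetilde{\mathbf{w}} \text{ with }\;
  F(\widetilde{\mathbf{w}}) \;\le\; O\!\bigl(\mathsf{OPT}^{1/2}\bigr)+\epsilon
  \qquad \text{for the ReLU activation } \sigma(z)=\max(0,z).
\]
```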
This list is automatically generated from the titles and abstracts of the papers on this site.