Learning Probabilities of Causation from Finite Population Data
- URL: http://arxiv.org/abs/2210.08453v1
- Date: Sun, 16 Oct 2022 05:46:25 GMT
- Title: Learning Probabilities of Causation from Finite Population Data
- Authors: Ang Li, Song Jiang, Yizhou Sun, Judea Pearl
- Abstract summary: We propose a machine learning model that learns the bounds of the probabilities of causation for subpopulations given finite population data.
A simulated study shows that the model can learn the bounds of PNS for 32,768 subpopulations while knowing only roughly 500 of them from the finite population data.
- Score: 40.99426447422972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper deals with the problem of learning the probabilities of causation
of subpopulations given finite population data. The tight bounds of three basic
probabilities of causation, the probability of necessity and sufficiency (PNS),
the probability of sufficiency (PS), and the probability of necessity (PN),
were derived by Tian and Pearl. However, obtaining the bounds for each
subpopulation requires the experimental and observational distributions of
that subpopulation, which are usually impractical to estimate given finite
population data. We propose a machine learning model that learns the bounds
of the probabilities of causation for subpopulations given finite population
data. We further show, in a simulated study, that the model is able to learn
the bounds of PNS for 32,768 subpopulations while knowing only roughly 500 of
them from the finite population data.
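For concreteness, these Tian-Pearl bounds can be computed directly once the experimental quantities P(y_x) = P(y|do(X=x)), P(y_{x'}) and the observational joint P(X, Y) are available. The Python sketch below implements the published PNS bound formulas; the input numbers are hypothetical.

```python
def pns_bounds(p_yx, p_yx_prime, p_xy, p_xy_prime, p_x_prime_y, p_x_prime_y_prime):
    """Tian-Pearl tight bounds on the probability of necessity and
    sufficiency (PNS), from experimental probabilities P(y_x), P(y_{x'})
    and the observational joint distribution P(X, Y)."""
    p_y = p_xy + p_x_prime_y  # observational P(y)
    lower = max(0.0,
                p_yx - p_yx_prime,
                p_y - p_yx_prime,
                p_yx - p_y)
    upper = min(p_yx,
                1.0 - p_yx_prime,              # P(y'_{x'})
                p_xy + p_x_prime_y_prime,      # P(x, y) + P(x', y')
                p_yx - p_yx_prime + p_xy_prime + p_x_prime_y)
    return lower, upper

# Hypothetical distributions for illustration only.
lb, ub = pns_bounds(p_yx=0.7, p_yx_prime=0.3,
                    p_xy=0.4, p_xy_prime=0.1,
                    p_x_prime_y=0.2, p_x_prime_y_prime=0.3)
print(f"PNS in [{lb:.2f}, {ub:.2f}]")  # PNS in [0.40, 0.70]
```

Estimating these inputs separately for every subpopulation is exactly what finite data makes impractical, which motivates the learning approach.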
Related papers
- Learning Probabilities of Causation from Finite Population Data [49.05791737581312]
This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data.
We propose using machine learning models that draw insights from subpopulations with sufficient data.
Our evaluation of multiple machine learning models indicates that, given the population-level data and an appropriate choice of machine learning model and activation function, PNS can be effectively predicted.
arXiv Detail & Related papers (2025-05-22T03:31:44Z)
- Estimating Probabilities of Causation with Machine Learning Models [13.50260067414662]
This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data.
We use machine learning models that draw insights from subpopulations with sufficient data.
We show that our multilayer perceptron (MLP) model with the Mish activation function achieves a mean absolute error (MAE) of approximately 0.02 in predicting the probability of necessity and sufficiency (PNS) for 32,768 subpopulations.
arXiv Detail & Related papers (2025-02-13T00:18:08Z)
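As a rough illustration of the model family in the entry above, here is a minimal PyTorch sketch of an MLP regressor with Mish activations. The layer widths, sigmoid output head, and one-step training loop are assumptions made for illustration, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class PNSRegressor(nn.Module):
    """Minimal MLP with Mish activations mapping subpopulation
    features to a predicted PNS value."""
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # PNS lies in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Usage sketch: train with an L1 loss so the reported metric (MAE)
# is optimized directly; the data tensors here are stand-ins.
model = PNSRegressor(n_features=15)  # e.g., 15 binary features give 2^15 = 32,768 subpopulations
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.rand(512, 15), torch.rand(512)
opt.zero_grad()
loss = nn.functional.l1_loss(model(x), y)
loss.backward()
opt.step()
```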
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
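The worst-case objective in the entry above, min_w max_k R_k(w), can be illustrated with a bare-bones subgradient method that always steps on the currently worst distribution. This NumPy sketch (with synthetic data) conveys the objective only; it is not the paper's optimal-sample-complexity algorithm.

```python
import numpy as np

def minimax_erm(datasets, n_steps=2000, lr=0.05):
    """Minimize max_k empirical logistic risk over k datasets by
    taking subgradient steps on the currently worst distribution."""
    d = datasets[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(n_steps):
        # Empirical logistic risk on each distribution.
        risks = [np.mean(np.log1p(np.exp(-y * (X @ w)))) for X, y in datasets]
        k = int(np.argmax(risks))            # worst-case distribution
        X, y = datasets[k]
        margin = y * (X @ w)
        grad = -(X.T @ (y / (1 + np.exp(margin)))) / len(y)
        w -= lr * grad                       # subgradient step on the max
    return w

# Hypothetical data: three shifted class-conditional distributions.
rng = np.random.default_rng(0)
datasets = []
for shift in (0.0, 1.0, 2.0):
    y = rng.choice([-1.0, 1.0], size=200)
    X = rng.normal(size=(200, 5)) + shift * y[:, None] * 0.3
    datasets.append((X, y))
w = minimax_erm(datasets)
```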
- s-ID: Causal Effect Identification in a Sub-Population [23.221279036710012]
We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID).
Existing inference problems in sub-populations operate on the premise that the given data originate from the entire population.
We provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population.
arXiv Detail & Related papers (2023-09-05T14:43:10Z)
- Probabilities of Causation: Role of Observational Data [20.750773939911685]
We discuss the conditions under which observational data are worth considering to improve the quality of the bounds.
We also apply the proposed theorems to the unit selection problem defined by Li and Pearl.
arXiv Detail & Related papers (2022-10-17T09:10:11Z)
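As a concrete instance of the point in the entry above: experimental data alone bound PNS by [max(0, P(y_x) - P(y_{x'})), min(P(y_x), P(y'_{x'}))], and an observational joint can tighten the interval through the extra Tian-Pearl terms. The numbers in this sketch are hypothetical.

```python
def pns_bounds_exp_only(p_yx, p_yx_prime):
    """PNS bounds from experimental data alone (Tian-Pearl)."""
    return max(0.0, p_yx - p_yx_prime), min(p_yx, 1.0 - p_yx_prime)

# Hypothetical experimental findings.
p_yx, p_yx_prime = 0.7, 0.3
print(pns_bounds_exp_only(p_yx, p_yx_prime))   # (0.4, 0.7)

# Adding a (hypothetical) observational joint P(X, Y) tightens the
# upper bound via the term P(x, y) + P(x', y') from the combined bounds.
p_xy, p_xy_prime, p_x_prime_y, p_x_prime_y_prime = 0.25, 0.05, 0.35, 0.35
upper_combined = min(p_yx, 1.0 - p_yx_prime,
                     p_xy + p_x_prime_y_prime,
                     p_yx - p_yx_prime + p_xy_prime + p_x_prime_y)
print(upper_combined)  # 0.6 < 0.7: here the observational data were worth considering
```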
- Probabilities of Causation: Adequate Size of Experimental and Observational Samples [17.565045120151865]
Tian and Pearl derived sharp bounds for the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN) using experimental and observational data.
The assumption is that one possesses a sample large enough to permit accurate estimation of the experimental and observational distributions.
We present a method for determining the sample size needed for such estimation when a given confidence interval (CI) is specified.
arXiv Detail & Related papers (2022-10-10T21:59:49Z)
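A textbook version of the sample-size question in the entry above, for a single estimated probability and a normal-approximation CI, looks as follows; this is a standard calculation offered for intuition, not the paper's exact procedure.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_ci(half_width: float, confidence: float = 0.95,
                       p_guess: float = 0.5) -> int:
    """Smallest n so the normal-approximation CI for an estimated
    probability has the requested half-width; p_guess = 0.5 is the
    worst case, since p(1-p) is maximized there."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z**2 * p_guess * (1 - p_guess) / half_width**2)

# E.g., estimating P(y_x) to within +/-0.02 at 95% confidence:
print(sample_size_for_ci(0.02))  # 2401
```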
- Reconciling Individual Probability Forecasts [78.0074061846588]
We show that two parties who agree on the data cannot disagree on how to model individual probabilities.
We conclude that although individual probabilities are unknowable, they are contestable via a computationally and data-efficient process.
arXiv Detail & Related papers (2022-09-04T20:20:35Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
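For reference, the aggregate metric named in the entry above is just the exponentiated mean negative log-likelihood per token, as this toy sketch (with hypothetical log-probabilities) makes explicit; being an average, it can mask distortion on the rare-event tail.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token natural-log probabilities from a model.
print(perplexity([-1.2, -0.4, -3.8, -0.9]))  # ~= exp(1.575) ~= 4.83
```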
- Causes of Effects: Learning individual responses from population data [23.593582720307207]
We study the problem of individualization and its applications in medicine.
For example, the probability of benefiting from a treatment concerns an individual having a favorable outcome if treated and an unfavorable outcome if untreated.
We analyze and expand on existing research by applying bounds to the probability of necessity and sufficiency (PNS) along with graphical criteria and practical applications.
arXiv Detail & Related papers (2021-04-28T12:38:11Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
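One simple notion used in this literature is Fisher separability, where a single inner-product test separates a point (e.g., an AI error) from the rest of the data. The sketch below is a minimal illustration under that definition, with the threshold alpha as a free parameter.

```python
import numpy as np

def fisher_separable(x, data, alpha=0.8):
    """Test Fisher separability of point x from the rows of `data`:
    x is separated by the linear functional h(z) = <x, z> if
    <x, y> < alpha * <x, x> for every y in the data set."""
    return bool(np.all(data @ x < alpha * (x @ x)))

# Hypothetical high-dimensional data: in high dimension, a random
# point is separable from a random sample with high probability.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 200))
x = rng.normal(size=200)
print(fisher_separable(x, data))  # True with high probability
```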
- Cumulative deviation of a subpopulation from the full population [0.0]
Assessing equity in treatment of a subpopulation often involves assigning numerical "scores" to all individuals in the full population.
Given such scores, individuals with similar scores may or may not attain similar outcomes, independent of their membership in the subpopulation.
The cumulative plots encode subpopulation deviation directly as the slopes of secant lines for the graphs.
arXiv Detail & Related papers (2020-08-04T19:30:02Z)
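A loose reconstruction of the idea in the entry above: order subpopulation members by score, then accumulate the gap between each member's outcome and the mean outcome of score-matched individuals in the full population. The quantile binning here is an assumption of this sketch, not the paper's construction.

```python
import numpy as np

def cumulative_deviation(scores_sub, outcomes_sub, scores_full, outcomes_full, bins=20):
    """Cumulative gap between subpopulation outcomes (ordered by score)
    and full-population mean outcomes at similar scores."""
    edges = np.quantile(scores_full, np.linspace(0, 1, bins + 1))
    inner = edges[1:-1]                       # interior bin edges
    full_bin = np.digitize(scores_full, inner)
    bin_means = np.array([outcomes_full[full_bin == b].mean() for b in range(bins)])
    order = np.argsort(scores_sub)
    sub_bin = np.digitize(scores_sub[order], inner)
    gaps = outcomes_sub[order] - bin_means[sub_bin]
    return np.cumsum(gaps) / len(gaps)        # normalized cumulative deviation

# Hypothetical data: scores in [0, 1], binary outcomes; the
# subpopulation systematically underperforms its scores.
rng = np.random.default_rng(1)
s_full = rng.random(5000)
y_full = (rng.random(5000) < s_full).astype(float)
s_sub = rng.random(400)
y_sub = (rng.random(400) < s_sub - 0.1).astype(float)
curve = cumulative_deviation(s_sub, y_sub, s_full, y_full)
# A roughly constant negative slope in `curve` (the secant slopes from
# the abstract) signals systematic deviation from score-matched peers.
```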
- Survival Cluster Analysis [93.50540270973927]
There is an unmet need in survival analysis for identifying subpopulations with distinct risk profiles.
An approach that addresses this need is likely to improve characterization of individual outcomes.
arXiv Detail & Related papers (2020-02-29T22:41:21Z)