Estimating Probabilities of Causation with Machine Learning Models
- URL: http://arxiv.org/abs/2502.08858v1
- Date: Thu, 13 Feb 2025 00:18:08 GMT
- Title: Estimating Probabilities of Causation with Machine Learning Models
- Authors: Shuai Wang, Ang Li
- Abstract summary: This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data. We use machine learning models that draw insights from subpopulations with sufficient data. We show that our multilayer perceptron (MLP) model with the Mish activation function achieves a mean absolute error (MAE) of approximately 0.02 in predicting the probability of necessity and sufficiency (PNS) for 32,768 subpopulations.
- Score: 13.50260067414662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilities of causation play a crucial role in modern decision-making. This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data using machine learning models. Tian and Pearl first defined and derived tight bounds for three fundamental probabilities of causation: the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN). However, estimating these probabilities requires both experimental and observational distributions specific to each subpopulation, which are often unavailable or impractical to obtain with limited population-level data. We assume that the probabilities of causation for each subpopulation are determined by its characteristics. To estimate these probabilities for subpopulations with insufficient data, we propose using machine learning models that draw insights from subpopulations with sufficient data. Our evaluation of multiple machine learning models indicates that, given sufficient population-level data and an appropriate choice of machine learning model and activation function, PNS can be effectively predicted. Through simulation studies, we show that our multilayer perceptron (MLP) model with the Mish activation function achieves a mean absolute error (MAE) of approximately 0.02 in predicting PNS for 32,768 subpopulations using data from around 2,000 subpopulations.
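For reference, the tight bounds that Tian and Pearl derived for PNS, with $P(y_x)$ denoting the experimental quantity $P(Y = y \mid do(X = x))$ and $x', y'$ the complements of $x, y$:

$$\max\{0,\ P(y_x) - P(y_{x'}),\ P(y) - P(y_{x'}),\ P(y_x) - P(y)\} \le \mathrm{PNS} \le \min\{P(y_x),\ P(y'_{x'}),\ P(x, y) + P(x', y'),\ P(y_x) - P(y_{x'}) + P(x, y') + P(x', y)\}$$

The abstract names only an MLP with the Mish activation trained to map subpopulation characteristics to PNS. A minimal sketch of that setup follows; the layer sizes, optimizer, and 15-dimensional input (32,768 = 2^15 hints at 15 binary characteristics, but this is an inference, not a stated detail) are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PNSRegressor(nn.Module):
    """MLP mapping a subpopulation's characteristics to a PNS estimate.
    Widths and depth are illustrative; the abstract specifies only an
    MLP with the Mish activation."""
    def __init__(self, n_features: int = 15, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.Mish(),                 # Mish(x) = x * tanh(softplus(x))
            nn.Linear(hidden, hidden),
            nn.Mish(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),              # PNS is a probability in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Train on subpopulations with sufficient data; L1 loss matches the
# MAE metric reported in the abstract. (Hyperparameters are assumed.)
model = PNSRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()
```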
Related papers
- Learning Probabilities of Causation from Finite Population Data [49.05791737581312]
This paper addresses the challenge of predicting probabilities of causation for subpopulations with insufficient data. We propose using machine learning models that draw insights from subpopulations with sufficient data. Our evaluation of multiple machine learning models indicates that, given the population-level data and an appropriate choice of machine learning model and activation function, PNS can be effectively predicted.
arXiv Detail & Related papers (2025-05-22T03:31:44Z)
- Estimating the Probabilities of Rare Outputs in Language Models [8.585890569162267]
We study low probability estimation in the context of argmax sampling from small transformer language models. We find that importance sampling outperforms activation extrapolation, but both outperform naive sampling. We argue that new methods for low probability estimation are needed to provide stronger guarantees about worst-case performance.
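The summary does not detail the paper's estimators; purely as a generic illustration of why importance sampling helps with rare events (a Gaussian toy problem, not the paper's setting), a proposal centred on the rare region turns a nearly-always-zero naive estimate into a low-variance one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 100_000, 4.0   # estimate P(X > 4) for X ~ N(0, 1), about 3.17e-5

# Naive Monte Carlo: almost every draw misses the event, so the
# estimate is usually 0 with enormous relative variance.
naive = (rng.standard_normal(n) > t).mean()

# Importance sampling from a proposal q = N(t, 1) centred on the rare
# region; each hit is reweighted by the density ratio p(x) / q(x).
x = rng.normal(loc=t, scale=1.0, size=n)
log_w = -0.5 * x**2 + 0.5 * (x - t) ** 2   # log p(x) - log q(x)
is_est = np.mean((x > t) * np.exp(log_w))

print(naive, is_est)   # is_est clusters tightly around 3.17e-5
```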
arXiv Detail & Related papers (2024-10-17T04:31:18Z)
- R-divergence for Estimating Model-oriented Distribution Discrepancy [37.939239477868796]
We introduce R-divergence, designed to assess model-oriented distribution discrepancies.
R-divergence learns a minimum hypothesis on the mixed data and then gauges the empirical risk difference between the two samples.
We evaluate the test power across various unsupervised and supervised tasks and find that R-divergence achieves state-of-the-art performance.
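From the two-sentence description alone, the recipe appears to be: fit a single minimum-risk hypothesis on the pooled samples, then report the gap between its empirical risks on each sample. A rough sketch under that reading, with the learner and loss as illustrative choices rather than the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def r_divergence_sketch(X1, y1, X2, y2):
    """Fit one hypothesis on the mixture of two samples, then gauge the
    empirical risk difference between them (a sketch of the idea, not
    the paper's exact estimator)."""
    X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
    h = LogisticRegression(max_iter=1000).fit(X, y)  # hypothesis on mixed data
    r1 = log_loss(y1, h.predict_proba(X1), labels=h.classes_)
    r2 = log_loss(y2, h.predict_proba(X2), labels=h.classes_)
    return abs(r1 - r2)
```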
arXiv Detail & Related papers (2023-10-02T11:30:49Z)
- An Epistemic and Aleatoric Decomposition of Arbitrariness to Constrain the Set of Good Models [7.620967781722717]
Recent research reveals that machine learning (ML) models are highly sensitive to minor changes in their training procedure. We show that stability decomposes into epistemic and aleatoric components, capturing the consistency and confidence in prediction. We propose a model selection procedure that includes epistemic and aleatoric criteria alongside existing accuracy and fairness criteria, and show that it successfully narrows down a large set of good models.
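The summary does not give the paper's exact definitions; for orientation only, one standard decomposition in the predictive-uncertainty literature splits total uncertainty over the randomness $\theta$ of the training procedure as

$$\underbrace{\mathbb{H}\big[\mathbb{E}_\theta\, p_\theta(y \mid x)\big]}_{\text{total}} = \underbrace{\mathbb{E}_\theta\, \mathbb{H}\big[p_\theta(y \mid x)\big]}_{\text{aleatoric (confidence)}} + \underbrace{\mathbb{I}\,[y;\, \theta \mid x]}_{\text{epistemic (consistency)}},$$

where the mutual-information term grows when retrained models disagree; the paper's own consistency and confidence components may be defined differently.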
arXiv Detail & Related papers (2023-02-09T09:35:36Z)
- Learning Probabilities of Causation from Finite Population Data [40.99426447422972]
We propose a machine learning model that helps to learn the bounds of the probabilities of causation for subpopulations given finite population data.
We show by a simulated study that the machine learning model is able to learn the bounds of PNS for 32,768 subpopulations while knowing only roughly 500 of them from the finite population data.
arXiv Detail & Related papers (2022-10-16T05:46:25Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) only in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
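For context, perplexity is the exponentiated average negative log-likelihood over all tokens,

$$\mathrm{PPL}(w_{1:N}) = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N} \log p_{\mathrm{LM}}(w_i \mid w_{<i})\Big),$$

so it is dominated by typical tokens and can mask how badly a model mis-weights the rare-event tail, which is the distortion this evaluation scheme targets.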
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Deep Probability Estimation [14.659180336823354]
We investigate probability estimation from high-dimensional data using deep neural networks.
We evaluate existing methods on synthetic data as well as on three real-world probability estimation tasks.
arXiv Detail & Related papers (2021-11-21T03:55:50Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
- Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
- A Causal Direction Test for Heterogeneous Populations [10.653162005300608]
Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications.
We show that when the homogeneity assumption is violated, causal models developed under that assumption can fail to identify the correct causal direction.
We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm.
arXiv Detail & Related papers (2020-06-08T18:59:14Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
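For reference, the canonical doubly-robust (AIPW) estimator of the ACE, with treatment $A$, outcome $Y$, covariates $X$, outcome models $\hat\mu_a$, and propensity score $\hat e$, is

$$\hat\psi = \frac{1}{n}\sum_{i=1}^{n}\left[\hat\mu_1(X_i) - \hat\mu_0(X_i) + \frac{A_i\,(Y_i - \hat\mu_1(X_i))}{\hat e(X_i)} - \frac{(1 - A_i)\,(Y_i - \hat\mu_0(X_i))}{1 - \hat e(X_i)}\right];$$

in the cross-fit variant, the nuisance models $\hat\mu_a$ and $\hat e$ applied to observation $i$ are trained on folds that exclude $i$. This is the standard construction, not necessarily the exact set of estimators compared in the paper.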
arXiv Detail & Related papers (2020-04-21T23:09:55Z)