Using Deep Autoregressive Models as Causal Inference Engines
- URL: http://arxiv.org/abs/2409.18581v2
- Date: Sun, 6 Oct 2024 14:22:14 GMT
- Title: Using Deep Autoregressive Models as Causal Inference Engines
- Authors: Daniel Jiwoong Im, Kevin Zhang, Nakul Verma, Kyunghyun Cho,
- Abstract summary: We propose an autoregressive (AR) causal inference framework capable of handling complex confounders and sequential actions common in modern applications.
We accomplish this by em sequencification, transforming data from an underlying causal diagram into a sequence of tokens.
We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates.
- Score: 38.26602521505842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing causal inference (CI) models are limited to primarily handling low-dimensional confounders and singleton actions. We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions common in modern applications. We accomplish this by {\em sequencification}, transforming data from an underlying causal diagram into a sequence of tokens. This approach not only enables training with data generated from any DAG but also extends existing CI capabilities to accommodate estimating several statistical quantities using a {\em single} model. We can directly predict interventional probabilities, simplifying inference and enhancing outcome prediction accuracy. We demonstrate that an AR model adapted for CI is efficient and effective in various complex applications such as navigating mazes, playing chess endgames, and evaluating the impact of certain keywords on paper acceptance rates.
Related papers
- Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm [14.980926991441345]
We show that datasets containing interventional data can be effectively extracted under realistic assumptions about the data distribution.
We introduce interventional faithfulness, which relies on comparisons between the marginal distributions of each variable across observational and interventional settings.
We also introduce Intersort, an algorithm designed to infer the causal order from datasets containing large numbers of single-variable interventions.
arXiv Detail & Related papers (2024-05-28T16:07:17Z) - Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks.
Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z) - Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z) - Conditioned Human Trajectory Prediction using Iterative Attention Blocks [70.36888514074022]
We present a simple yet effective pedestrian trajectory prediction model aimed at pedestrians positions prediction in urban-like environments.
Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion.
We show that without explicit introduction of social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce on par results with SoTA models.
arXiv Detail & Related papers (2022-06-29T07:49:48Z) - Factorized Structured Regression for Large-Scale Varying Coefficient
Models [1.3282354370017082]
We propose Factorized Structured Regression (FaStR) for scalable varying coefficient models.
FaStR overcomes limitations of general regression models for large-scale data by combining structured additive regression and factorization approaches in a neural network-based model implementation.
Empirical results confirm that the estimation of varying coefficients of our approach is on par with state-of-the-art regression techniques.
arXiv Detail & Related papers (2022-05-25T23:12:13Z) - Identifying and Mitigating Spurious Correlations for Improving
Robustness in NLP Models [19.21465581259624]
Many problems can be attributed to models exploiting spurious correlations, or shortcuts between the training data and the task labels.
In this paper, we aim to automatically identify such spurious correlations in NLP models at scale.
We show that our proposed method can effectively and efficiently identify a scalable set of "shortcuts", and mitigating these leads to more robust models in multiple applications.
arXiv Detail & Related papers (2021-10-14T21:40:03Z) - Click-through Rate Prediction with Auto-Quantized Contrastive Learning [46.585376453464114]
We consider whether the user behaviors are rich enough to capture the interests for prediction, and propose an Auto-Quantized Contrastive Learning (AQCL) loss to regularize the model.
The proposed framework is agnostic to different model architectures and can be trained in an end-to-end fashion.
arXiv Detail & Related papers (2021-09-27T04:39:43Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores)
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words"
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z) - Learning Multivariate Hawkes Processes at Scale [17.17906360554892]
We show that our approach allows to compute the exact likelihood and gradients of an MHP -- independently of the ambient dimensions of the underlying network.
We show on synthetic and real-world datasets that our model does not only achieve state-of-the-art predictive results, but also improves runtime performance by multiple orders of magnitude.
arXiv Detail & Related papers (2020-02-28T01:18:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.