Causal Transformer for Estimating Counterfactual Outcomes
- URL: http://arxiv.org/abs/2204.07258v1
- Date: Thu, 14 Apr 2022 22:40:09 GMT
- Title: Causal Transformer for Estimating Counterfactual Outcomes
- Authors: Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
- Abstract summary: Estimating counterfactual outcomes over time from observational data is relevant for many applications.
We develop a novel Causal Transformer for estimating counterfactual outcomes over time.
Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders.
- Score: 18.640006398066188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating counterfactual outcomes over time from observational data is
relevant for many applications (e.g., personalized medicine). Yet,
state-of-the-art methods build upon simple long short-term memory (LSTM)
networks, thus rendering inferences for complex, long-range dependencies
challenging. In this paper, we develop a novel Causal Transformer for
estimating counterfactual outcomes over time. Our model is specifically
designed to capture complex, long-range dependencies among time-varying
confounders. For this, we combine three transformer subnetworks with separate
inputs for time-varying covariates, previous treatments, and previous outcomes
into a joint network with in-between cross-attentions. We further develop a
custom, end-to-end training procedure for our Causal Transformer. Specifically,
we propose a novel counterfactual domain confusion loss to address confounding
bias: it aims to learn adversarial balanced representations, so that they are
predictive of the next outcome but non-predictive of the current treatment
assignment. We evaluate our Causal Transformer based on synthetic and
real-world datasets, where it achieves superior performance over current
baselines. To the best of our knowledge, this is the first work proposing a
transformer-based architecture for estimating counterfactual outcomes from
longitudinal data.
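The abstract describes the counterfactual domain confusion loss only at a high level. The sketch below illustrates the general idea of a domain confusion objective, assuming it is computed as the cross-entropy between a uniform distribution over treatments and the output of a treatment classifier. This formulation, the function names, and the example logits are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def treatment_classification_loss(logits, treatment):
    """Standard cross-entropy; this would train the treatment classifier head."""
    return -math.log(softmax(logits)[treatment])


def domain_confusion_loss(logits):
    """Cross-entropy between a uniform target and the classifier prediction.

    Assumed form: -(1/K) * sum_k log p_k. It is smallest (log K) when the
    classifier cannot tell treatments apart, so minimizing it with respect to
    the representation pushes that representation to be non-predictive of the
    current treatment.
    """
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)


# Hypothetical classifier logits for three treatment options.
confident_logits = [4.0, 0.0, 0.0]      # representation reveals the treatment
uninformative_logits = [1.0, 1.0, 1.0]  # representation is balanced

loss_confident = domain_confusion_loss(confident_logits)
loss_balanced = domain_confusion_loss(uninformative_logits)
print(loss_confident > loss_balanced)
```

In an adversarial setup of this kind, the classifier head would typically be updated with `treatment_classification_loss`, while the representation would be updated with the outcome prediction loss plus a weighted `domain_confusion_loss`. How the paper balances these terms is not stated in the abstract.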
Related papers
- Rough Transformers for Continuous and Efficient Time-Series Modelling [46.58170057001437]
Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals.
We introduce the Rough Transformer, a variation of the Transformer model which operates on continuous-time representations of input sequences.
We find that Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the benefits of Neural ODE-based models.
arXiv Detail & Related papers (2024-03-15T13:29:45Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model that improves classification capacity on the multivariate time series classification task.
It has three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted for their high prediction capacity, which comes from the computationally expensive self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
Our experiments focus on oil and gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- CLMFormer: Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System [46.39662315849883]
Long-term time-series forecasting (LTSF) plays a crucial role in various practical applications.
Existing Transformer-based models, such as Fedformer and Informer, often achieve their best performances on validation sets after just a few epochs.
We propose a novel approach to address this issue by employing curriculum learning and introducing a memory-driven decoder.
arXiv Detail & Related papers (2022-07-16T04:05:15Z)
- Are Transformers Effective for Time Series Forecasting? [13.268196448051308]
Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task.
This study investigates whether Transformer-based techniques are the right solutions for long-term time series forecasting.
We find that the relatively higher long-term forecasting accuracy of Transformer-based solutions has little to do with the temporal relation extraction capabilities of the Transformer architecture.
arXiv Detail & Related papers (2022-05-26T17:17:08Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.