Causally-Guided Pairwise Transformer -- Towards Foundational Digital Twins in Process Industry
- URL: http://arxiv.org/abs/2508.13111v2
- Date: Mon, 22 Sep 2025 07:58:26 GMT
- Title: Causally-Guided Pairwise Transformer -- Towards Foundational Digital Twins in Process Industry
- Authors: Michael Mayr, Georgios C. Chasparis
- Abstract summary: Causally-Guided Pairwise Transformer (CGPT) is a novel architecture that integrates a known causal graph as an inductive bias. We show that CGPT significantly outperforms both channel-dependent (CD) and channel-independent (CI) models in predictive accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundational modelling of multi-dimensional time-series data in industrial systems presents a central trade-off: channel-dependent (CD) models capture specific cross-variable dynamics but lack robustness and adaptability as model layers are commonly bound to the data dimensionality of the tackled use-case, while channel-independent (CI) models offer generality at the cost of modelling the explicit interactions crucial for system-level predictive regression tasks. To resolve this, we propose the Causally-Guided Pairwise Transformer (CGPT), a novel architecture that integrates a known causal graph as an inductive bias. The core of CGPT is built around a pairwise modeling paradigm, tackling the CD/CI conflict by decomposing the multidimensional data into pairs. The model uses channel-agnostic learnable layers where all parameter dimensions are independent of the number of variables. CGPT enforces a CD information flow at the pair-level and CI-like generalization across pairs. This approach disentangles complex system dynamics and results in a highly flexible architecture that ensures scalability and any-variate adaptability. We validate CGPT on a suite of synthetic and real-world industrial datasets on long-term and one-step forecasting tasks designed to simulate common industrial complexities. Results demonstrate that CGPT significantly outperforms both CI and CD baselines in predictive accuracy and shows competitive performance with end-to-end trained CD models while remaining agnostic to the problem dimensionality.
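The pairwise decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `decompose_into_pairs` and `shared_pair_forecast` are hypothetical, the causal graph is assumed to be given as a list of (cause, effect) index tuples, and a single shared linear map stands in for the channel-agnostic Transformer layers. The only point shown is that the parameter shape depends on the window length, not on the number of variables.

```python
import numpy as np

def decompose_into_pairs(series, causal_edges):
    """Stack one (cause, effect) slice per edge of a known causal graph.

    series: array of shape (T, D) with T time steps and D variables.
    causal_edges: list of (cause_index, effect_index) tuples (assumed given).
    Returns an array of shape (P, T, 2), where P is the number of edges.
    """
    series = np.asarray(series, dtype=float)
    return np.stack([series[:, [c, e]] for c, e in causal_edges], axis=0)

def shared_pair_forecast(pairs, weights):
    """Apply the same weights to every pair (placeholder for shared layers).

    pairs: array of shape (P, T, 2).
    weights: array of shape (2 * T,), independent of the variable count D.
    Returns one scalar prediction per pair, shape (P,).
    """
    flat = pairs.reshape(pairs.shape[0], -1)  # (P, 2 * T)
    return flat @ weights

# Toy data: 4 time steps, 3 variables, two causal edges into variable 2.
T, D = 4, 3
series = np.arange(T * D, dtype=float).reshape(T, D)
causal_edges = [(0, 2), (1, 2)]

pairs = decompose_into_pairs(series, causal_edges)
weights = np.full(2 * T, 1.0 / (2 * T))  # placeholder: plain averaging
preds = shared_pair_forecast(pairs, weights)
```

Adding a fourth variable would change only `causal_edges` and the number of pairs; `weights` keeps its shape, which is the any-variate property the abstract refers to.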
Related papers
- Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs [15.38864225184245]
Inverse problems are the task of calibrating models to match data. We develop a principled methodology for leveraging data from collections of distinct yet related physical systems. We learn the shared unknown dynamics in the form of an ML-based closure model.
arXiv Detail & Related papers (2026-03-04T10:30:08Z) - Heterogeneous Model Alignment in Digital Twin [0.0]
A key challenge in model-driven DTs is aligning heterogeneous models across abstraction layers. Existing methods, relying on static mappings and manual updates, are often inflexible, error-prone, and risk compromising data integrity. We present a heterogeneous model alignment approach for multi-layered, model-driven DTs.
arXiv Detail & Related papers (2025-12-17T10:36:55Z) - Adapformer: Adaptive Channel Management for Multivariate Time Series Forecasting [49.40321003932633]
Adapformer is an advanced Transformer-based framework that merges the benefits of CI and CD methodologies through effective channel management. Adapformer achieves superior performance over existing models, enhancing both predictive accuracy and computational efficiency.
arXiv Detail & Related papers (2025-11-18T16:24:05Z) - Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains [50.66049136093248]
We develop a time-aware structural causal model (SCM) that incorporates dynamic causal factors and the causal mechanism drifts. We show that our method can yield the optimal causal predictor for each time domain. Results on both synthetic and real-world datasets exhibit that SYNC can achieve superior temporal generalization performance.
arXiv Detail & Related papers (2025-06-21T14:05:37Z) - A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops [55.07063067759609]
High-quality data is essential for training large generative models, yet the vast reservoir of real data available online has become nearly depleted. Models increasingly generate their own data for further training, forming Self-consuming Training Loops (STLs). Some models degrade or even collapse, while others successfully avoid these failures, leaving a significant gap in theoretical understanding.
arXiv Detail & Related papers (2025-02-26T06:18:13Z) - Causal Time-Series Synchronization for Multi-Dimensional Forecasting [1.1060425537315088]
The process industry's high expectations for Digital Twins require modeling approaches that can generalize across tasks and diverse domains.
Our approach focuses on: (i) identifying highly lagged causal relationships using data-driven methods, (ii) synchronizing cause-effect pairs to generate training samples for channel-dependent pre-training, and (iii) evaluating the effectiveness of this approach in channel-dependent forecasting.
arXiv Detail & Related papers (2024-11-15T12:50:57Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - PGODE: Towards High-quality System Dynamics Modeling [40.76121531452706]
This paper studies the problem of modeling multi-agent dynamical systems, where agents could interact mutually to influence their behaviors.
Recent research predominantly uses geometric graphs to depict these mutual interactions, which are then captured by graph neural networks (GNNs).
We propose a new approach named Prototypical Graph ODE to address the problem.
arXiv Detail & Related papers (2023-11-11T12:04:47Z) - Enhanced multi-fidelity modelling for digital twin and uncertainty quantification [0.0]
Data-driven models play a crucial role in digital twins, enabling real-time updates and predictions.
The fidelity of available data and the scarcity of accurate sensor data often hinder the efficient learning of surrogate models.
We propose a novel framework that begins by developing a robust multi-fidelity surrogate model.
arXiv Detail & Related papers (2023-06-26T05:58:17Z) - Bayesian Complementary Kernelized Learning for Multidimensional Spatiotemporal Data [11.763229353978321]
We propose a new statistical framework -- Bayesian Complementary Kernelized Learning (BCKL).
BCKL offers superior performance in providing accurate posterior mean and high-quality uncertainty estimates.
arXiv Detail & Related papers (2022-08-21T22:38:54Z) - Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio [17.214062755082065]
Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models.
We show that the vanilla DSAE suffers from being sensitive to the choice of model architecture and capacity of the dynamic latent variables.
We propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions.
arXiv Detail & Related papers (2022-05-12T04:11:25Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Generative Adversarial Networks (GANs) are trained by solving non-convex concave mini-max optimization problems.
A theory has shown the importance of the gradient descent (GD) to globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.