SPARTAN: A Sparse Transformer Learning Local Causation
- URL: http://arxiv.org/abs/2411.06890v2
- Date: Tue, 12 Nov 2024 09:12:42 GMT
- Title: SPARTAN: A Sparse Transformer Learning Local Causation
- Authors: Anson Lei, Bernhard Schölkopf, Ingmar Posner
- Abstract summary: Causal structures play a central role in world models that flexibly adapt to changes in the environment.
We present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns local causal structures between entities in a scene.
By applying sparsity regularisation on the attention pattern between object-factored tokens, SPARTAN identifies sparse local causal models that accurately predict future object states.
- Score: 63.29645501232935
- Abstract: Causal structures play a central role in world models that flexibly adapt to changes in the environment. While recent works motivate the benefits of discovering local causal graphs for dynamics modelling, in this work we demonstrate that accurately capturing these relationships in complex settings remains challenging for the current state-of-the-art. To remedy this shortcoming, we postulate that sparsity is a critical ingredient for the discovery of such local causal structures. To this end we present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns local causal structures between entities in a scene. By applying sparsity regularisation on the attention pattern between object-factored tokens, SPARTAN identifies sparse local causal models that accurately predict future object states. Furthermore, we extend our model to capture sparse interventions with unknown targets on the dynamics of the environment. This results in a highly interpretable world model that can efficiently adapt to changes. Empirically, we evaluate SPARTAN against the current state-of-the-art in object-centric world models on observation-based environments and demonstrate that our model can learn accurate local causal graphs and achieve significantly improved few-shot adaptation to changes in the dynamics of the environment as well as robustness against removing irrelevant distractors.
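As a rough illustration of what a sparsity penalty on the attention pattern between object tokens could look like, here is a minimal NumPy sketch. It is not the paper's formulation: the sigmoid gating, the function name `gated_attention`, the `gate_bias` parameter, and the 0.5 threshold are all assumptions made for illustration. Each entry of `gates` plays the role of a soft edge "object j influences object i", and the penalty is the average number of active incoming edges per object.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_attention(tokens, w_q, w_k, w_v, gate_bias=0.0):
    """Single-head attention over object tokens with soft edge gates.

    Hypothetical sketch: gates[i, j] in (0, 1) is a soft indicator that
    token j influences token i. The returned penalty is the mean number
    of active incoming edges per token, to be added to the prediction
    loss with some weight.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)        # standard attention weights
    gates = sigmoid(scores + gate_bias)    # soft adjacency matrix
    out = (attn * gates) @ v               # gated message passing
    penalty = gates.sum(axis=-1).mean()    # sparsity regulariser
    return out, gates, penalty


rng = np.random.default_rng(0)
n_objects, dim = 4, 8
tokens = rng.normal(size=(n_objects, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

out, gates, penalty = gated_attention(tokens, w_q, w_k, w_v)
graph = gates > 0.5  # thresholded gates read as a local causal graph
```

In training, a term such as `prediction_loss + lam * penalty` (with `lam` a hypothetical weight) would push the model to explain each object's next state with as few incoming edges as possible.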
Related papers
- DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification [14.96980804513399]
Graph Neural Networks (GNNs) are susceptible to distribution shifts, creating vulnerability and security issues in critical domains.
Existing methods that target learning an invariant (feature, structure)-label mapping often depend on oversimplified assumptions about the data generation process.
We introduce a more realistic graph data generation model using Structural Causal Models (SCMs).
We propose a causal decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings.
arXiv Detail & Related papers (2024-10-27T00:22:18Z) - Robust Traffic Forecasting against Spatial Shift over Years [11.208740750755025]
We investigate state-of-the-art models using newly proposed traffic OOD benchmarks.
We find that these models experience a significant decline in performance.
We propose a novel Mixture of Experts framework, which learns a set of graph generators during training and combines them to generate new graphs.
Our method is both parsimonious and efficacious, and can be seamlessly integrated into any spatiotemporal model.
arXiv Detail & Related papers (2024-10-01T03:49:29Z) - Partial Models for Building Adaptive Model-Based Reinforcement Learning Agents [37.604622216020765]
We show that the conceptually simple idea of partial models can allow deep model-based agents to overcome this challenge.
We demonstrate this by showing that using partial models in agents such as deep Dyna-Q, PlaNet and Dreamer allows them to adapt effectively to local changes in their environments.
arXiv Detail & Related papers (2024-05-27T07:46:36Z) - LROC-PANGU-GAN: Closing the Simulation Gap in Learning Crater Segmentation with Planetary Simulators [5.667566032625522]
It is critical for probes landing on foreign planetary bodies to be able to robustly identify and avoid hazards.
Recent applications of deep learning to this problem show promising results.
These models are, however, often learned with explicit supervision over annotated datasets.
This paper introduces a system to close this "realism" gap while retaining label fidelity.
arXiv Detail & Related papers (2023-10-04T12:52:38Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z) - Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z) - Change Detection for Local Explainability in Evolving Data Streams [72.4816340552763]
Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations.
It is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications.
We present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift.
arXiv Detail & Related papers (2022-09-06T18:38:34Z) - Variational Causal Dynamics: Discovering Modular World Models from Interventions [25.084146613277973]
Latent world models allow agents to reason about complex environments with high-dimensional observations.
We present variational causal dynamics (VCD), a structured world model that exploits the invariance of causal mechanisms across environments.
arXiv Detail & Related papers (2022-06-22T14:28:40Z) - Contrastive Neighborhood Alignment [81.65103777329874]
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features.
The target model aims to mimic the local structure of the source representation space using a contrastive loss.
CNA is illustrated in three scenarios: manifold learning, where the model maintains the local topology of the original data in a dimension-reduced space; model distillation, where a small student model is trained to mimic a larger teacher; and legacy model update, where an older model is replaced by a more powerful one.
arXiv Detail & Related papers (2022-01-06T04:58:31Z)