FACTS: A Factored State-Space Framework For World Modelling
- URL: http://arxiv.org/abs/2410.20922v1
- Date: Mon, 28 Oct 2024 11:04:42 GMT
- Title: FACTS: A Factored State-Space Framework For World Modelling
- Authors: Li Nanbo, Firas Laakom, Yucheng Xu, Wenyi Wang, Jürgen Schmidhuber,
- Abstract summary: We propose a novel recurrent framework, the textbfFACTored textbfState-space (textbfFACTS) model, for spatial-temporal world modelling.
The FACTS framework constructs a graph-memory with a routing mechanism that learns permutable memory representations.
It consistently outperforms or matches specialised state-of-the-art models, despite its general-purpose world modelling design.
- Score: 24.08175276756845
- License:
- Abstract: World modelling is essential for understanding and predicting the dynamics of complex systems by learning both spatial and temporal dependencies. However, current frameworks, such as Transformers and selective state-space models like Mambas, exhibit limitations in efficiently encoding spatial and temporal structures, particularly in scenarios requiring long-term high-dimensional sequence modelling. To address these issues, we propose a novel recurrent framework, the \textbf{FACT}ored \textbf{S}tate-space (\textbf{FACTS}) model, for spatial-temporal world modelling. The FACTS framework constructs a graph-structured memory with a routing mechanism that learns permutable memory representations, ensuring invariance to input permutations while adapting through selective state-space propagation. Furthermore, FACTS supports parallel computation of high-dimensional sequences. We empirically evaluate FACTS across diverse tasks, including multivariate time series forecasting and object-centric world modelling, demonstrating that it consistently outperforms or matches specialised state-of-the-art models, despite its general-purpose world modelling design.
Related papers
- PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model [7.286873011001679]
We propose a purely SSM-based approach with linear correlations for complexityD human pose estimation in monocular video video.
Specifically, we propose a bidirectional global temporal-local-temporal block that comprehensively models human joint relations within individual frames as well as across frames.
This strategy provides a more logical geometric ordering strategy, resulting in a combined-local spatial scan.
arXiv Detail & Related papers (2024-08-07T04:38:03Z) - SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering [5.016335384639901]
Multi-modal input of Audio-Visual Question Answering (AVQA) makes feature extraction and fusion processes more challenging.
We propose SHMamba: Structured Hyperbolic State Space Model to integrate the advantages of hyperbolic geometry and state space models.
Our method demonstrates superiority among all current major methods and is more suitable for practical application scenarios.
arXiv Detail & Related papers (2024-06-14T08:43:31Z) - Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective [63.60312929416228]
textbftextitAttraos incorporates chaos theory into long-term time series forecasting.
We show that Attraos outperforms various LTSF methods on mainstream datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.
arXiv Detail & Related papers (2024-02-18T05:35:01Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action
Segmentation [89.86345494602642]
Existing methods are limited in weak-temporal modeling capability.
We propose a Decoupled Scoupled Framework (DeST) to address the issues.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape
Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT)
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - Revisiting the Spatial and Temporal Modeling for Few-shot Action
Recognition [16.287968292213563]
We propose SloshNet, a new framework that revisits the spatial and temporal modeling for few-shot action recognition in a finer manner.
We extensively validate the proposed SloshNet on four few-shot action recognition datasets, including Something-Something V2, Kinetics, UCF101, and HMDB51.
arXiv Detail & Related papers (2023-01-19T08:34:04Z) - Hidden Parameter Recurrent State Space Models For Changing Dynamics
Scenarios [18.08665164701404]
Recurrent State-space models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios.
We introduce the Hidden Recurrent State Space Models (HiP- RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors.
We show that HiP- RSSMs outperforms RSSMs and competing multi-task models on several challenging robotic benchmarks both on real-world systems and simulations.
arXiv Detail & Related papers (2022-06-29T14:54:49Z) - Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z) - Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time.
We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos that contain complex but irrelevant information to the planning task.
arXiv Detail & Related papers (2021-06-14T04:31:15Z) - S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards exploiting dynamic structure that are capable of simultaneously exploiting both modular andtemporal structures.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.