Learning Abstract World Models with a Group-Structured Latent Space
- URL: http://arxiv.org/abs/2506.01529v1
- Date: Mon, 02 Jun 2025 10:43:18 GMT
- Title: Learning Abstract World Models with a Group-Structured Latent Space
- Authors: Thomas Delliaux, Nguyen-Khanh Vu, Vincent François-Lavet, Elise van der Pol, Emmanuel Rachelson
- Abstract summary: We show how geometric priors can be imposed on the representation manifold of a learned transition model. We show experimentally that this leads to better predictions of the latent transition model than fully unstructured approaches.
- Score: 12.685414866379366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning meaningful abstract models of Markov Decision Processes (MDPs) is crucial for improving generalization from limited data. In this work, we show how geometric priors can be imposed on the low-dimensional representation manifold of a learned transition model. We incorporate known symmetric structures via appropriate choices of the latent space and the associated group actions, which encode prior knowledge about invariances in the environment. In addition, our framework allows the embedding of additional unstructured information alongside these symmetries. We show experimentally that this leads to better predictions of the latent transition model than fully unstructured approaches, as well as better learning on downstream RL tasks, in environments with rotational and translational features, including in first-person views of 3D environments. Additionally, our experiments show that this leads to simpler and more disentangled representations. The full code is available on GitHub to ensure reproducibility.
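For intuition, here is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: the latent state is split into a group-structured factor (an angle on a circle, i.e. an SO(2) component) on which actions act by rotation, plus an unstructured vector updated by a learned network. All names, dimensions, and the choice of group are illustrative assumptions, not the authors' actual architecture.
```python
# Sketch of a transition model over a group-structured latent space:
# an SO(2) factor (stored as an angle) updated by an exact group action,
# plus an unstructured factor updated by a small learned MLP.
import math
import torch
import torch.nn as nn

class GroupStructuredTransition(nn.Module):
    def __init__(self, n_actions: int, free_dim: int = 8):
        super().__init__()
        # Each action is mapped to a rotation angle applied to the SO(2) factor.
        self.action_angle = nn.Embedding(n_actions, 1)
        # The unstructured part of the latent state is updated by a small MLP.
        self.free_update = nn.Sequential(
            nn.Linear(free_dim + n_actions, 64), nn.ReLU(),
            nn.Linear(64, free_dim),
        )
        self.n_actions = n_actions

    def forward(self, z_angle: torch.Tensor, z_free: torch.Tensor,
                action: torch.Tensor):
        # Group action on the structured factor: rotate the angle on the circle.
        theta = self.action_angle(action).squeeze(-1)
        next_angle = torch.remainder(z_angle + theta, 2 * math.pi)
        # Unstructured factor: generic learned update conditioned on the action.
        a_onehot = torch.nn.functional.one_hot(action, self.n_actions).float()
        next_free = z_free + self.free_update(torch.cat([z_free, a_onehot], dim=-1))
        return next_angle, next_free

# Usage: predict the next latent state for a batch of two transitions.
model = GroupStructuredTransition(n_actions=4)
z_angle = torch.rand(2) * 2 * math.pi   # SO(2) factor, stored as an angle
z_free = torch.randn(2, 8)              # unstructured latent features
action = torch.tensor([0, 3])
next_angle, next_free = model(z_angle, z_free, action)
```
Because the structured update is an exact group action, the predicted state cannot drift off the circle, which is one way such priors can lead to the simpler, more disentangled representations the abstract reports.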
Related papers
- Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles [74.32932832937618]
We introduce RigidSSL (Rigidity-Aware Self-Supervised Learning), a geometric pretraining framework. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures from the AlphaFold Protein Structure Database with simulated perturbations. Phase II (RigidSSL-MD) refines these representations on 1.3K molecular dynamics trajectories to capture physically realistic transitions.
arXiv Detail & Related papers (2026-03-02T21:32:30Z) - Geometric Priors for Generalizable World Models via Vector Symbolic Architecture [9.216794073296679]
A key challenge in artificial intelligence is understanding how neural systems learn representations that capture the underlying dynamics of the world. We address this challenge with a generalizable world model grounded in Vector Symbolic Architecture (VSA) principles as geometric priors. We show how training the latent space to have group structure yields generalizable, data-efficient, and interpretable world models.
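As background, the sketch below shows the basic algebra a VSA-based prior relies on: binding and unbinding high-dimensional vectors via circular convolution, as in Holographic Reduced Representations. It only illustrates the vector operations, not this paper's world model; the dimensionality and the role/filler naming are illustrative assumptions.
```python
# Circular-convolution binding and its approximate inverse, via FFTs.
import numpy as np

def bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular convolution: elementwise product in the Fourier domain."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Approximate inverse of bind: correlate c with a (conjugate spectrum)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

rng = np.random.default_rng(0)
d = 1024
role = rng.normal(0, 1 / np.sqrt(d), d)    # e.g. a "position" role vector
filler = rng.normal(0, 1 / np.sqrt(d), d)  # e.g. the value stored at that role

bound = bind(role, filler)
recovered = unbind(bound, role)
# The recovered vector is close to the original filler: its cosine similarity
# is well above the near-zero similarity of unrelated random vectors.
sim = np.dot(recovered, filler) / (np.linalg.norm(recovered) * np.linalg.norm(filler))
print(f"cosine similarity after unbinding: {sim:.2f}")
```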
arXiv Detail & Related papers (2026-02-25T00:41:42Z) - Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models [33.99822400076112]
Chunk-based sparse attention has emerged as a promising paradigm for extreme length generalization. We present a systematic dissection of these models to identify the core components driving their performance. We establish a new state-of-the-art for training-free length extrapolation, successfully generalizing models trained on a 4K context to 32 million tokens on RULER and BABILong.
arXiv Detail & Related papers (2025-10-20T06:17:57Z) - A Markov Categorical Framework for Language Modeling [5.980612601840882]
Auto-regressive language models factorize sequence probabilities and are trained by minimizing the negative log-likelihood (NLL) objective. This work introduces a unifying analytical framework using Markov Categories (MCs) to deconstruct the AR generation process and the NLL objective. By analyzing the information geometry of the model's prediction head, we show that NLL implicitly forces the learned representation space to align with the eigenspectrum of a similarity predictive operator.
arXiv Detail & Related papers (2025-07-25T13:14:03Z) - Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures. We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - In-context learning of evolving data streams with tabular foundational models [42.13420474990124]
This work bridges advancements from both areas, highlighting how transformers' implicit meta-learning abilities, pre-training on drifting natural data, and reliance on context optimization directly address the core challenges of adaptive learning in dynamic environments. Exploring real-time model adaptation, this research demonstrates that TabPFN, coupled with a simple sliding memory strategy, consistently outperforms ensembles of Hoeffding trees across all non-stationary benchmarks.
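A rough sketch of the sliding-memory idea follows, assuming the `tabpfn` package's scikit-learn-style `TabPFNClassifier` and a standard prequential (test-then-train) loop; the window size and loop details are illustrative choices, not the paper's exact protocol.
```python
# Sliding-memory in-context learning on a data stream with TabPFN.
from collections import deque
import numpy as np
from tabpfn import TabPFNClassifier

def prequential_sliding_tabpfn(stream, window_size=512):
    """stream yields (x, y) pairs, x a 1-D feature array; returns prequential accuracy."""
    memory_x, memory_y = deque(maxlen=window_size), deque(maxlen=window_size)
    clf = TabPFNClassifier()
    correct, total = 0, 0
    for x, y in stream:
        if len(memory_y) >= 2 and len(set(memory_y)) >= 2:
            # Fitting TabPFN just stores the window as in-context examples;
            # the transformer does the actual prediction in-context.
            clf.fit(np.array(memory_x), np.array(memory_y))
            pred = clf.predict(x.reshape(1, -1))[0]
            correct += int(pred == y)
            total += 1
        # Test-then-train: only after predicting do we add the labeled example.
        memory_x.append(x)
        memory_y.append(y)
    return correct / max(total, 1)
```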
arXiv Detail & Related papers (2025-02-24T04:52:35Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Self-Supervised Representation Learning with Meta Comprehensive Regularization [11.387994024747842]
We introduce a module called CompMod with Meta Comprehensive Regularization (MCR), embedded into existing self-supervised frameworks.
We update our proposed model through a bi-level optimization mechanism, enabling it to capture comprehensive features.
We provide theoretical support for our proposed method from information theory and causal counterfactual perspective.
arXiv Detail & Related papers (2024-03-03T15:53:48Z) - Contextualization Distillation from Large Language Model for Knowledge
Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
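As a hedged sketch of the first step described above, the snippet below turns a compact knowledge-graph triplet into an instruction for a large language model so it produces a context-rich passage. The prompt wording and the example triplet are illustrative assumptions, not the paper's exact instructions.
```python
# Build an instruction prompt that asks an LLM to contextualize a KG triplet.
def triplet_to_prompt(head: str, relation: str, tail: str) -> str:
    return (
        "The following is a knowledge graph triplet: "
        f"({head}, {relation}, {tail}).\n"
        "Write a short, factual paragraph that describes this relationship "
        "in natural language, adding relevant context about both entities."
    )

print(triplet_to_prompt("Marie Curie", "award_received", "Nobel Prize in Physics"))
```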
arXiv Detail & Related papers (2024-01-28T08:56:49Z) - Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images [71.91424164693422]
We introduce an explicit point-based human reconstruction framework called HaP. Our approach features fully-explicit point cloud estimation, manipulation, generation, and refinement in 3D geometric space. Our results may indicate a paradigm rollback to fully-explicit, geometry-centric algorithm design.
arXiv Detail & Related papers (2023-11-06T05:52:29Z) - A Geometric Perspective on Variational Autoencoders [0.0]
This paper introduces a new interpretation of the Variational Autoencoder framework by taking a fully geometric point of view.
We show that using this scheme can make a vanilla VAE competitive and even better than more advanced versions on several benchmark datasets.
arXiv Detail & Related papers (2022-09-15T15:32:43Z) - GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z) - Leveraging Equivariant Features for Absolute Pose Regression [9.30597356471664]
We show that a translation and rotation equivariant Convolutional Neural Network directly induces representations of camera motions into the feature space.
We then show that this geometric property allows for implicitly augmenting the training data under a whole group of image plane-preserving transformations.
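To make the implicit-augmentation argument concrete, the toy example below composes an in-plane image rotation with a ground-truth camera rotation to obtain the pose label of the rotated image for free. The world-to-camera convention and the specific matrices are illustrative assumptions, and the equivariant network itself is only assumed, not implemented.
```python
# Free pose labels from in-plane image rotations (world-to-camera convention assumed).
import numpy as np

def rot_z(theta: float) -> np.ndarray:
    """In-plane (image-plane) rotation about the camera's optical axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Original ground-truth camera rotation for one training image.
R_cam = rot_z(0.3) @ np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)

# Rotating the image by theta rotates the camera about its optical axis,
# so the augmented pose label is simply rot_z(theta) @ R_cam.
for theta in (np.pi / 6, np.pi / 2):
    R_aug = rot_z(theta) @ R_cam
    # Sanity check: the augmented label is still a valid rotation matrix.
    assert np.allclose(R_aug @ R_aug.T, np.eye(3), atol=1e-9)
    print(theta, np.round(R_aug, 3))
```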
arXiv Detail & Related papers (2022-04-05T12:44:20Z) - Learning Augmentation Distributions using Transformed Risk Minimization [47.236227685707526]
We propose a new emphTransformed Risk Minimization (TRM) framework as an extension of classical risk minimization.
As a key application, we focus on learning augmentations to improve classification performance with a given class of predictors.
arXiv Detail & Related papers (2021-11-16T02:07:20Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)