CubeletWorld: A New Abstraction for Scalable 3D Modeling
- URL: http://arxiv.org/abs/2511.17664v1
- Date: Fri, 21 Nov 2025 00:19:02 GMT
- Title: CubeletWorld: A New Abstraction for Scalable 3D Modeling
- Authors: Azlaan Mustafa Samad, Hoang H. Nguyen, Lukas Berg, Henrik Müller, Yuan Xue, Daniel Kudenko, Zahra Ahmadi
- Abstract summary: We introduce CubeletWorld, a framework for representing and analyzing urban environments through a discretized 3D grid of spatial units called cubelets. This abstraction enables privacy-preserving modeling by embedding diverse data signals, such as infrastructure, movement, or environmental indicators, into localized cubelet states. We propose the CubeletWorld State Prediction task, which involves predicting the cubelet state using a realistic dataset containing various urban elements.
- Score: 9.828459640939004
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern cities produce vast streams of heterogeneous data, from infrastructure maps to mobility logs and satellite imagery. However, integrating these sources into coherent spatial models for planning and prediction remains a major challenge. Existing agent-centric methods often rely on direct environmental sensing, limiting scalability and raising privacy concerns. This paper introduces CubeletWorld, a novel framework for representing and analyzing urban environments through a discretized 3D grid of spatial units called cubelets. This abstraction enables privacy-preserving modeling by embedding diverse data signals, such as infrastructure, movement, or environmental indicators, into localized cubelet states. CubeletWorld supports downstream tasks such as planning, navigation, and occupancy prediction without requiring agent-driven sensing. To evaluate this paradigm, we propose the CubeletWorld State Prediction task, which involves predicting the cubelet state using a realistic dataset containing various urban elements like streets and buildings through this discretized representation. We explore a range of modified core models suitable for our setting and analyze challenges posed by increasing spatial granularity, specifically the issue of sparsity in representation and scalability of baselines. In contrast to existing 3D occupancy prediction models, our cubelet-centric approach focuses on inferring state at the spatial unit level, enabling greater generalizability across regions and improved privacy compliance. Our results demonstrate that CubeletWorld offers a flexible and extensible framework for learning from complex urban data, and it opens up new possibilities for scalable simulation and decision support in domains such as socio-demographic modeling, environmental monitoring, and emergency response. The code and datasets can be downloaded from here.
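The abstract describes discretizing 3D space into fixed-size cubelets and aggregating heterogeneous signals into per-cubelet states, stored sparsely to cope with the sparsity the paper highlights. The following minimal sketch illustrates that idea; the class name, signal names, and cubelet size are illustrative assumptions, not the paper's actual API.

```python
import math

CUBELET_SIZE = 10.0  # edge length of one cubelet in meters (assumed)

def point_to_cubelet(xyz):
    """Map a 3D world coordinate to its integer cubelet index."""
    return tuple(int(math.floor(c / CUBELET_SIZE)) for c in xyz)

class CubeletWorld:
    """Sparse cubelet grid: only occupied cubelets are stored, so memory
    scales with observed data rather than with the full city volume."""
    def __init__(self):
        self.states = {}  # (i, j, k) -> dict of aggregated signal counts

    def add_signal(self, xyz, signal, value=1.0):
        """Embed a localized data signal into the cubelet containing xyz."""
        state = self.states.setdefault(point_to_cubelet(xyz), {})
        state[signal] = state.get(signal, 0.0) + value

    def state(self, xyz):
        """Return the aggregated state of the cubelet containing xyz."""
        return self.states.get(point_to_cubelet(xyz), {})

world = CubeletWorld()
world.add_signal((12.3, 4.5, 0.0), "building")
world.add_signal((15.0, 8.0, 2.0), "movement")
# Both points fall into cubelet (1, 0, 0), so their signals are merged:
print(world.state((14.0, 6.0, 1.0)))  # → {'building': 1.0, 'movement': 1.0}
```

Because only the cubelet indices and aggregated states are retained, raw coordinates of individual observations are discarded, which is one plausible reading of the privacy-preserving property the abstract claims.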
Related papers
- A multi-view contrastive learning framework for spatial embeddings in risk modelling [0.688204255655161]
Spatial data are often unstructured, high-dimensional, and difficult to integrate into predictive models. We propose a novel multi-view contrastive learning framework for generating spatial embeddings. In a case study on French real estate prices, we compare models trained on raw coordinates against those using our spatial embeddings as inputs.
arXiv Detail & Related papers (2025-11-22T07:39:34Z) - 3dSAGER: Geospatial Entity Resolution over 3D Objects (Technical Report) [7.378893412842889]
3dSAGER is an end-to-end pipeline for geospatial entity resolution over 3D objects. We present a novel, spatial-reference-independent featurization mechanism that captures intricate geometric characteristics of matching pairs. We also propose a new lightweight and interpretable blocking method, BKAFI, that leverages a trained model to efficiently generate high-recall candidate sets.
arXiv Detail & Related papers (2025-11-09T09:35:45Z) - Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z) - R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation [74.41728218960465]
We propose a real-to-real 3D data generation framework (R2RGen) that directly augments point-cloud observation-action pairs to generate real-world data. In extensive experiments, R2RGen substantially enhances data efficiency and demonstrates strong potential for scaling and application in mobile manipulation.
arXiv Detail & Related papers (2025-10-09T17:55:44Z) - TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness [13.68631587423815]
3D semantic occupancy has rapidly become a research focus in robotics and autonomous-driving environment perception. Existing occupancy prediction tasks are modeled using voxel- or point-cloud-based approaches. We propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances spatial location and volumetric structural information.
arXiv Detail & Related papers (2025-03-13T01:35:04Z) - HGAurban: Heterogeneous Graph Autoencoding for Urban Spatial-Temporal Learning [36.80668790442231]
A key challenge lies in the noisy and sparse nature of spatial-temporal data, which limits existing neural networks' ability to learn meaningful region representations in the spatial-temporal graph. We propose HGAurban, a novel heterogeneous spatial-temporal graph masked autoencoder that leverages generative self-supervised learning for robust urban data representation.
arXiv Detail & Related papers (2024-10-14T07:33:33Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment [59.320414108383055]
We present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation.
We propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses.
arXiv Detail & Related papers (2024-02-27T03:08:44Z) - LibCity: A Unified Library Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction [74.08181247675095]
The field has limitations, including open-source data that comes in various formats and is difficult to use.
We propose LibCity, an open-source library that offers researchers a credible experimental tool and a convenient development framework.
arXiv Detail & Related papers (2023-04-27T17:19:26Z) - City-scale Incremental Neural Mapping with Three-layer Sampling and Panoptic Representation [5.682979644056021]
We build a city-scale continual neural mapping system with a panoptic representation that consists of environment-level and instance-level modelling.
Given a stream of sparse LiDAR point cloud, it maintains a dynamic generative model that maps 3D coordinates to signed distance field (SDF) values.
To realize high fidelity mapping of instance under incomplete observation, category-specific prior is introduced to better model the geometric details.
arXiv Detail & Related papers (2022-09-28T13:14:40Z) - Conditioned Human Trajectory Prediction using Iterative Attention Blocks [70.36888514074022]
We present a simple yet effective pedestrian trajectory prediction model aimed at predicting pedestrian positions in urban-like environments.
Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion.
We show that without explicitly introducing social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce results on par with SoTA models.
arXiv Detail & Related papers (2022-06-29T07:49:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.