Towards foundational LiDAR world models with efficient latent flow matching
- URL: http://arxiv.org/abs/2506.23434v1
- Date: Mon, 30 Jun 2025 00:16:55 GMT
- Title: Towards foundational LiDAR world models with efficient latent flow matching
- Authors: Tianran Liu, Shengwen Zhao, Nicholas Rhinehart
- Abstract summary: Existing LiDAR world models are narrowly trained; each model excels only in the domain for which it was built. We conduct the first systematic domain transfer study across three demanding scenarios. Given different amounts of fine-tuning data, our experiments show that a single pre-trained model can achieve up to 11% absolute improvement.
- Score: 9.86884512471034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-based world models offer more structured and geometry-aware representations than their image-based counterparts. However, existing LiDAR world models are narrowly trained; each model excels only in the domain for which it was built. Can we develop LiDAR world models that exhibit strong transferability across multiple domains? We conduct the first systematic domain transfer study across three demanding scenarios: (i) outdoor-to-indoor generalization, (ii) sparse-beam & dense-beam adaptation, and (iii) non-semantic-to-semantic transfer. Given different amounts of fine-tuning data, our experiments show that a single pre-trained model can achieve up to an 11% absolute improvement (83% relative) over training from scratch, and outperforms training from scratch in 30 of 36 comparisons. This transferability of learned dynamics significantly reduces the reliance on manually annotated data for semantic occupancy forecasting: our method exceeds previous semantic occupancy forecasting models while using only 5% of the labeled training data they require. We also observe inefficiencies in current LiDAR world models, mainly their under-compression of LiDAR data and inefficient training objectives. To address this, we propose a latent conditional flow matching (CFM) based framework that achieves state-of-the-art reconstruction accuracy using only half the training data and a compression ratio 6 times higher than that of prior methods. Our model achieves SOTA performance on future-trajectory-conditioned semantic occupancy forecasting while being 23x more computationally efficient (a 28x FPS speedup), and achieves SOTA performance on semantic occupancy forecasting while being 2x more computationally efficient (a 1.1x FPS speedup).
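The abstract describes the latent CFM framework only at a high level. As a rough, non-authoritative illustration of the conditional flow matching objective it builds on, here is a minimal latent-space training step in PyTorch; the `VelocityNet` module, latent and conditioning dimensions, and the stand-in encoder latents are hypothetical placeholders, not the paper's actual architecture.

```python
# Minimal conditional flow matching (CFM) training step in latent space.
# All module names and shapes are illustrative assumptions; the paper's
# actual encoder, decoder, and conditioning are not specified here.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity field v_theta(z_t, t, cond); a stand-in for the real model."""
    def __init__(self, latent_dim: int, cond_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t, t, cond):
        return self.net(torch.cat([z_t, cond, t], dim=-1))

def cfm_loss(velocity_net, z1, cond):
    """One CFM step: regress the velocity of a straight path from noise to data."""
    z0 = torch.randn_like(z1)          # noise sample
    t = torch.rand(z1.shape[0], 1)     # time in [0, 1]
    z_t = (1 - t) * z0 + t * z1        # linear interpolant between noise and data
    v_target = z1 - z0                 # constant velocity of the straight path
    v_pred = velocity_net(z_t, t, cond)
    return ((v_pred - v_target) ** 2).mean()

# Usage with random stand-in latents and conditioning (e.g., a future trajectory):
model = VelocityNet(latent_dim=64, cond_dim=16)
z1 = torch.randn(8, 64)    # latents from a LiDAR encoder (assumed)
cond = torch.randn(8, 16)  # trajectory conditioning (assumed)
loss = cfm_loss(model, z1, cond)
loss.backward()
```

The key property is the regression target: for a straight-line path from noise `z0` to data `z1`, the velocity is the constant `z1 - z0`, so the objective reduces to a simple mean-squared error with no flow simulation at training time.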
Related papers
- Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions. We propose AirRep, a representation-based approach that learns task-specific and model-aligned representations explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
arXiv Detail & Related papers (2025-05-24T05:17:53Z)
- Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO), but curating effective pre-training data remains a challenge. In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery. We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
- Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [51.41246396610475]
This paper aims to predict performance in closed-book question answering (QA) without the help of external tools. We conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models. Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics with QA performance.
arXiv Detail & Related papers (2025-02-06T13:23:53Z)
- An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training [50.71892161377806]
DFIT-OccWorld is an efficient 3D occupancy world model that leverages decoupled dynamic flow and an image-assisted training strategy. The model forecasts future dynamic voxels by warping existing observations using voxel flow, whereas static voxels are obtained directly through pose transformation. A minimal sketch of this flow-based warping follows this entry.
arXiv Detail & Related papers (2024-12-18T12:10:33Z)
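As a rough illustration of warping an occupancy grid with a per-voxel flow field, here is a hedged sketch using trilinear resampling. The backward-flow convention, tensor shapes, and voxel-unit displacements are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: forecast dynamic voxels by backward-warping a past
# occupancy grid with a per-voxel flow field.
import torch
import torch.nn.functional as F

def warp_occupancy(occ, flow):
    """occ:  (B, 1, D, H, W) occupancy probabilities at time t
    flow: (B, 3, D, H, W) displacement in voxels (x/y/z order) mapping
          each voxel at time t+1 back to its source location at time t"""
    B, _, D, H, W = occ.shape
    # Base voxel coordinates for every output location.
    zz, yy, xx = torch.meshgrid(
        torch.arange(D), torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xx, yy, zz], dim=0).float()   # (3, D, H, W)
    coords = base.unsqueeze(0) + flow                 # (B, 3, D, H, W)
    # Normalize to [-1, 1] for grid_sample (x -> W, y -> H, z -> D).
    sizes = torch.tensor([W - 1, H - 1, D - 1],
                         dtype=torch.float32).view(1, 3, 1, 1, 1)
    grid = (2.0 * coords / sizes - 1.0).permute(0, 2, 3, 4, 1)  # (B, D, H, W, 3)
    return F.grid_sample(occ, grid, mode="bilinear", align_corners=True)

occ = torch.rand(1, 1, 16, 32, 32)    # toy occupancy grid
flow = torch.zeros(1, 3, 16, 32, 32)  # zero flow => identity warp
assert torch.allclose(warp_occupancy(occ, flow), occ, atol=1e-5)
```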
- Self-Supervised Radio Pre-training: Toward Foundational Models for Spectrogram Learning [6.1339395157466425]
Foundational deep learning (DL) models are general models, trained on diverse and unlabelled datasets.
We introduce Masked Spectrogram Modeling, a novel self-supervised learning approach for pretraining foundational DL models on radio signals.
arXiv Detail & Related papers (2024-11-14T23:56:57Z)
- DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model [14.996395953240699]
DOME is a diffusion-based world model that predicts future occupancy frames based on past occupancy observations.
The ability of this world model to capture the evolution of the environment is crucial for planning in autonomous driving.
arXiv Detail & Related papers (2024-10-14T12:24:32Z)
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress. LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset. Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy. A minimal sketch of this loss-aware block sampling follows this entry.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
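As a rough illustration of tracking per-block difficulty and revisiting hard blocks, here is a hedged sketch; the EMA update, temperature, and sampling rule are illustrative assumptions, not the paper's actual schedule.

```python
# Hedged sketch of the "learn, focus, review" idea: keep a running loss
# estimate per data block and sample high-loss blocks more often.
import random

class BlockSampler:
    def __init__(self, num_blocks, momentum=0.9, temperature=1.0):
        self.loss_ema = [1.0] * num_blocks  # optimistic init: visit everything once
        self.momentum = momentum
        self.temperature = temperature

    def update(self, block_id, loss):
        """Update the running loss estimate for a block after training on it."""
        ema = self.loss_ema[block_id]
        self.loss_ema[block_id] = self.momentum * ema + (1 - self.momentum) * loss

    def next_block(self):
        """Sample a block with probability increasing in its recent loss."""
        weights = [l ** self.temperature for l in self.loss_ema]
        return random.choices(range(len(self.loss_ema)), weights=weights, k=1)[0]

sampler = BlockSampler(num_blocks=8)
for step in range(100):
    b = sampler.next_block()
    loss = 0.5 if b == 3 else 0.1  # pretend block 3 stays hard
    sampler.update(b, loss)
# Block 3 now dominates the sampling weights, so it gets reviewed more often.
```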
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Foundation Models for Generalist Geospatial Artificial Intelligence [3.7002058945990415]
This paper introduces a first-of-its-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive data.
We have utilized this framework to create Prithvi, a transformer-based foundational model pre-trained on more than 1TB of multispectral satellite imagery.
arXiv Detail & Related papers (2023-10-28T10:19:55Z)
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z)
- LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models [1.1965844936801797]
Generative modeling of 3D LiDAR data is an emerging task with promising applications for autonomous mobile robots.
We present R2DM, a novel generative model for LiDAR data that can generate diverse and high-fidelity 3D scene point clouds.
Our method is built upon denoising diffusion probabilistic models (DDPMs), which have shown impressive results among generative model frameworks. A minimal sketch of the DDPM training objective follows this entry.
arXiv Detail & Related papers (2023-09-17T12:26:57Z)
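As a rough illustration of the standard DDPM training objective that R2DM builds on, here is a minimal epsilon-prediction sketch; the tiny MLP denoiser and flat feature vectors are stand-ins for the paper's actual range-image model.

```python
# Hedged sketch of DDPM training: noise clean data at a random timestep
# and train a network to predict the added noise.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    """Toy MLP denoiser; a stand-in for the real network."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(),
                                 nn.Linear(256, dim))
    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t.float().unsqueeze(-1) / T], dim=-1))

def ddpm_loss(model, x0):
    t = torch.randint(0, T, (x0.shape[0],))      # random timestep per sample
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].unsqueeze(-1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise # forward diffusion q(x_t | x_0)
    return ((model(x_t, t) - noise) ** 2).mean() # epsilon-prediction objective

model = Denoiser(dim=64)
x0 = torch.randn(8, 64)  # stand-in for range-image / point features
loss = ddpm_loss(model, x0)
loss.backward()
```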