Related papers: Tracking and Planning with Spatial World Models

Tracking and Planning with Spatial World Models

URL: http://arxiv.org/abs/2201.10335v1
Date: Tue, 25 Jan 2022 14:16:46 GMT
Title: Tracking and Planning with Spatial World Models
Authors: Baris Kayalibay, Atanas Mirchev, Patrick van der Smagt, Justin Bayer
Abstract summary: We introduce a method for real-time navigation and tracking with differentiably rendered world models. We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations.
Score: 17.698319441265223
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a method for real-time navigation and tracking with differentiably rendered world models. Learning models for control has led to impressive results in robotics and computer games, but this success has yet to be extended to vision-based navigation. To address this, we transfer advances in the emergent field of differentiable rendering to model-based control. We do this by planning in a learned 3D spatial world model, combined with a pose estimation algorithm previously used in the context of TSDF fusion, but now tailored to our setting and improved to incorporate agent dynamics. We evaluate over six simulated environments based on complex human-designed floor plans and provide quantitative results. We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations under stochastic, continuous dynamics.

Related papers

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy [44.85881816317044]
We show how to permuting 360-degree panoramic imagery into rich interconnected observation, state and action nodes.<n>We benchmark this dataset via TARDIS, a transformer-based generative world model.<n>We demonstrate robust performance across a range of agentic tasks such as controllable image synthesis, instruction following, autonomous self-control, and state-of-the-art georeferencing.
arXiv Detail & Related papers (2025-06-12T21:08:11Z)
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control [50.67481583744243]
We introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models.<n>We propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles.<n>Our method significantly outperforms existing models in both action accuracy and 3D spatial awareness.
arXiv Detail & Related papers (2025-05-28T14:46:51Z)
An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training [50.71892161377806]
DFIT-OccWorld is an efficient 3D occupancy world model that leverages decoupled dynamic flow and image-assisted training strategy. Our model forecasts future dynamic voxels by warping existing observations using voxel flow, whereas static voxels are easily obtained through pose transformation.
arXiv Detail & Related papers (2024-12-18T12:10:33Z)
3D Multi-Object Tracking with Semi-Supervised GRU-Kalman Filter [6.13623925528906]
3D Multi-Object Tracking (MOT) is essential for intelligent systems like autonomous driving and robotic sensing. We propose a GRU-based MOT method, which introduces a learnable Kalman filter into the motion module. This approach is able to learn object motion characteristics through data-driven learning, thereby avoiding the need for manual model design and model error.
arXiv Detail & Related papers (2024-11-13T08:34:07Z)
Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks [93.38375271826202]
We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks. We first build a simulator by integrating Gaussian splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks. In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, programming of expert demonstration training data, and the task understanding capabilities of Liquid networks.
arXiv Detail & Related papers (2024-06-21T13:48:37Z)
3D Hand Mesh Recovery from Monocular RGB in Camera Space [3.0453197258042213]
This study proposes a network model that performs parallel processing of root-relative grids and root recovery tasks. We utilize an implicit learning approach for 2D heatmaps, enhancing the compatibility of 2D cues across different subtasks. Our proposed model is comparable with state-of-the-art models.
arXiv Detail & Related papers (2024-05-12T05:36:37Z)
DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos. Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion. Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting. We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting. Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
Data-Driven Machine Learning Models for a Multi-Objective Flapping Fin Unmanned Underwater Vehicle Control System [0.5522489572615558]
We develop a search-based inverse model that leverages a kinematics-to-thrust neural network model for control system design. We demonstrate how a control system integrating this inverse model can make online, cycle-to-cycle adjustments to prioritize different system objectives.
arXiv Detail & Related papers (2022-09-14T01:55:15Z)
Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation. We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning quadrotor's system dynamics purely from robot experience. Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data. We show that a neural network can model highly nonlinear behaviors accurately for large time horizons. In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot and Radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching. Our approach learns entirely using offline, unlabeled data. We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoF [17.698319441265223]
We solve the problem of 6-DoF localisation and 3D dense reconstruction in spatial environments as approximate Bayesian inference in a deep state-space model. This results in an expressive predictive model of the world, often missing in current state-of-the-art visual SLAM solutions. We evaluate our approach on realistic unmanned aerial vehicle flight data, nearing the performance of state-of-the-art visual-inertial odometry systems.
arXiv Detail & Related papers (2020-06-17T22:06:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.