SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum
- URL: http://arxiv.org/abs/2412.16346v2
- Date: Fri, 21 Mar 2025 17:22:28 GMT
- Title: SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum
- Authors: JunEn Low, Maximilian Adang, Javier Yu, Keiko Nagami, Mac Schwager
- Abstract summary: SOUS VIDE is a simulator, training approach, and policy architecture for end-to-end visual drone navigation. Our policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/.
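For concreteness, here is a minimal PyTorch sketch of an SV-Net-style policy in the spirit of the abstract: a shared encoder over the color image and optical flow, a GRU that digests recent IMU history into a dynamics latent (standing in for the learned runtime-adaptation module), and a head emitting collective thrust plus body rates. All layer sizes and module choices are illustrative assumptions, not the authors' released architecture.

```python
# Hypothetical SV-Net-style visuomotor policy; sizes are illustrative.
import torch
import torch.nn as nn

class SVNetSketch(nn.Module):
    def __init__(self, imu_dim=6, latent_dim=8):
        super().__init__()
        # Shared encoder for the color image and optical flow stacked
        # along the channel axis (3 RGB + 2 flow = 5 channels).
        self.visual = nn.Sequential(
            nn.Conv2d(5, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Assumed adaptation module: a GRU summarizes a short IMU history
        # into a dynamics latent so low-level commands can adapt at
        # runtime to mass/thrust variations.
        self.adapt = nn.GRU(imu_dim, latent_dim, batch_first=True)
        # Command head: collective thrust + 3 body rates.
        self.head = nn.Sequential(
            nn.Linear(32 + latent_dim, 64), nn.ReLU(), nn.Linear(64, 4),
        )

    def forward(self, rgb, flow, imu_hist):
        # rgb: (B,3,H,W), flow: (B,2,H,W), imu_hist: (B,T,imu_dim)
        z_vis = self.visual(torch.cat([rgb, flow], dim=1))
        _, h = self.adapt(imu_hist)          # final hidden state as latent
        return self.head(torch.cat([z_vis, h[-1]], dim=1))

policy = SVNetSketch()
cmd = policy(torch.rand(1, 3, 96, 96), torch.rand(1, 2, 96, 96),
             torch.rand(1, 20, 6))
print(cmd.shape)  # torch.Size([1, 4]): thrust + body rates at 20 Hz
```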
Related papers
- SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control [85.91101551600978]
We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. We build a foundation model for motion tracking by scaling along three axes: network size, dataset volume, and compute. We show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces.
arXiv Detail & Related papers (2025-11-11T04:37:40Z) - GaussGym: An open-source real-to-sim framework for learning locomotion from pixels [78.05453137978132]
We present a novel approach for photorealistic robot simulation that integrates 3D Gaussian Splatting as a drop-in renderer within vectorized physics simulators. This enables unprecedented speed, exceeding 100,000 steps per second on a consumer GPU. We additionally demonstrate its applicability in a sim-to-real robotics setting.
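A toy sketch of the pattern described in this summary, with placeholder physics and rendering functions rather than the GaussGym API: each simulation step advances all environments in a single vectorized update, then a batched Gaussian-Splat render produces one observation per environment.

```python
# Vectorized sim loop with a batched renderer; all functions are stand-ins.
import numpy as np

N_ENVS = 4096                        # environments stepped in parallel
state = np.zeros((N_ENVS, 13))       # pos(3) + quat(4) + lin vel(3) + ang vel(3)

def physics_step(state, actions, dt=1.0 / 200.0):
    """Placeholder vectorized dynamics: integrate linear velocity only."""
    nxt = state.copy()
    nxt[:, 0:3] += nxt[:, 7:10] * dt
    return nxt

def splat_render(camera_poses):
    """Stand-in for a batched 3DGS rasterizer that would return one
    photorealistic image per environment from the shared scene."""
    return np.zeros((camera_poses.shape[0], 64, 64, 3), dtype=np.uint8)

actions = np.zeros((N_ENVS, 12))
for _ in range(10):
    state = physics_step(state, actions)
    obs = splat_render(state[:, 0:7])   # camera pose follows each robot
```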
arXiv Detail & Related papers (2025-10-17T06:34:52Z) - NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
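The object-centric formulation can be illustrated in a few lines: express the drone's state in the target's body frame and compute control errors there, so no global map or absolute localization is needed. The planar-yaw rotation below is an illustrative simplification, not NOVA's actual estimator.

```python
# Target-frame state expression; planar yaw is a simplifying assumption.
import numpy as np

def to_target_frame(p_drone, p_target, yaw_target):
    """Drone position expressed in the target's body frame."""
    c, s = np.cos(yaw_target), np.sin(yaw_target)
    R_wt = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return R_wt.T @ (p_drone - p_target)   # world -> target frame

rel = to_target_frame(np.array([2.0, 1.0, 3.0]),
                      np.array([0.0, 0.0, 2.0]), yaw_target=np.pi / 4)
# Control errors (e.g., a desired standoff distance) are then computed
# on `rel`, with no dependence on a global map.
print(rel)
```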
arXiv Detail & Related papers (2025-06-23T14:28:30Z) - FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research [2.823704956886882]
FalconWing is an open-source, ultra-lightweight (150 g) fixed-wing platform for autonomy research. We develop and deploy a vision-based control policy for autonomous landing using a novel real-to-sim-to-real learning approach. When deployed zero-shot on the hardware platform, this policy achieves an 80% success rate in vision-based autonomous landings.
arXiv Detail & Related papers (2025-05-02T16:47:05Z) - Self-Supervised Monocular Visual Drone Model Identification through Improved Occlusion Handling [17.368574409020475]
Ego-motion estimation is vital for drones when flying in GPS-denied environments.
We propose a self-supervised learning scheme to train a neural-network-based drone model using only onboard monocular video and flight controller data.
We demonstrate the value of the neural drone model by integrating it into a traditional filter-based VIO system.
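A hedged sketch of that integration: a learned model predicts accelerations from velocity and motor commands and replaces the analytic dynamics in the filter's prediction step. The model below is a hand-written stand-in for the trained network, and covariance propagation is omitted.

```python
# Learned dynamics as the process model in a filter's predict step.
import numpy as np

def learned_drone_model(vel, motor_cmds):
    """Stand-in for the trained network: linear drag plus collective
    thrust along z (coefficients are invented for this sketch)."""
    return -0.3 * vel + np.array([0.0, 0.0, 2.5 * motor_cmds.sum() - 9.81])

def predict(pos, vel, motor_cmds, dt=0.01):
    # The network replaces hand-derived dynamics; covariance update omitted.
    acc = learned_drone_model(vel, motor_cmds)
    return pos + vel * dt, vel + acc * dt

pos, vel = np.zeros(3), np.array([1.0, 0.0, 0.0])
pos, vel = predict(pos, vel, motor_cmds=np.full(4, 0.981))  # near hover
```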
arXiv Detail & Related papers (2025-04-30T14:38:01Z) - YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion [9.810747004677474]
This paper proposes a novel end-to-end framework that accurately identifies small drones in complex environments.
It starts by creating a motion difference map to capture the motion characteristics of tiny drones.
Next, this motion difference map is combined with an RGB image using a bimodal fusion module, allowing for adaptive feature learning of the drone.
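A simple illustration of the motion cue (the actual YOLOMG fusion is a learned bimodal module; plain channel stacking here is a simplification):

```python
# Motion difference map from consecutive frames, stacked with RGB.
import numpy as np

def motion_difference_map(prev_gray, curr_gray, thresh=15):
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return (diff > thresh).astype(np.uint8) * 255   # highlight moving pixels

prev_g = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
curr_g = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
rgb = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

motion = motion_difference_map(prev_g, curr_g)
fused = np.dstack([rgb, motion])     # 4-channel input for the detector
```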
arXiv Detail & Related papers (2025-03-10T09:44:21Z) - A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations. We propose a unified cross-scene, cross-domain benchmark for open-world drone active tracking called DAT. We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z) - ScatterNeRF: Seeing Through Fog with Physically-Based Inverse Neural Rendering [83.75284107397003]
We introduce ScatterNeRF, a neural rendering method which renders scenes and decomposes the fog-free background.
We propose a disentangled representation for the scattering volume and the scene objects, and learn the scene reconstruction with physics-inspired losses.
We validate our method by capturing multi-view In-the-Wild data and controlled captures in a large-scale fog chamber.
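The disentanglement can be pictured as joint volume rendering of two density fields along each ray, where dropping the fog density recovers the clear view. Densities and colors below are toy constants, not learned fields.

```python
# Joint compositing of a fog field and a scene field along one ray.
import numpy as np

def render_ray(sig_s, sig_f, c_scene, c_fog, dt=0.1, n=64):
    color, trans = np.zeros(3), 1.0
    for _ in range(n):
        sig = sig_s + sig_f
        if sig == 0.0:
            continue
        alpha = 1.0 - np.exp(-sig * dt)            # total absorption
        c_mix = (sig_s * c_scene + sig_f * c_fog) / sig  # density-weighted
        color += trans * alpha * c_mix
        trans *= 1.0 - alpha
    return color

c_obj, c_haze = np.array([0.8, 0.2, 0.1]), np.ones(3) * 0.7
foggy = render_ray(0.4, 0.6, c_obj, c_haze)   # observed scene
clear = render_ray(0.4, 0.0, c_obj, c_haze)   # fog term dropped
```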
arXiv Detail & Related papers (2023-05-03T13:24:06Z) - TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos [57.92385818430939]
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z) - Learning a Single Near-hover Position Controller for Vastly Different Quadcopters [56.37274861303324]
This paper proposes an adaptive near-hover position controller for quadcopters.
It can be deployed to quadcopters of very different mass, size and motor constants.
It also shows rapid adaptation to unknown disturbances during runtime.
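A toy illustration of the adaptation idea (the gains, the effectiveness-estimate update, and the simulated plant are all invented for this sketch): a hover controller whose thrust-effectiveness estimate is corrected online from the vertical acceleration error, so one controller can cover airframes of different mass and motor constants.

```python
# Online adaptation of a thrust-effectiveness estimate during hover.
import numpy as np

G = 9.81
k_eff = 1.0                 # estimated thrust effectiveness (adapted online)
true_k = 0.7                # plant is 30% weaker than the nominal model
z, vz = 0.0, 0.0            # altitude state of a simulated quadcopter
dt = 0.005

for _ in range(2000):
    u = (G + 4.0 * (1.0 - z) - 2.0 * vz) / k_eff    # PD + gravity feedforward
    az = true_k * u - G                              # actual plant response
    # Nudge k_eff toward the value that explains the measured acceleration.
    k_eff += 0.02 * (az - (k_eff * u - G)) * u * dt
    vz += az * dt
    z += vz * dt

print(round(z, 2), round(k_eff, 2))   # altitude -> 1.0, k_eff -> ~0.7
```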
arXiv Detail & Related papers (2022-09-19T17:55:05Z) - MOBDrone: a Drone Video Dataset for Man OverBoard Rescue [4.393945242867356]
We release the MOBDrone benchmark, a collection of more than 125K drone-view images in a marine environment under several conditions.
We manually annotated more than 180K objects, about 113K of which are instances of man overboard, precisely localizing them with bounding boxes.
We conduct a thorough performance analysis of several state-of-the-art object detectors on the MOBDrone data, serving as baselines for further research.
arXiv Detail & Related papers (2022-03-15T15:02:23Z) - EVPropNet: Detecting Drones By Finding Propellers For Mid-Air Landing And Following [11.79762223888294]
Drone propellers are the fastest-moving parts in an image and cannot be directly "seen" by a classical camera without severe motion blur.
We train a deep neural network called EVPropNet to detect propellers from the data of an event camera.
We present two applications of our network: (a) tracking and following an unmarked drone and (b) landing on a near-hover drone.
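One common way to feed such a network, sketched below with arbitrary sensor dimensions: accumulate events over a short window into a per-pixel count image, in which fast-spinning propellers stand out sharply. This is a generic event-frame construction, not necessarily EVPropNet's input representation.

```python
# Accumulate an event stream into a count image for a CNN input.
import numpy as np

def events_to_frame(events, h=260, w=346):
    frame = np.zeros((h, w), dtype=np.float32)
    for x, y, _t, _p in events:       # polarity ignored in this count image
        frame[y, x] += 1.0
    return frame / max(frame.max(), 1.0)

# Toy events: a spinning propeller produces dense event clusters.
rng = np.random.default_rng(0)
evts = [(rng.integers(0, 346), rng.integers(0, 260), i * 1e-6,
         rng.choice([-1, 1])) for i in range(5000)]
frame = events_to_frame(evts)
```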
arXiv Detail & Related papers (2021-06-29T01:16:01Z) - Out of the Box: Embodied Navigation in the Real World [45.97756658635314]
We show how to transfer knowledge acquired in simulation into the real world.
We deploy our models on a LoCoBot equipped with a single Intel RealSense camera.
Our experiments indicate that it is possible to achieve satisfactory results when deploying the obtained model in the real world.
arXiv Detail & Related papers (2021-05-12T18:00:14Z) - DriveGAN: Towards a Controllable High-Quality Neural Simulation [147.6822288981004]
We introduce a novel high-quality neural simulator referred to as DriveGAN.
DriveGAN achieves controllability by disentangling different components without supervision.
We train DriveGAN on multiple datasets, including 160 hours of real-world driving data.
arXiv Detail & Related papers (2021-04-30T15:30:05Z) - A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle these challenges and automatically understand crowds from visual data collected by drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem in extremely dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTA V).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.