Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction
- URL: http://arxiv.org/abs/2405.02821v2
- Date: Tue, 10 Sep 2024 23:43:53 GMT
- Title: Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction
- Authors: Changan Chen, Jordi Ramos, Anshul Tomar, Kristen Grauman
- Abstract summary: We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation.
We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that only take a specific frequency subband as input.
Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects.
- Score: 51.71299452862839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sim2real transfer has received increasing attention lately due to the success of learning robotic tasks in simulation end-to-end. While there has been a lot of progress in transferring vision-based navigation policies, the existing sim2real strategy for audio-visual navigation performs data augmentation empirically without measuring the acoustic gap. Sound differs from light in that it spans a much wider range of frequencies and thus requires a different sim2real solution. We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation. We first validate our design choice in the SoundSpaces simulator and show improvement on the Continuous AudioGoal navigation benchmark. We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that take only a specific frequency subband as input. We further propose a frequency-adaptive strategy that intelligently selects the best frequency band for prediction based on both the measured spectral difference and the energy distribution of the received audio, which improves the performance on the real data. Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects. This work demonstrates the potential of building intelligent agents that can see, hear, and act entirely from simulation, and of transferring them to the real world.
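The frequency-adaptive strategy is described only at a high level above; the following is a minimal, hypothetical Python sketch of that idea, scoring each subband by the energy actually received in it, discounted by the measured sim2real spectral gap. The subband edges, the exponential discount, and all names (`band_energies`, `select_band`, `sim2real_gap`) are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of frequency-adaptive band selection: pick the subband
# whose AFP prediction to trust, trading off received energy against the
# measured sim-vs-real spectral difference. All constants are assumptions.
import numpy as np
from scipy.signal import stft

SR = 16000  # sample rate in Hz (assumed)
BANDS = [(0, 1000), (1000, 4000), (4000, 8000)]  # illustrative subbands (Hz)

def band_energies(audio: np.ndarray) -> np.ndarray:
    """Fraction of spectral energy falling into each subband."""
    freqs, _, Z = stft(audio, fs=SR, nperseg=512)
    power = np.abs(Z) ** 2
    energies = np.array([
        power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS
    ])
    return energies / max(energies.sum(), 1e-12)

def select_band(audio: np.ndarray, sim2real_gap: np.ndarray) -> int:
    """Score each band by received energy discounted by the measured
    sim2real spectral gap; return the index of the best band."""
    scores = band_energies(audio) * np.exp(-sim2real_gap)
    return int(np.argmax(scores))

# Usage: in the paper, the per-band gap would come from comparing subband-only
# AFP models trained in SoundSpaces against real recordings; here it is made up.
mic_audio = np.random.randn(SR)   # stand-in for a 1 s microphone capture
gap = np.array([0.2, 0.5, 1.5])   # fabricated per-band gap measurements
best = select_band(mic_audio, gap)
print(f"Use the AFP model for the {BANDS[best]} Hz band")
```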
Related papers
- Dynamics as Prompts: In-Context Learning for Sim-to-Real System Identifications [23.94013806312391]
We propose a novel approach that dynamically adjusts simulation environment parameters online using in-context learning.
We validate our approach across two tasks: object scooping and table air hockey.
Our approach delivers efficient and smooth system identification, advancing the deployment of robots in dynamic real-world scenarios.
arXiv Detail & Related papers (2024-10-27T07:13:38Z)
- Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear [65.33183123368804]
Sonicverse is a multisensory simulation platform with integrated audio-visual simulation.
It enables embodied AI tasks that need audio-visual perception.
An agent trained in Sonicverse can successfully perform audio-visual navigation in real-world environments.
arXiv Detail & Related papers (2023-06-01T17:24:01Z)
- RARA: Zero-shot Sim2Real Visual Navigation with Following Foreground Cues [42.998649025215045]
We tackle the specific case of camera-based navigation, formulating it as following a visual cue in the foreground with arbitrary backgrounds.
The goal is to train a visual agent on data captured in an empty simulated environment except for this foreground cue and test this model directly in a visually diverse real world.
arXiv Detail & Related papers (2022-01-08T09:53:21Z)
- Auto-Tuned Sim-to-Real Transfer [143.44593793640814]
Policies trained in simulation often fail when transferred to the real world.
Current approaches to tackle this problem, such as domain randomization, require prior knowledge and engineering.
We propose a method for automatically tuning simulator system parameters to match the real world.
arXiv Detail & Related papers (2021-04-15T17:59:55Z)
- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)
- Learning to Set Waypoints for Audio-Visual Navigation [89.42192208471735]
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source.
Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations.
We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements.
arXiv Detail & Related papers (2020-08-21T18:00:33Z)
- Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) is a powerful tool for solving complex robotic tasks.
However, policies trained with RL in simulation often fail to work directly in the real world, a gap known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed from point clouds, combined with environment randomization.
arXiv Detail & Related papers (2020-07-27T17:46:59Z)