Sonicverse: A Multisensory Simulation Platform for Embodied Household
Agents that See and Hear
- URL: http://arxiv.org/abs/2306.00923v2
- Date: Sat, 16 Sep 2023 22:10:40 GMT
- Title: Sonicverse: A Multisensory Simulation Platform for Embodied Household
Agents that See and Hear
- Authors: Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia,
Silvio Savarese, Li Fei-Fei, Jiajun Wu
- Abstract summary: Sonicverse is a multisensory simulation platform with integrated audio-visual simulation.
It enables embodied AI tasks that need audio-visual perception.
An agent trained in Sonicverse can successfully perform audio-visual navigation in real-world environments.
- Score: 65.33183123368804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing embodied agents in simulation has been a key research topic in
recent years. Exciting new tasks, algorithms, and benchmarks have been
developed in various simulators. However, most of them assume deaf agents in
silent environments, while we humans perceive the world with multiple senses.
We introduce Sonicverse, a multisensory simulation platform with integrated
audio-visual simulation for training household agents that can both see and
hear. Sonicverse models realistic continuous audio rendering in 3D environments
in real-time. Together with a new audio-visual VR interface that allows humans
to interact with agents with audio, Sonicverse enables a series of embodied AI
tasks that need audio-visual perception. For semantic audio-visual navigation
in particular, we also propose a new multi-task learning model that achieves
state-of-the-art performance. In addition, we demonstrate Sonicverse's realism
via sim-to-real transfer, which has not been achieved by other simulators: an
agent trained in Sonicverse can successfully perform audio-visual navigation in
real-world environments. Sonicverse is available at:
https://github.com/StanfordVL/Sonicverse.
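The abstract mentions a multi-task learning model for semantic audio-visual navigation but gives no architectural details. As a rough, hypothetical illustration only (not the authors' implementation), the PyTorch sketch below pairs a navigation action head with an auxiliary sound-source localization head on top of shared visual and binaural-audio encoders; all module names, feature sizes, and the fusion scheme are assumptions.
```python
# Illustrative sketch only: a multi-task audio-visual navigation policy with a
# main action head and an auxiliary sound-source localization head. Module
# names, shapes, and fusion are assumptions, not the Sonicverse implementation.
import torch
import torch.nn as nn


class AudioVisualNavPolicy(nn.Module):
    def __init__(self, num_actions: int = 4, feat_dim: int = 512):
        super().__init__()
        # Visual encoder: RGB-D frames -> feature vector.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Audio encoder: two-channel (binaural) spectrograms -> feature vector.
        self.audio_encoder = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        fused = feat_dim * 2
        # Main task: action logits for the navigation policy.
        self.action_head = nn.Linear(fused, num_actions)
        # Auxiliary task: predict the sound source offset (x, y), giving the
        # shared encoders an extra training signal during multi-task learning.
        self.location_head = nn.Linear(fused, 2)

    def forward(self, rgbd, spectrogram):
        feats = torch.cat(
            [self.visual_encoder(rgbd), self.audio_encoder(spectrogram)], dim=-1
        )
        return self.action_head(feats), self.location_head(feats)


# Example forward pass with dummy observations.
policy = AudioVisualNavPolicy()
rgbd = torch.randn(1, 4, 128, 128)   # RGB-D frame (assumed resolution)
spec = torch.randn(1, 2, 65, 26)     # binaural spectrogram (assumed shape)
action_logits, predicted_source_xy = policy(rgbd, spec)
```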
Related papers
- Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction [51.71299452862839]
We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation.
We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that only take a specific frequency subband as input.
Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects.
arXiv Detail & Related papers (2024-05-05T06:01:31Z)
- Virtual Reality in Metaverse over Wireless Networks with User-centered Deep Reinforcement Learning [8.513938423514636]
We introduce a multi-user VR computation offloading scenario over wireless communication.
In addition, we devise a novel user-centered deep reinforcement learning approach to find a near-optimal solution.
arXiv Detail & Related papers (2023-03-08T03:10:41Z)
- SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning [127.1119359047849]
We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments.
It generates highly realistic acoustics for arbitrary sounds captured from arbitrary microphone locations.
SoundSpaces 2.0 is publicly available to facilitate wider research for perceptual systems that can both see and hear.
arXiv Detail & Related papers (2022-06-16T17:17:44Z)
- Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems [6.952659395337689]
We introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations.
We train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.
arXiv Detail & Related papers (2021-07-05T18:00:50Z)
- DriveGAN: Towards a Controllable High-Quality Neural Simulation [147.6822288981004]
We introduce a novel high-quality neural simulator referred to as DriveGAN.
DriveGAN achieves controllability by disentangling different components without supervision.
We train DriveGAN on multiple datasets, including 160 hours of real-world driving data.
arXiv Detail & Related papers (2021-04-30T15:30:05Z)
- Learning to Set Waypoints for Audio-Visual Navigation [89.42192208471735]
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source.
Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations.
We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements.
arXiv Detail & Related papers (2020-08-21T18:00:33Z)
- Learning to Simulate Dynamic Environments with GameGAN [109.25308647431952]
In this paper, we aim to learn a simulator by simply watching an agent interact with an environment.
We introduce GameGAN, a generative model that learns to visually imitate a desired game by ingesting screenplay and keyboard actions during training.
arXiv Detail & Related papers (2020-05-25T14:10:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.