ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation
- URL: http://arxiv.org/abs/2410.18932v1
- Date: Thu, 24 Oct 2024 17:19:53 GMT
- Title: ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation
- Authors: Vidhi Jain, Rishi Veerapaneni, Yonatan Bisk
- Abstract summary: We propose Audio Noise Awareness using Visuals of Indoors for NAVIgation for quieter robot path planning.
We generate data on how loud an 'impulse' sounds at different listener locations in simulated homes, and train our Acoustic Noise Predictor (ANP).
Unifying ANP with action acoustics, we demonstrate experiments in which wheeled (Hello Robot Stretch) and legged (Unitree Go2) robots adhere to the noise constraints of their environment.
- Score: 26.460679530665487
- Abstract: We propose Audio Noise Awareness using Visuals of Indoors for NAVIgation for quieter robot path planning. While humans are naturally aware of the noise they make and its impact on those around them, robots currently lack this awareness. A key challenge in achieving audio awareness for robots is estimating how loud the robot's actions will be at a listener's location. Since sound depends upon the geometry and material composition of rooms, we train the robot to passively perceive loudness using visual observations of indoor environments. To this end, we generate data on how loud an 'impulse' sounds at different listener locations in simulated homes, and train our Acoustic Noise Predictor (ANP). Next, we collect acoustic profiles corresponding to different actions for navigation. Unifying ANP with action acoustics, we demonstrate experiments in which wheeled (Hello Robot Stretch) and legged (Unitree Go2) robots adhere to the noise constraints of their environment. See code and data at https://anavi-corl24.github.io/
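The abstract describes a two-part recipe: a learned Acoustic Noise Predictor (ANP) that estimates how loud an impulse emitted at the robot's position sounds at a listener, and per-action acoustic profiles that are combined with the ANP output so the planner keeps the robot under an environment's noise limit. Below is a minimal, hypothetical Python sketch of that combination step. The names (anp_predict_impulse_loudness_db, ACTION_SOURCE_DB, action_cost), the inverse-distance placeholder standing in for the learned visual ANP, and the additive-dB bookkeeping and penalty values are all illustrative assumptions, not the paper's actual interface or numbers.

```python
import math

def anp_predict_impulse_loudness_db(robot_pos, listener_pos):
    """Placeholder for the learned ANP: loudness (dB) of a unit impulse
    emitted at robot_pos, as heard at listener_pos.

    The real ANP predicts this from visual observations of the room;
    here we stub it with a toy free-field inverse-distance falloff so
    the script runs standalone.
    """
    distance = max(0.5, math.dist(robot_pos, listener_pos))
    return 60.0 - 20.0 * math.log10(distance)

# Hypothetical per-action acoustic profiles: how loud each action is at the
# source (dB), measured once per robot (e.g. Stretch rolling vs. Go2 trotting).
ACTION_SOURCE_DB = {
    "roll_slow": 45.0,
    "roll_fast": 60.0,
    "trot": 70.0,
}

def predicted_noise_at_listener(action, robot_pos, listener_pos):
    """Combine an action's source loudness with the ANP impulse prediction
    to estimate the noise the listener hears (relative dB scale)."""
    impulse_db = anp_predict_impulse_loudness_db(robot_pos, listener_pos)
    # Treat the impulse prediction as the room's attenuation relative to a
    # nominal 60 dB impulse, offset by how loud the action actually is.
    return ACTION_SOURCE_DB[action] + (impulse_db - 60.0)

def action_cost(action, robot_pos, listener_pos, noise_limit_db=50.0, penalty=100.0):
    """Path-planning edge cost: base time cost plus a penalty whenever the
    predicted noise at the listener exceeds the environment's limit."""
    base_cost = {"roll_slow": 2.0, "roll_fast": 1.0, "trot": 1.0}[action]
    noise_db = predicted_noise_at_listener(action, robot_pos, listener_pos)
    return base_cost + (penalty if noise_db > noise_limit_db else 0.0)

if __name__ == "__main__":
    listener = (4.0, 1.0)
    for action in ACTION_SOURCE_DB:
        print(action, round(action_cost(action, (0.0, 0.0), listener), 2))
```

Running the sketch prints a per-action cost for a single listener; in a real planner this cost would be evaluated per edge of the navigation graph, so loud actions are penalized only where a listener would actually hear them above the limit.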
Related papers
- Reacting like Humans: Incorporating Intrinsic Human Behaviors into NAO through Sound-Based Reactions to Fearful and Shocking Events for Enhanced Sociability [0.0]
Humans show natural reactions when they encounter a sudden and loud sound that startles or frightens them.
In this work, a multi-modal system was designed to sense the environment and, in the presence of sudden loud sounds, show natural human fear reactions.
The generated motions and inferences imitate intrinsic human reactions and could enhance the sociability of robots.
arXiv Detail & Related papers (2023-12-12T19:06:44Z)
- The Un-Kidnappable Robot: Acoustic Localization of Sneaking People [25.494191141691616]
We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings.
We train models that predict if there is a moving person nearby and their location using only audio.
We implement our method on a robot, allowing it to track a single person moving quietly with only passive audio sensing.
arXiv Detail & Related papers (2023-10-05T17:59:55Z)
- HomeRobot: Open-Vocabulary Mobile Manipulation [107.05702777141178]
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
arXiv Detail & Related papers (2023-06-20T14:30:32Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear [65.33183123368804]
Sonicverse is a multisensory simulation platform with integrated audio-visual simulation.
It enables embodied AI tasks that need audio-visual perception.
An agent trained in Sonicverse can successfully perform audio-visual navigation in real-world environments.
arXiv Detail & Related papers (2023-06-01T17:24:01Z)
- Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts [1.0732907121422146]
We describe a process and results toward selecting robot voice styles for perceived social appropriateness and ambiance awareness.
Our results with N=120 participants provide evidence that the choice of voice style in different ambiances impacted a robot's perceived intelligence.
arXiv Detail & Related papers (2022-05-10T15:10:23Z)
- Active Audio-Visual Separation of Dynamic Sound Sources [93.97385339354318]
We propose a reinforcement learning agent equipped with a novel transformer memory that learns motion policies to control its camera and microphone.
We show that our model is able to learn efficient behavior to carry out continuous separation of a time-varying audio target.
arXiv Detail & Related papers (2022-02-02T02:03:28Z)
- Move2Hear: Active Audio-Visual Source Separation [90.16327303008224]
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest.
We introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time.
We demonstrate our model's ability to find minimal movement sequences with maximal payoff for audio source separation.
arXiv Detail & Related papers (2021-05-15T04:58:08Z)
- Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms that humans rely on when it comes to filtering speech in noisy environments.
Having a robot that can look toward a speaker could benefit ASR performance in challenging environments.
We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
arXiv Detail & Related papers (2020-11-12T18:02:15Z)
- Swoosh! Rattle! Thump! -- Actions that Sound [38.59779002672538]
This work is the first large-scale study of the interactions between sound and robotic action.
We create the largest available sound-action-vision dataset with 15,000 interactions on 60 objects using our robotic platform Tilt-Bot.
Sound is indicative of fine-grained object class information, e.g., sound can differentiate a metal screwdriver from a metal wrench.
arXiv Detail & Related papers (2020-07-03T17:57:54Z)