Finding Fallen Objects Via Asynchronous Audio-Visual Integration
- URL: http://arxiv.org/abs/2207.03483v1
- Date: Thu, 7 Jul 2022 17:59:59 GMT
- Title: Finding Fallen Objects Via Asynchronous Audio-Visual Integration
- Authors: Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James
Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh McDermott, Antonio Torralba
- Abstract summary: This paper introduces a setting in which to study multi-modal object localization in 3D virtual environments.
An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped -- and where -- by combining audio and visual signals with knowledge of the underlying physics.
The dataset uses the ThreeDWorld platform, which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting.
- Score: 89.75296559813437
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The ways an object looks and sounds provide complementary reflections of its
physical properties. In many settings, cues from vision and audition arrive
asynchronously but must be integrated, as when we hear an object dropped on the
floor and then must find it. In this paper, we introduce a setting in which to
study multi-modal object localization in 3D virtual environments. An object is
dropped somewhere in a room. An embodied robot agent, equipped with a camera
and microphone, must determine what object has been dropped -- and where -- by
combining audio and visual signals with knowledge of the underlying physics. To
study this problem, we have generated a large-scale dataset -- the Fallen
Objects dataset -- that includes 8000 instances of 30 physical object
categories in 64 rooms. The dataset uses the ThreeDWorld platform, which can
simulate physics-based impact sounds and complex physical interactions between
objects in a photorealistic setting. As a first step toward addressing this
challenge, we develop a set of embodied agent baselines, based on imitation
learning, reinforcement learning, and modular planning, and perform an in-depth
analysis of the challenge of this new task.
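To make the task setup concrete, below is a minimal sketch of the episode loop such an embodied agent might run: impact audio arrives once when the object drops, and only visual observations follow while the agent searches. The environment and model names used here (StubFallenObjectEnv, predict_from_audio, choose_action) are hypothetical placeholders for illustration, not the actual ThreeDWorld or Fallen Objects dataset API.

```python
# Hypothetical sketch of the asynchronous audio-visual localization loop.
# StubFallenObjectEnv and the predictor/policy functions are illustrative
# stand-ins; they are NOT the real ThreeDWorld / Fallen Objects API.
import numpy as np

class StubFallenObjectEnv:
    """Stand-in environment: audio is heard once at reset, then only vision is available."""
    def reset(self):
        return {
            "impact_audio": np.zeros(44100),   # 1 s of impact sound (placeholder)
            "sample_rate": 44100,
            "rgb": np.zeros((128, 128, 3), dtype=np.uint8),
            "depth": np.zeros((128, 128), dtype=np.float32),
        }

    def step(self, action: str):
        obs = {
            "rgb": np.zeros((128, 128, 3), dtype=np.uint8),
            "depth": np.zeros((128, 128), dtype=np.float32),
        }
        done = action == "declare_found"
        return obs, done

def predict_from_audio(waveform: np.ndarray, sample_rate: int) -> dict:
    """Placeholder for a learned model mapping impact audio to a category/distance prior."""
    return {"category": "unknown", "distance_estimate_m": 2.0}

def choose_action(rgb: np.ndarray, depth: np.ndarray, audio_prior: dict, step: int) -> str:
    """Placeholder policy: wander for a few steps, then declare the object found."""
    return "move_forward" if step < 10 else "declare_found"

def run_episode(env, max_steps: int = 200) -> None:
    obs = env.reset()                                   # episode starts just after the drop
    audio_prior = predict_from_audio(obs["impact_audio"], obs["sample_rate"])
    for t in range(max_steps):
        action = choose_action(obs["rgb"], obs["depth"], audio_prior, t)
        obs, done = env.step(action)                    # only visual signals arrive after the impact
        if done:
            break

run_episode(StubFallenObjectEnv())
```

In the actual benchmark, the audio model and policy in this sketch would be replaced by the imitation-learning, reinforcement-learning, or modular-planning baselines described in the abstract.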
Related papers
- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation [62.53760963292465]
PhysDreamer is a physics-based approach that endows static 3D objects with interactive dynamics.
We present our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study.
arXiv Detail & Related papers (2024-04-19T17:41:05Z) - AffordanceLLM: Grounding Affordance from Vision Language Models [36.97072698640563]
Affordance grounding refers to the task of finding the area of an object with which one can interact.
Much of this knowledge is hidden and lies beyond the image content and the supervised labels of a limited training set.
We attempt to improve the generalization capability of current affordance grounding by taking advantage of rich world, abstract, and human-object-interaction knowledge.
arXiv Detail & Related papers (2024-01-12T03:21:02Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - AKB-48: A Real-World Articulated Object Knowledge Base [38.4899076076656]
We present AKB-48: a large-scale Articulated object Knowledge Base which consists of 2,037 real-world 3D articulated object models of 48 categories.
To build the AKB-48, we present a fast articulation knowledge modeling (FArM) pipeline, which can fulfill the ArtiKG for an articulated object within 10-15 minutes.
Using our dataset, we propose AKBNet, a novel integral pipeline for Category-level Visual Articulation Manipulation (C-VAM) task.
arXiv Detail & Related papers (2022-02-17T03:24:07Z) - Virtual Elastic Objects [18.228492027143307]
We build virtual objects that behave like their real-world counterparts, even when subject to novel interactions.
We use a differentiable, particle-based simulator together with deformation fields to find representative material parameters.
We present our results using a dataset of 12 objects under a variety of force fields, which will be shared with the community.
arXiv Detail & Related papers (2022-01-12T18:59:03Z) - ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and
Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z) - Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention and several methods were proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z) - Learning Object Permanence from Video [46.34427538905761]
This paper introduces the setup of learning Object Permanence from data.
We explain why this learning problem should be dissected into four components, where objects are (1) visible, (2) occluded, (3) contained by another object, and (4) carried by a containing object.
We then present a unified deep architecture that learns to predict object location under these four scenarios.
arXiv Detail & Related papers (2020-03-23T18:03:01Z)