Learning Object-Based State Estimators for Household Robots
- URL: http://arxiv.org/abs/2011.03183v4
- Date: Sun, 31 Jul 2022 16:31:23 GMT
- Title: Learning Object-Based State Estimators for Household Robots
- Authors: Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling
- Abstract summary: We build object-based memory systems that operate on high-dimensional observations and hypotheses.
We demonstrate the system's effectiveness in maintaining memory of dynamically changing objects in both simulated environments and real images.
- Score: 11.055133590909097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A robot operating in a household makes observations of multiple objects as it
moves around over the course of days or weeks. The objects may be moved by
inhabitants, but not completely at random. The robot may be called upon later
to retrieve objects and will need a long-term object-based memory in order to
know how to find them. Existing work in semantic SLAM does not attempt to
capture the dynamics of object movement. In this paper, we combine some aspects
of classic techniques for data-association filtering with modern
attention-based neural networks to construct object-based memory systems that
operate on high-dimensional observations and hypotheses. We perform end-to-end
learning on labeled observation trajectories to learn both the transition and
observation models. We demonstrate the system's effectiveness in maintaining
memory of dynamically changing objects in both simulated environments and real
images, and demonstrate improvements over classical structured approaches as
well as unstructured neural approaches. Additional information available at
project website: https://yilundu.github.io/obm/.
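To make the abstract's central idea concrete, below is a minimal sketch, assuming a PyTorch implementation, of how attention can play the role of soft data association over a set of latent object hypotheses. The module names, dimensions, and the predict-associate-correct update rule are illustrative assumptions, not the paper's actual architecture; see the project website for the real model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectMemory(nn.Module):
    """Toy attention-based filter over a fixed-size set of object hypotheses."""

    def __init__(self, dim=64, max_objects=8):
        super().__init__()
        self.dim = dim
        self.max_objects = max_objects
        self.transition = nn.Linear(dim, dim)      # stand-in for a learned transition model
        self.obs_update = nn.Linear(2 * dim, dim)  # stand-in for a learned observation model

    def init_memory(self):
        # Start from empty hypotheses; learned initial embeddings are another option.
        return torch.zeros(self.max_objects, self.dim)

    def step(self, memory, obs):
        # memory: (K, D) latent object hypotheses; obs: (D,) one observation embedding.
        predicted = self.transition(memory)            # predict: propagate every hypothesis
        scores = predicted @ obs / self.dim ** 0.5     # (K,) association logits
        assoc = F.softmax(scores, dim=0)               # associate: attention as soft assignment
        paired = torch.cat([predicted, obs.expand_as(predicted)], dim=-1)
        correction = self.obs_update(paired)           # correct: per-hypothesis update
        return predicted + assoc.unsqueeze(-1) * correction

memory_model = ObjectMemory()
state = memory_model.init_memory()
for obs in torch.randn(10, 64):   # stand-in for a stream of observation embeddings
    state = memory_model.step(state, obs)

In the paper's setting, the transition and observation modules are trained end-to-end on labeled observation trajectories; the sketch above only shows the filtering step that maintains the memory.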
Related papers
- Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction [52.12746368727368]
Differentiable simulation has become a powerful tool for system identification.
Our approach calibrates object properties by using information from the robot, without relying on data from the object itself.
We demonstrate the effectiveness of our method on a low-cost robotic platform.
arXiv Detail & Related papers (2024-10-04T20:48:38Z)
- Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects with Video Tracking Enabled Memory Models [11.126673648719345]
We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning framework.
We propose LOOM, which leverages transformer dynamics to encode trajectory histories from partial-view point clouds.
Our approach can perform multiple tasks, including reasoning about the appearance of occluded novel objects and about object reappearance.
arXiv Detail & Related papers (2023-09-26T21:31:24Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Perceiving Unseen 3D Objects by Poking the Objects [45.70559270947074]
We propose a poking-based approach that automatically discovers and reconstructs 3D objects.
The poking process not only enables the robot to discover unseen 3D objects but also produces multi-view observations.
Experiments on real-world data show that our approach can discover and reconstruct unseen 3D objects with high quality without supervision.
arXiv Detail & Related papers (2023-02-26T18:22:13Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects despite the motion of their own viewpoint is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to model all of these effects jointly.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
- KINet: Unsupervised Forward Models for Robotic Pushing Manipulation [8.572983995175909]
We introduce KINet -- an unsupervised framework to reason about object interactions based on a keypoint representation.
Our model learns to associate objects with keypoint coordinates and discovers a graph representation of the system.
By learning to perform physical reasoning in the keypoint space, our model automatically generalizes to scenarios with different numbers of objects (a toy illustration of this idea appears after this list).
arXiv Detail & Related papers (2022-02-18T03:32:08Z)
- Improving Object Permanence using Agent Actions and Reasoning [8.847502932609737]
Existing approaches learn object permanence from low-level perception.
We argue that object permanence can be improved when the robot uses knowledge about executed actions.
arXiv Detail & Related papers (2021-10-01T07:09:49Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diverse set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward modeling the dynamics of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains [0.0]
We propose a deep learning architecture with augmented memory capacities to handle open-ended object recognition and grasping simultaneously.
We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site in both simulation and real-world settings.
arXiv Detail & Related papers (2021-06-03T14:12:11Z)
- Reactive Human-to-Robot Handovers of Arbitrary Objects [57.845894608577495]
We present a vision-based system that enables human-to-robot handovers of unknown objects.
Our approach combines closed-loop motion planning with real-time, temporally-consistent grasp generation.
We demonstrate the generalizability, usability, and robustness of our approach on a novel benchmark set of 26 diverse household objects.
arXiv Detail & Related papers (2020-11-17T21:52:22Z)
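As a toy illustration of the keypoint-space reasoning mentioned in the KINet entry above, the sketch below shows a graph-style forward model over object keypoints. All names, feature sizes, and the message-passing scheme are assumptions for illustration, not KINet's actual architecture.

import torch
import torch.nn as nn

class KeypointForwardModel(nn.Module):
    """Toy graph network that predicts keypoint motion under a pushing action."""

    def __init__(self, dim=32):
        super().__init__()
        # Edge network: maps pairwise keypoint features to messages.
        self.edge_net = nn.Sequential(nn.Linear(4, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Node network: maps aggregated messages + position + action to a displacement.
        self.node_net = nn.Sequential(nn.Linear(dim + 4, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, keypoints, action):
        # keypoints: (N, 2) 2D coordinates; action: (2,) push direction.
        n = keypoints.shape[0]
        rel = keypoints.unsqueeze(0) - keypoints.unsqueeze(1)              # (N, N, 2) offsets
        pairs = torch.cat([rel, keypoints.unsqueeze(0).expand(n, n, 2)], dim=-1)
        messages = self.edge_net(pairs).sum(dim=1)                         # (N, dim) per node
        inputs = torch.cat([messages, keypoints, action.expand(n, 2)], dim=-1)
        return keypoints + self.node_net(inputs)                           # next keypoints

model = KeypointForwardModel()
next_kp = model(torch.rand(5, 2), torch.tensor([0.1, 0.0]))  # works for any N

Because the edge and node networks are shared across all keypoints, the same weights apply to scenes with any number of objects, which is the generalization property that summary highlights.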
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.