Knolling Bot: Learning Robotic Object Arrangement from Tidy Demonstrations
- URL: http://arxiv.org/abs/2310.04566v2
- Date: Fri, 15 Mar 2024 18:37:34 GMT
- Title: Knolling Bot: Learning Robotic Object Arrangement from Tidy Demonstrations
- Authors: Yuhang Hu, Zhizhuo Zhang, Xinyue Zhu, Ruibo Liu, Philippe Wyder, Hod Lipson
- Abstract summary: This paper introduces a self-supervised learning framework that allows robots to understand and replicate the concept of tidiness.
We leverage a transformer neural network to predict the placement of subsequent objects.
Our method not only trains a generalizable concept of tidiness, but it can also incorporate human preferences to generate customized tidy tables.
- Score: 11.873522421121173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Addressing the challenge of organizing scattered items in domestic spaces is complicated by the diversity and subjective nature of tidiness. Just as the complexity of human language allows for multiple expressions of the same idea, household tidiness preferences and organizational patterns vary widely, so presetting object locations would limit the adaptability to new objects and environments. Inspired by advancements in natural language processing (NLP), this paper introduces a self-supervised learning framework that allows robots to understand and replicate the concept of tidiness from demonstrations of well-organized layouts, akin to using conversational datasets to train Large Language Models (LLMs). We leverage a transformer neural network to predict the placement of subsequent objects. We demonstrate a ``knolling'' system with a robotic arm and an RGB camera to organize items of varying sizes and quantities on a table. Our method not only trains a generalizable concept of tidiness, enabling the model to provide diverse solutions and adapt to different numbers of objects, but it can also incorporate human preferences to generate customized tidy tables without explicit target positions for each object.
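As a rough illustration of the approach described in the abstract, the sketch below shows an autoregressive transformer that predicts where the next object should be placed, given object sizes and the positions already assigned. This is a minimal sketch assuming a PyTorch implementation; the feature layout, network dimensions, and training loss are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class PlacementTransformer(nn.Module):
    """Toy model: predict the (x, y) placement of the next object."""

    def __init__(self, d_model: int = 128, nhead: int = 4, num_layers: int = 3):
        super().__init__()
        # Each input token encodes one object: width, height, and the (x, y)
        # position it was assigned (zeros for objects not yet placed).
        self.embed = nn.Linear(4, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 2)  # (x, y) for the next object

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_objects, 4)
        h = self.encoder(self.embed(tokens))
        # Read the placement off the last token's representation.
        return self.head(h[:, -1])


# Self-supervised signal: positions come from tidy demonstration layouts,
# and the model learns to reproduce each object's position from the objects
# placed before it (dummy tensors stand in for real data here).
model = PlacementTransformer()
tokens = torch.randn(8, 5, 4)      # batch of 8 scenes, 5 objects each
target_xy = torch.randn(8, 2)      # demonstrated position of the next object
loss = nn.functional.mse_loss(model(tokens), target_xy)
loss.backward()
```

Training on many tidy layouts in this way would let the model propose placements for new object sets without any preset target positions, which is the property the abstract emphasizes.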
Related papers
- Context-Aware Command Understanding for Tabletop Scenarios [1.7082212774297747]
This paper presents a novel hybrid algorithm designed to interpret natural human commands in tabletop scenarios.
By integrating multiple sources of information, including speech, gestures, and scene context, the system extracts actionable instructions for a robot.
We discuss the strengths and limitations of the system, with particular focus on how it handles multimodal command interpretation.
arXiv Detail & Related papers (2024-10-08T20:46:39Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects [16.178331266949293]
This paper explores solutions for precise and general pick-and-place.
We propose simPLE as a solution to precise pick-and-place.
SimPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience.
arXiv Detail & Related papers (2023-07-24T21:22:58Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation [11.92150014766458]
We aim to fill the gap in the last mile of embodied agents: object manipulation by following human guidance.
We build a Vision-and-Language Manipulation benchmark (VLMbench) containing various language instructions on categorized robotic manipulation tasks.
Modular rule-based task templates are created to automatically generate robot demonstrations with language instructions.
arXiv Detail & Related papers (2022-06-17T03:07:18Z)
- V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)
- Learning to Regrasp by Learning to Place [19.13976401970985]
Regrasping is needed when a robot's current grasp pose fails to perform desired manipulation tasks.
We propose a system for robots to take partial point clouds of an object and the supporting environment as inputs and output a sequence of pick-and-place operations.
We show that our system is able to achieve a 73.3% success rate in regrasping diverse objects.
arXiv Detail & Related papers (2021-09-18T03:07:06Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Predicting Stable Configurations for Semantic Placement of Novel Objects [37.18437299513799]
Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments.
We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects.
Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing.
arXiv Detail & Related papers (2021-08-26T23:05:05Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- NeRP: Neural Rearrangement Planning for Unknown Objects [49.191284597526]
We propose NeRP (Neural Rearrangement Planning), a deep learning based approach for multi-step neural object rearrangement planning.
NeRP works with never-before-seen objects, is trained on simulation data, and generalizes to the real world.
arXiv Detail & Related papers (2021-06-02T17:56:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.