Knolling Bot: Teaching Robots the Human Notion of Tidiness
- URL: http://arxiv.org/abs/2310.04566v3
- Date: Sat, 01 Nov 2025 12:45:48 GMT
- Title: Knolling Bot: Teaching Robots the Human Notion of Tidiness
- Authors: Yuhang Hu, Judah Goldfeder, Zhizhuo Zhang, Xinyue Zhu, Ruibo Liu, Philippe Wyder, Jiong Lin, Hod Lipson
- Abstract summary: We propose an approach that equips domestic robots to perform simple tidying tasks via knolling. Knolling is the practice of arranging scattered items into neat, space-efficient layouts. This work represents a step forward in building robots that internalize human aesthetic sense and can genuinely co-create in our living spaces.
- Score: 12.630637353265422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For robots to truly collaborate with and assist humans, they must understand not only logic and instructions, but also the subtle emotions, aesthetics, and feelings that define our humanity. Human art and aesthetics are among the most elusive concepts, often difficult even for people to articulate, and without grasping these fundamentals, robots will be unable to help in many spheres of daily life. Consider the long-promised robotic butler: automating domestic chores demands more than motion planning. It requires an internal model of cleanliness and tidiness, a challenge largely unexplored by AI. To bridge this gap, we propose an approach that equips domestic robots to perform simple tidying tasks via knolling, the practice of arranging scattered items into neat, space-efficient layouts. Unlike the uniformity of industrial settings, household environments feature diverse objects and highly subjective notions of tidiness. Drawing inspiration from NLP, we treat knolling as a sequential prediction problem and employ a transformer-based model to forecast each object's placement. Our method learns a generalizable concept of tidiness, generates diverse solutions adaptable to varying object sets, and incorporates human preferences for personalized arrangements. This work represents a step forward in building robots that internalize human aesthetic sense and can genuinely co-create in our living spaces.
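To make the sequential-prediction framing concrete, below is a minimal sketch of a transformer that autoregressively regresses each object's placement from its dimensions and the positions already chosen. All names, shapes, and layer sizes are illustrative assumptions, not the paper's actual architecture.
```python
# Minimal sketch of knolling as sequential placement prediction.
# All shapes, sizes, and names are illustrative assumptions.
import torch
import torch.nn as nn

class KnollingTransformer(nn.Module):
    """Autoregressively predicts each object's (x, y) placement from
    object dimensions plus the positions already chosen."""
    def __init__(self, obj_dim=2, pos_dim=2, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Linear(obj_dim + pos_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, pos_dim)  # regress the next placement

    def forward(self, objs, placed):
        # objs: (B, N, obj_dim) object sizes; placed: (B, N, pos_dim)
        # positions so far (zeros for objects not yet placed).
        h = self.embed(torch.cat([objs, placed], dim=-1))
        mask = nn.Transformer.generate_square_subsequent_mask(objs.size(1))
        h = self.encoder(h, mask=mask)  # causal: step i sees only steps <= i
        return self.head(h)

model = KnollingTransformer()
objs = torch.rand(1, 5, 2)  # five objects with random width/height
print(model(objs, torch.zeros(1, 5, 2)).shape)  # torch.Size([1, 5, 2])
```
At inference time one would roll the model out object by object, feeding each predicted position back into `placed`, which is what makes the formulation sequential.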
Related papers
- Improving Generalization of Language-Conditioned Robot Manipulation [29.405161073483175]
We present a framework that learns object-arrangement tasks from just a few demonstrations. We validate our method on both simulation and real-world robotic environments.
arXiv Detail & Related papers (2025-08-04T13:29:26Z)
- A roadmap for AI in robotics [55.87087746398059]
We are witnessing growing excitement in robotics at the prospect of leveraging the potential of AI to tackle some of the outstanding barriers to the full deployment of robots in our daily lives. This article offers an assessment of what AI for robotics has achieved since the 1990s and proposes a short- and medium-term research roadmap listing challenges and promises.
arXiv Detail & Related papers (2025-07-26T15:18:28Z)
- LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps [18.602777449136738]
We propose LIAM - an end-to-end model that predicts action transcripts based on language, image, action, and map inputs.
We evaluate our method on the ALFRED dataset, a simulator-generated benchmark for domestic tasks.
arXiv Detail & Related papers (2025-03-15T18:54:06Z)
- Context-Aware Command Understanding for Tabletop Scenarios [1.7082212774297747]
This paper presents a novel hybrid algorithm designed to interpret natural human commands in tabletop scenarios.
By integrating multiple sources of information, including speech, gestures, and scene context, the system extracts actionable instructions for a robot.
We discuss the strengths and limitations of the system, with particular focus on how it handles multimodal command interpretation.
arXiv Detail & Related papers (2024-10-08T20:46:39Z)
- Controlling diverse robots by inferring Jacobian fields with deep networks [48.279199537720714]
Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. We introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot.
arXiv Detail & Related papers (2024-07-11T17:55:49Z)
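The Jacobian-field entry above pairs a learned Jacobian with closed-loop control. The toy sketch below shows the control side, resolved-rate control via a pseudoinverse, using an analytic planar two-link-arm Jacobian as a stand-in for the network's prediction; everything here is a hypothetical illustration, not the paper's method.
```python
# Toy resolved-rate control loop driven by a (here analytic) Jacobian.
# A planar 2-link arm stands in for the paper's learned Jacobian field;
# everything below is a hypothetical illustration.
import numpy as np

def predicted_jacobian(q):
    # Stand-in for the deep network: analytic Jacobian of a planar
    # 2-link arm with unit link lengths.
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-s1 - s12, -s12],
                     [ c1 + c12,  c12]])

def end_effector(q):
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

q = np.array([0.3, 0.5])
target = np.array([1.2, 0.8])
for _ in range(100):  # closed loop: observe error, invert Jacobian, step
    error = target - end_effector(q)
    q = q + np.linalg.pinv(predicted_jacobian(q)) @ (0.5 * error)
print(np.round(end_effector(q), 3))  # ~ [1.2, 0.8]
```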
- HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects [16.178331266949293]
This paper explores solutions for precise and general pick-and-place, and proposes simPLE as a solution.
SimPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience.
arXiv Detail & Related papers (2023-07-24T21:22:58Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages the language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Efficient automatic design of robots [43.968830087704035]
We show for the first time de-novo optimization of a robot's structure to exhibit a desired behavior, within seconds on a single consumer-grade computer.
Unlike other gradient-based robot design methods, this algorithm does not presuppose any particular anatomical form.
This advance promises near instantaneous design, manufacture, and deployment of unique and useful machines for medical, environmental, vehicular, and space-based tasks.
arXiv Detail & Related papers (2023-06-05T21:30:52Z)
- Aligning Robot and Human Representations [50.070982136315784]
We argue that current representation learning approaches in robotics should be studied from the perspective of how well they accomplish the objective of representation alignment.
We mathematically define the problem, identify its key desiderata, and situate current methods within this formalism.
arXiv Detail & Related papers (2023-02-03T18:59:55Z)
- HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
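The reward-learning entry above defines rewards as distances to a goal in a learned embedding space. A minimal sketch of that idea follows, with a randomly initialized encoder standing in for the trained time-contrastive network; all shapes and names are assumptions.
```python
# Reward as negative distance to the goal in an embedding space.
# A random untrained encoder stands in for the paper's trained
# time-contrastive network; shapes are illustrative.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128),
                        nn.ReLU(), nn.Linear(128, 32))

def reward(observation, goal):
    """Task-agnostic reward: closer to the goal embedding = higher reward."""
    with torch.no_grad():
        z_obs, z_goal = encoder(observation), encoder(goal)
    return -torch.linalg.norm(z_obs - z_goal, dim=-1)

obs = torch.rand(1, 3, 32, 32)   # current camera frame
goal = torch.rand(1, 3, 32, 32)  # goal frame
print(reward(obs, goal))          # <= 0, maximal when obs matches the goal
```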
- VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation [11.92150014766458]
We aim to fill the gap in the last mile of embodied agents: object manipulation by following human guidance.
We build a Vision-and-Language Manipulation benchmark (VLMbench) based on it, containing various language instructions on categorized robotic manipulation tasks.
Modular rule-based task templates are created to automatically generate robot demonstrations with language instructions.
arXiv Detail & Related papers (2022-06-17T03:07:18Z)
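The VLMbench entry above builds demonstrations from modular rule-based task templates. A toy illustration of how such a template might expand into language instructions follows; the template and vocabulary are made up, not the benchmark's actual ones.
```python
# Toy rule-based template expansion for language-conditioned tasks
# (template and vocabulary are illustrative, not VLMbench's actual ones).
import itertools

TEMPLATE = "pick up the {color} {shape} and place it on the {target}"
COLORS, SHAPES, TARGETS = ["red", "blue"], ["cube", "cylinder"], ["tray", "shelf"]

for color, shape, target in itertools.product(COLORS, SHAPES, TARGETS):
    instruction = TEMPLATE.format(color=color, shape=shape, target=target)
    print(instruction)  # each instruction pairs with a scripted demonstration
```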
- Aligning Robot Representations with Humans [5.482532589225552]
A key question is how best to transfer knowledge learned in one environment to another, where shifting constraints and human preferences make adaptation challenging.
We postulate that because humans will be the ultimate evaluators of system success in the world, they are best suited to communicating the aspects of the tasks that matter to the robot.
We highlight three areas where we can use this approach to build interactive systems and offer future directions of work to better create advanced collaborative robots.
arXiv Detail & Related papers (2022-05-15T15:51:05Z)
- Learning Perceptual Concepts by Bootstrapping from Human Queries [41.07749131023931]
We propose a new approach whereby the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space.
This lets it take advantage of semantically meaningful privileged information that is only accessible at training time, like object poses and bounding boxes, which allows for richer human interaction and speeds up learning.
arXiv Detail & Related papers (2021-11-09T16:43:46Z)
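The bootstrapping entry above trains a low-dimensional concept model from a few human queries, then uses it to label a larger data set for the high-dimensional model. A schematic sketch under those assumptions; the features, sizes, and classifiers are placeholders.
```python
# Schematic of concept bootstrapping: a low-dimensional model fit from a
# few human answers auto-labels a large set of high-dimensional examples.
# Features, sizes, and classifiers below are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Privileged low-dim features (e.g., object poses) for 20 human-labeled queries.
low_dim = rng.normal(size=(20, 3))
answers = (low_dim[:, 0] > 0).astype(int)  # simulated yes/no human answers
concept_low = LogisticRegression().fit(low_dim, answers)

# Auto-label many examples where both privileged state and raw inputs exist.
low_dim_big = rng.normal(size=(5000, 3))
pseudo_labels = concept_low.predict(low_dim_big)
raw_obs_big = rng.normal(size=(5000, 256))  # stand-in for image features

# Train the high-dimensional concept model on the bootstrapped data set.
concept_high = LogisticRegression(max_iter=500).fit(raw_obs_big, pseudo_labels)
print("bootstrapped training set size:", len(pseudo_labels))
```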
- V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)
- Learning to Regrasp by Learning to Place [19.13976401970985]
Regrasping is needed when a robot's current grasp pose fails to perform desired manipulation tasks.
We propose a system for robots to take partial point clouds of an object and the supporting environment as inputs and output a sequence of pick-and-place operations.
We show that our system is able to achieve a 73.3% success rate when regrasping diverse objects.
arXiv Detail & Related papers (2021-09-18T03:07:06Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- Predicting Stable Configurations for Semantic Placement of Novel Objects [37.18437299513799]
Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments.
We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects.
Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing.
arXiv Detail & Related papers (2021-08-26T23:05:05Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
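The INVIGORATE entry above folds learned modules into a POMDP so the robot can trade off asking a clarifying question against grasping. A toy belief-update loop in that spirit; all probabilities and thresholds are invented for illustration.
```python
# Toy POMDP-style decision loop: maintain a belief over which object the
# human referred to, and either grasp or ask a clarifying question.
# All probabilities and thresholds are invented for illustration.
candidates = ["red mug", "blue mug", "bowl"]
belief = [0.45, 0.40, 0.15]   # e.g., from a visual-grounding module

GRASP_CONFIDENCE = 0.8

def update(belief, answer_likelihoods):
    """Bayes update of the belief after hearing an answer."""
    posterior = [b * l for b, l in zip(belief, answer_likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]

while max(belief) < GRASP_CONFIDENCE:
    # Ask a question; the human answer ("the red one") is modeled by
    # per-object likelihoods, hand-set here for illustration.
    belief = update(belief, answer_likelihoods=[0.9, 0.1, 0.05])
    print("belief:", [round(b, 2) for b in belief])

print("grasp:", candidates[belief.index(max(belief))])
```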
- NeRP: Neural Rearrangement Planning for Unknown Objects [49.191284597526]
We propose NeRP (Neural Rearrangement Planning), a deep-learning-based approach for multi-step neural object rearrangement planning.
NeRP is trained on simulation data, works with never-before-seen objects, and generalizes to the real world.
arXiv Detail & Related papers (2021-06-02T17:56:27Z)
- Sensorimotor representation learning for an "active self" in robots: A model survey [10.649413494649293]
In humans, these capabilities are thought to be related to our ability to perceive our body in space.
This paper reviews the developmental processes and underlying mechanisms of these abilities.
We propose a theoretical computational framework, which aims to allow the emergence of the sense of self in artificial agents.
arXiv Detail & Related papers (2020-11-25T16:31:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.