Related papers: Instructing Robots by Sketching: Learning from Demonstration via Probabilistic Diagrammatic Teaching

URL: http://arxiv.org/abs/2309.03835v3
Date: Sun, 31 Mar 2024 07:53:19 GMT
Title: Instructing Robots by Sketching: Learning from Demonstration via Probabilistic Diagrammatic Teaching
Authors: Weiming Zhi, Tianyi Zhang, Matthew Johnson-Roberson
Abstract summary: Learning from Demonstration (LfD) enables robots to imitate expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching.
Score: 14.839036866911089
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning from Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene; these are then synthesised as a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
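
As a rough illustration of the geometric idea behind the ray-tracing step in the abstract, the sketch below back-projects one sketched pixel from each of two camera views into 3D rays and takes the point closest to both rays. It is not the RPTL implementation: RPTL works with time-varying probability densities rather than single points, and the idealised pinhole camera, the function names, the camera parameters, and the pixel values here are all assumptions made for the example.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def pixel_to_ray(pixel, focal, principal, cam_pos, cam_rot):
    """Back-project a pixel into a world-frame ray (origin, unit direction).

    Assumes a pinhole camera looking along its +z axis; cam_rot is a 3x3
    rotation matrix (list of rows) mapping camera-frame to world-frame vectors.
    """
    u, v = pixel
    fx, fy = focal
    cx, cy = principal
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    d_world = [dot(row, d_cam) for row in cam_rot]
    norm = math.sqrt(dot(d_world, d_world))
    return list(cam_pos), [c / norm for c in d_world]


def closest_point_between_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment joining two non-parallel lines."""
    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    s = (b * e - c * d) / denom  # parameter along the first line
    t = (a * e - b * d) / denom  # parameter along the second line
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Hypothetical setup: two identically oriented cameras placed 1 m apart.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
focal, principal = (500.0, 500.0), (320.0, 240.0)

# Pixels where the same sketched waypoint appears in each view (made-up values).
ray_a = pixel_to_ray((445.0, 290.0), focal, principal, [0.0, 0.0, 0.0], identity)
ray_b = pixel_to_ray((195.0, 290.0), focal, principal, [1.0, 0.0, 0.0], identity)

point = closest_point_between_rays(*ray_a, *ray_b)
print(point)
```

Repeating this for every waypoint along a pair of sketches would give a single 3D polyline; a probabilistic treatment, as described in the abstract, would instead carry the uncertainty of the sketches through to a distribution over trajectories.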

Related papers

DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation [52.136378691610524]
We present DynaRend, a representation learning framework that learns 3D-aware and dynamics-informed triplane features. By pretraining on multi-view RGB-D video data, DynaRend jointly captures spatial geometry, future dynamics, and task semantics in a unified triplane representation. We evaluate DynaRend on two challenging benchmarks, RLBench and Colosseum, demonstrating substantial improvements in policy success rate, generalization to environmental perturbations, and real-world applicability across diverse manipulation tasks.
arXiv Detail & Related papers (2025-10-28T10:17:11Z)
Cross-Modal Instructions for Robot Motion Generation [7.445072780282545]
We introduce Learning from Cross-Modal Instructions, where robots are shaped by demonstrations in the form of rough annotations. We introduce the CrossInstruct framework, which integrates cross-modal instructions as examples into the context input to a vision-language model. The VLM then iteratively queries a smaller, fine-tuned model, and synthesizes the desired motion over multiple 2D views. By incorporating the reasoning of the large VLM with a fine-grained pointing model, CrossInstruct produces executable robot behaviors that generalize beyond the environment in the limited set of instruction examples.
arXiv Detail & Related papers (2025-09-25T12:54:00Z)
Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches [12.643638347912624]
Training robotic manipulation policies traditionally requires numerous demonstrations and/or environmental rollouts. We propose Sketch-to-Skill, a novel framework that leverages human-drawn 2D sketch trajectories to bootstrap and guide RL for robotic manipulation.
arXiv Detail & Related papers (2025-03-14T23:08:29Z)
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation [53.63540587160549]
VidBot is a framework enabling zero-shot robotic manipulation using learned 3D affordance from in-the-wild monocular RGB-only human videos. VidBot paves the way for leveraging everyday human videos to make robot learning more scalable.
arXiv Detail & Related papers (2025-03-10T10:04:58Z)
DEL: Discrete Element Learner for Learning 3D Particle Dynamics with Neural Rendering [10.456618054473177]
We show how to learn 3D dynamics from 2D images by inverse rendering. We incorporate the learnable graph kernels into the classic Discrete Element Analysis framework. Our methods can effectively learn the dynamics of various materials from the partial 2D observations.
arXiv Detail & Related papers (2024-10-11T16:57:02Z)
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction [51.49400490437258]
This work develops a method for imitating articulated object manipulation from a single monocular RGB human demonstration. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot.
arXiv Detail & Related papers (2024-09-26T17:57:16Z)
SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR. SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds. We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
Learning Orbitally Stable Systems for Diagrammatically Teaching [14.839036866911089]
Diagrammatic Teaching is a paradigm for robots to acquire novel skills, whereby the user provides 2D sketches over images of the scene to shape the robot's motion. In this work, we tackle the problem of teaching a robot to approach a surface and then follow cyclic motion on it, where the cycle of the motion can be arbitrarily specified by a single user-provided sketch over an image from the robot's camera.
arXiv Detail & Related papers (2023-09-19T04:03:42Z)
Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration. We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE. We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
NSLF-OL: Online Learning of Neural Surface Light Fields alongside Real-time Incremental 3D Reconstruction [0.76146285961466]
The paper proposes a novel Neural Surface Light Fields model that copes with the small range of view directions while producing a good result in unseen directions. Our model learns the Neural Surface Light Fields (NSLF) online alongside real-time 3D reconstruction, with a sequential data stream as the shared input. In addition to online training, our model also provides real-time rendering after completing the data stream for visualization.
arXiv Detail & Related papers (2023-04-29T15:41:15Z)
Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation [118.27432851053335]
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021: No Interaction Track. The No Interaction track targets learning policies from pre-collected demonstration trajectories. In this track, we design a Heuristic Rule-based Method (HRM) to trigger high-quality object manipulation by decomposing the task into a series of sub-tasks. For each sub-task, simple rule-based control strategies are adopted to predict actions that can be applied to robotic arms.
arXiv Detail & Related papers (2022-06-13T16:20:42Z)
Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations [89.1388369229542]
We propose an unsupervised method for 3D geometry-aware representation learning of articulated objects. We obviate this need by learning the representations with GAN training. Experiments demonstrate the efficiency of our method and show that GAN-based training enables learning of controllable 3D representations without supervision.
arXiv Detail & Related papers (2022-04-19T12:10:18Z)
3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations. A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.