Controlling diverse robots by inferring Jacobian fields with deep networks
- URL: http://arxiv.org/abs/2407.08722v2
- Date: Wed, 30 Jul 2025 21:44:23 GMT
- Title: Controlling diverse robots by inferring Jacobian fields with deep networks
- Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann
- Abstract summary: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. We introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot.
- Score: 48.279199537720714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have greatly expanded the feasible hardware, but using these systems requires control software to translate the desired motions into actuator commands. Conventional robots can easily be modeled as rigid links connected by joints, but it remains an open challenge to model and control biologically inspired robots that are often soft or made of several materials, lack sensing capabilities, and may change their material properties with use. Here, we introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field (the sensitivity of all 3D points to the robot's actuators). Our method enables the control of robots from only a single camera, makes no assumptions about the robots' materials, actuation, or sensing, and is trained without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators that vary in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. Because it enables robot control using a generic camera as the only sensor, we anticipate that our work will broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.
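As a reading aid (not the authors' implementation), the control law implied by the abstract can be sketched directly: given a predicted Jacobian field that gives each tracked 3D point's sensitivity to the actuators, a desired point displacement is turned into an actuator command by a damped least-squares solve. The `predict_jacobian_field` stand-in below is hypothetical and returns random sensitivities; in the paper this role is played by the deep network operating on the camera stream.

```python
import numpy as np

def predict_jacobian_field(frame, query_points, num_actuators=4):
    """Hypothetical stand-in for the learned model: for each queried 3D point,
    return its sensitivity to the robot's actuators (shape: P x 3 x A)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(query_points), 3, num_actuators))

def closed_loop_step(frame, points, targets, damping=1e-2):
    """One servoing step: solve for the actuator command that best moves the
    tracked points toward their targets (damped least squares)."""
    J = predict_jacobian_field(frame, points)       # (P, 3, A)
    J_flat = J.reshape(-1, J.shape[-1])             # stack all point rows
    error = (targets - points).reshape(-1)          # desired 3D displacements
    A = J_flat.T @ J_flat + damping * np.eye(J_flat.shape[-1])
    return np.linalg.solve(A, J_flat.T @ error)     # actuator increment

# Toy usage: nudge three tracked points toward slightly shifted targets.
points = np.zeros((3, 3))
targets = points + np.array([0.01, 0.0, 0.0])
frame = np.zeros((64, 64, 3))                       # placeholder camera frame
print(closed_loop_step(frame, points, targets))
```

In an actual closed loop the command would be applied, a new frame captured, and the step repeated until the tracked points reach their targets.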
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model on its ability to perform tasks zero-shot after pre-training, to follow language instructions from people, and to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - Correspondence learning between morphologically different robots via
task demonstrations [2.1374208474242815]
We propose a method to learn correspondences among two or more robots that may have different morphologies.
A fixed-base manipulator robot with joint control and a differential-drive mobile robot can be addressed within the proposed framework.
We provide a proof-of-concept realization of correspondence learning between a real manipulator robot and a simulated mobile robot.
arXiv Detail & Related papers (2023-10-20T12:42:06Z) - Giving Robots a Hand: Learning Generalizable Manipulation with
Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z) - HERD: Continuous Human-to-Robot Evolution for Learning from Human
Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z) - GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots [87.32145104894754]
We introduce a framework for training generalized locomotion (GenLoco) controllers for quadrupedal robots.
Our framework synthesizes general-purpose locomotion controllers that can be deployed on a large variety of quadrupedal robots.
We show that our models acquire more general control strategies that can be directly transferred to novel simulated and real-world robots.
arXiv Detail & Related papers (2022-09-12T15:14:32Z) - A Transferable Legged Mobile Manipulation Framework Based on Disturbance
Predictive Control [15.044159090957292]
Legged mobile manipulation, where a quadruped robot is equipped with a robotic arm, can greatly enhance the performance of the robot.
We propose a unified framework, disturbance predictive control, in which a reinforcement learning scheme with a latent dynamic adapter is embedded into our proposed low-level controller.
arXiv Detail & Related papers (2022-03-02T14:54:10Z) - Robotic Telekinesis: Learning a Robotic Hand Imitator by Watching Humans
on Youtube [24.530131506065164]
We build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand.
The robot observes the human operator via a single RGB camera and imitates their actions in real-time.
We leverage this human video data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth, swift, safe, and semantically similar to the guiding demonstration.
arXiv Detail & Related papers (2022-02-21T18:59:59Z) - Know Thyself: Transferable Visuomotor Control Through Robot-Awareness [22.405839096833937]
Training visuomotor robot controllers from scratch on a new robot typically requires generating large amounts of robot-specific data.
We propose a "robot-aware" solution paradigm that exploits readily available robot "self-knowledge"
Our experiments on tabletop manipulation tasks in simulation and on real robots demonstrate that these plug-in improvements dramatically boost the transferability of visuomotor controllers.
arXiv Detail & Related papers (2021-07-19T17:56:04Z) - Single-view robot pose and joint angle estimation via render & compare [40.05546237998603]
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image.
This is an important problem for granting mobile and itinerant autonomous systems the ability to interact with other robots. A generic, toy sketch of the render-and-compare loop appears after this list.
arXiv Detail & Related papers (2021-04-19T14:48:29Z) - Morphology-Agnostic Visual Robotic Control [76.44045983428701]
MAVRIC is an approach that works with minimal prior knowledge of the robot's morphology.
We demonstrate our method on visually-guided 3D point reaching, trajectory following, and robot-to-robot imitation.
arXiv Detail & Related papers (2019-12-31T15:45:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.