Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset
- URL: http://arxiv.org/abs/2409.17126v1
- Date: Wed, 25 Sep 2024 17:42:20 GMT
- Title: Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset
- Authors: Andrew Goldberg, Kavish Kondap, Tianshuang Qiu, Zehan Ma, Letian Fu, Justin Kerr, Huang Huang, Kaiyuan Chen, Kuan Fang, Ken Goldberg
- Abstract summary: We introduce a novel problem: Generative Design-for-Robot-Assembly (GDfRA)
The task is to generate an assembly based on a natural language prompt (e.g., ''giraffe'') and an image of available physical components, such as 3D-printed blocks.
We present Blox-Net, a GDfRA system that combines generative vision language models with well-established methods in computer vision, simulation, perturbation analysis, motion planning, and physical robot experimentation.
- Score: 22.660709957291747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative AI systems have shown impressive capabilities in creating text, code, and images. Inspired by the rich history of research in industrial ''Design for Assembly'', we introduce a novel problem: Generative Design-for-Robot-Assembly (GDfRA). The task is to generate an assembly based on a natural language prompt (e.g., ''giraffe'') and an image of available physical components, such as 3D-printed blocks. The output is an assembly, a spatial arrangement of these components, and instructions for a robot to build this assembly. The output must 1) resemble the requested object and 2) be reliably assembled by a 6 DoF robot arm with a suction gripper. We then present Blox-Net, a GDfRA system that combines generative vision language models with well-established methods in computer vision, simulation, perturbation analysis, motion planning, and physical robot experimentation to solve a class of GDfRA problems with minimal human supervision. Blox-Net achieved a Top-1 accuracy of 63.5% in the ''recognizability'' of its designed assemblies (e.g., resembling a giraffe as judged by a VLM). These designs, after automated perturbation redesign, were reliably assembled by a robot, achieving near-perfect success across 10 consecutive assembly iterations with human intervention only during reset prior to assembly. Surprisingly, this entire design process from textual word (''giraffe'') to reliable physical assembly is performed with zero human intervention.
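The abstract describes a propose-verify-redesign loop: a VLM proposes a block layout, perturbation analysis in simulation vets its stability, and unstable placements are adjusted before a robot builds the result. The toy sketch below illustrates only that control flow; the function names, block dimensions, and the geometric center-of-mass support test are illustrative assumptions, not Blox-Net's actual VLM prompts or physics simulator.

```python
# Illustrative GDfRA-style loop for a vertical stack of blocks (hypothetical
# names; Blox-Net uses a VLM and a physics simulator, not the toy checks here).
import random
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    x: float       # horizontal center (m)
    z: float       # height of the block's base (m)
    width: float   # footprint width (m)
    height: float

def supported(block: Block, below: Block) -> bool:
    """Toy support test: center of mass must lie over the supporting footprint."""
    return abs(block.x - below.x) <= below.width / 2

def survives_perturbation(design: list[Block], trials: int = 100,
                          jitter: float = 0.005) -> bool:
    """Stand-in for physics-based perturbation analysis: jitter each placement
    slightly and require the support test to still hold."""
    for _ in range(trials):
        for lower, upper in zip(design, design[1:]):
            dx = random.uniform(-jitter, jitter)
            shifted = Block(upper.name, upper.x + dx, upper.z,
                            upper.width, upper.height)
            if not supported(shifted, lower):
                return False
    return True

def propose_design(prompt: str) -> list[Block]:
    """Placeholder for the VLM call that maps a text prompt to a block layout."""
    return [Block("body", 0.00, 0.00, 0.08, 0.04),
            Block("neck", 0.02, 0.04, 0.03, 0.10),
            Block("head", 0.03, 0.14, 0.04, 0.03)]

design = propose_design("giraffe")
while not survives_perturbation(design):
    # Perturbation redesign: nudge each block halfway toward its support.
    for lower, upper in zip(design, design[1:]):
        upper.x = lower.x + 0.5 * (upper.x - lower.x)
print([(b.name, round(b.x, 3)) for b in design])  # stable layout for assembly
```

In the real system a VLM call, a physics simulator, and a motion planner would replace these placeholders, but the overall shape (design, perturb, redesign, assemble) is the same.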
Related papers
- AssemblyComplete: 3D Combinatorial Construction with Deep Reinforcement Learning [4.3507834596906125]
A critical goal in robotics is to teach robots to adapt to real-world collaborative tasks, particularly in automatic assembly.
This paper introduces 3D assembly completion, which is demonstrated using unit primitives (i.e., Lego bricks).
We propose a two-part deep reinforcement learning (DRL) framework that first infers the objective of an incomplete assembly and then learns a construction policy to complete it.
arXiv Detail & Related papers (2024-10-20T18:51:17Z)
- Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly [45.03625198933637]
We present a system that transforms speech into physical objects by combining 3D generative Artificial Intelligence with robotic assembly.
We propose utilizing discrete robotic assembly of lattice-based voxel components to address the challenges of using generative AI outputs in physical production.
arXiv Detail & Related papers (2024-09-27T02:12:56Z)
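The Speech to Reality entry above hinges on discretizing a generative model's mesh output into lattice-based voxel components that a robot can place one by one. Below is a minimal sketch of that discretization step, assuming the open-source trimesh library; the sphere stand-in mesh, 2 cm pitch, and bottom-up placement order are assumptions for illustration, not details from the paper.

```python
# Sketch: discretize a generated mesh into unit voxel cells for lattice-based
# robotic assembly (assumes the trimesh library).
import numpy as np
import trimesh

# Stand-in for a 3D generative model's mesh output (a sphere keeps this runnable).
mesh = trimesh.creation.icosphere(radius=0.08)

voxels = mesh.voxelized(pitch=0.02).fill()  # 2 cm cubic cells, solid interior
occupancy = voxels.matrix                   # boolean 3D occupancy grid

# Order placements bottom layer first so each block rests on the table or on
# previously placed blocks, then emit one pick-and-place target per cell.
cells = sorted(map(tuple, np.argwhere(occupancy)), key=lambda c: c[2])
print(f"{len(cells)} unit blocks; first placement at cell {cells[0]}")
```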
- GRUtopia: Dream General Robots in a City at Scale [65.08318324604116]
This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots.
GRScenes includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments.
GRResidents is a Large Language Model (LLM) driven Non-Player Character (NPC) system that is responsible for social interaction.
arXiv Detail & Related papers (2024-07-15T17:40:46Z)
- Unifying 3D Representation and Control of Diverse Robots with a Single Camera [48.279199537720714]
We introduce Neural Jacobian Fields, an architecture that autonomously learns to model and control robots from vision alone.
Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot.
arXiv Detail & Related papers (2024-07-11T17:55:49Z)
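The Neural Jacobian Fields entry above learns a Jacobian relating actuation to visual motion; once any such Jacobian is in hand, a standard closed-loop control step is damped least squares. The sketch below shows that generic step with random stand-in data; it is classical visual-servoing math, not the paper's learned architecture.

```python
# Sketch: damped least-squares control step given a Jacobian J mapping joint
# velocities to visual feature velocities. J here is a random stand-in, not a
# learned Neural Jacobian Field.
import numpy as np

def control_step(J: np.ndarray, feature_error: np.ndarray,
                 damping: float = 1e-2) -> np.ndarray:
    """dq = J^T (J J^T + lambda I)^(-1) e, driving features toward the target."""
    m = J.shape[0]
    return J.T @ np.linalg.solve(J @ J.T + damping * np.eye(m), feature_error)

J = np.random.default_rng(0).standard_normal((6, 7))  # 6 feature dims, 7 joints
e = np.array([0.01, -0.02, 0.0, 0.0, 0.005, 0.0])     # desired minus observed
dq = control_step(J, e)                               # joint update to apply
print(np.round(dq, 4))
```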
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with an existing visual grounding and robotic grasping system.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery [34.25379651790627]
We tackle the problem of building arbitrary, predefined target structures entirely from scratch using a set of Tetris-like building blocks and a robotic manipulator.
Our novel hierarchical approach aims at efficiently decomposing the overall task into three feasible levels that benefit mutually from each other.
arXiv Detail & Related papers (2022-03-08T14:44:51Z)
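The entry above combines graph-based RL with mixed-integer programming for assembly discovery. To give a flavor of the MIP side, the sketch below solves a deliberately tiny block-selection program with scipy; the block sizes, inventory, and objective are invented for illustration and do not reproduce the paper's formulation.

```python
# Sketch: a tiny mixed-integer program choosing how many Tetris-like blocks of
# each type to use so that covered cells approach the target volume.
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

sizes = np.array([4, 2, 1])   # cells covered by each block type (invented)
stock = np.array([3, 4, 6])   # available inventory per type (invented)
target_cells = 17             # cells in the target structure

# Maximize covered cells (milp minimizes, so negate) without overfilling.
res = milp(c=-sizes.astype(float),
           constraints=[LinearConstraint(sizes.reshape(1, -1), 0, target_cells)],
           integrality=np.ones(3),
           bounds=Bounds(0, stock))
counts = np.round(res.x).astype(int)
print("blocks per type:", counts, "| cells covered:", int(sizes @ counts))
```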
- V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)
- Single-view robot pose and joint angle estimation via render & compare [40.05546237998603]
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image.
This is an important problem to grant mobile and itinerant autonomous systems the ability to interact with other robots.
arXiv Detail & Related papers (2021-04-19T14:48:29Z)
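RoboPose's render-and-compare idea is to synthesize the robot's appearance from candidate joint angles and refine them against the observed image. The toy sketch below applies the same compare-and-refine loop to a 2-link planar arm, matching projected joint keypoints rather than rendered pixels; the arm model, synthetic observation, and optimizer settings are illustrative assumptions, not the paper's method.

```python
# Sketch of render-and-compare on a toy 2-link planar arm: "render" joint
# keypoints from candidate angles, compare with observed 2D keypoints, and
# refine the angles by gradient descent.
import torch

def forward_points(theta: torch.Tensor) -> torch.Tensor:
    """Keypoints (elbow, wrist) of a planar 2-link arm with unit link lengths."""
    a, b = theta[0], theta[1]
    elbow = torch.stack([torch.cos(a), torch.sin(a)])
    wrist = elbow + torch.stack([torch.cos(a + b), torch.sin(a + b)])
    return torch.stack([elbow, wrist])

observed = forward_points(torch.tensor([0.7, -0.3]))  # synthetic "image" evidence
theta = torch.zeros(2, requires_grad=True)            # initial angle estimate
opt = torch.optim.Adam([theta], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((forward_points(theta) - observed) ** 2).sum()
    loss.backward()
    opt.step()
print(theta.detach())  # converges toward the true angles [0.7, -0.3]
```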
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.