Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly
- URL: http://arxiv.org/abs/2409.18390v2
- Date: Tue, 22 Oct 2024 04:16:23 GMT
- Title: Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly
- Authors: Alexander Htet Kyaw, Se Hwan Jeon, Miana Smith, Neil Gershenfeld
- Abstract summary: We present a system that transforms speech into physical objects by combining 3D generative Artificial Intelligence with robotic assembly.
We propose utilizing discrete robotic assembly of lattice-based voxel components to address the challenges of using generative AI outputs in physical production.
- Score: 45.03625198933637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a system that transforms speech into physical objects by combining 3D generative Artificial Intelligence with robotic assembly. The system leverages natural language input to make design and manufacturing more accessible, enabling individuals without expertise in 3D modeling or robotic programming to create physical objects. We propose utilizing discrete robotic assembly of lattice-based voxel components to address the challenges of using generative AI outputs in physical production, such as design variability, fabrication speed, structural integrity, and material waste. The system interprets speech to generate 3D objects, discretizes them into voxel components, computes an optimized assembly sequence, and generates a robotic toolpath. The results are demonstrated through the assembly of various objects, ranging from chairs to shelves, which are prompted via speech and realized within 5 minutes using a 6-axis robotic arm.
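The discretization and sequencing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their algorithm, so the function names (`voxelize`, `assembly_sequence`), the face-adjacency support rule, and the lowest-layer-first ordering are all assumptions chosen for illustration. The sketch snaps sampled surface points to a voxel grid and then orders the voxels so that each one is either on the ground layer or touches a voxel that has already been placed.

```python
import heapq
import math

Voxel = tuple[int, int, int]  # (x, y, z) grid index; z is up

# Six face-adjacent neighbor offsets (assumed support rule, not from the paper).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxelize(points, voxel_size: float) -> set[Voxel]:
    """Snap 3D points to integer grid indices of occupied voxels."""
    return {
        (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        for x, y, z in points
    }


def assembly_sequence(voxels: set[Voxel]) -> list[Voxel]:
    """Return a placement order that grows outward from the ground layer.

    Every voxel in the result is on the lowest layer or face-adjacent to a
    voxel placed earlier; among candidates, the lowest one is placed first.
    """
    if not voxels:
        return []
    ground = min(z for _, _, z in voxels)
    # Heap entries are (z, x, y) so the lowest candidate is popped first.
    frontier = [(z, x, y) for x, y, z in voxels if z == ground]
    heapq.heapify(frontier)
    queued = {(x, y, z) for z, x, y in frontier}
    order: list[Voxel] = []
    while frontier:
        z, x, y = heapq.heappop(frontier)
        order.append((x, y, z))
        for dx, dy, dz in NEIGHBORS:
            n = (x + dx, y + dy, z + dz)
            if n in voxels and n not in queued:
                queued.add(n)
                heapq.heappush(frontier, (n[2], n[0], n[1]))
    if len(order) != len(voxels):
        raise ValueError("some voxels are not connected to the ground layer")
    return order


if __name__ == "__main__":
    # Hypothetical sample points; a real pipeline would sample a generated mesh.
    pts = [(0.2, 0.3, 0.1), (0.5, 0.5, 1.5), (1.2, 0.1, 1.9)]
    print(assembly_sequence(voxelize(pts, voxel_size=1.0)))
```

A real system would also need to account for gripper reachability and structural stability during assembly, which this sketch ignores.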
Related papers
- Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset [22.660709957291747]
We introduce a novel problem: Generative Design-for-Robot-Assembly (GDfRA).
The task is to generate an assembly based on a natural language prompt (e.g., ''giraffe'') and an image of available physical components, such as 3D-printed blocks.
We present Blox-Net, a GDfRA system that combines generative vision language models with well-established methods in computer vision, perturbation analysis, motion planning, and physical robot experimentation.
arXiv Detail & Related papers (2024-09-25T17:42:20Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models [58.33913881592706]
Humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before.
This work delves into infusing such physical commonsense reasoning into robotic manipulation.
We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds.
arXiv Detail & Related papers (2024-02-26T18:57:52Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RoboScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding [42.04502185508723]
We propose a new large Language-guided SHape grAsPing datasEt to promote 3D part-level affordance and grasping ability learning.
From the perspective of robotic cognition, we design a two-stage fine-grained robotic grasping framework (named LangPartGPD).
Our method combines the advantages of human-robot collaboration and large language models (LLMs).
Results show our method achieves competitive performance in 3D geometry fine-grained grounding, object affordance inference, and 3D part-aware grasping tasks.
arXiv Detail & Related papers (2023-01-27T07:00:54Z)
- LaTTe: Language Trajectory TransformEr [33.7939079214046]
This work proposes a flexible language-based framework to modify generic 3D robotic trajectories.
We employ an auto-regressive transformer to map natural language inputs and contextual images into changes in 3D trajectories.
We show through simulations and real-life experiments that the model can successfully follow human intent.
arXiv Detail & Related papers (2022-08-04T22:43:21Z)
- What Matters in Language Conditioned Robotic Imitation Learning [26.92329260907805]
We study the most critical challenges in learning language conditioned policies from offline free-form imitation datasets.
We present a novel approach that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark.
arXiv Detail & Related papers (2022-04-13T08:45:32Z)
- V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)