Towards Natural Language-Driven Assembly Using Foundation Models
- URL: http://arxiv.org/abs/2406.16093v1
- Date: Sun, 23 Jun 2024 12:14:37 GMT
- Title: Towards Natural Language-Driven Assembly Using Foundation Models
- Authors: Omkar Joglekar, Tal Lancewicki, Shir Kozlovsky, Vladimir Tchuiev, Zohar Feldman, Dotan Di Castro,
- Abstract summary: Large Language Models (LLMs) and strong vision models have enabled rapid research and development in the field of Vision-Language-Action models.
We present a global control policy based on LLMs that can transfer the control policy to a finite set of skills that are specifically trained to perform high-precision tasks.
The integration of LLMs into this framework underscores their significance in not only interpreting and processing language inputs but also in enriching the control mechanisms for diverse and intricate robotic operations.
- Score: 11.710022685486914
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) and strong vision models have enabled rapid research and development in the field of Vision-Language-Action models that enable robotic control. The main objective of these methods is to develop a generalist policy that can control robots with various embodiments. However, in industrial robotic applications such as automated assembly and disassembly, some tasks, such as insertion, demand greater accuracy and involve intricate factors like contact engagement, friction handling, and refined motor skills. Implementing these skills using a generalist policy is challenging because these policies might integrate further sensory data, including force or torque measurements, for enhanced precision. In our method, we present a global control policy based on LLMs that can transfer the control policy to a finite set of skills that are specifically trained to perform high-precision tasks through dynamic context switching. The integration of LLMs into this framework underscores their significance in not only interpreting and processing language inputs but also in enriching the control mechanisms for diverse and intricate robotic operations.
Related papers
- Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
Key limitation of learned robot control policies is their inability to generalize outside their training data.
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability.
We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z) - LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains.
We propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations.
arXiv Detail & Related papers (2024-06-28T17:59:12Z) - Empowering Large Language Models on Robotic Manipulation with Affordance Prompting [23.318449345424725]
Large language models fail to interact with the physical world by generating control sequences properly.
Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policies.
We propose a framework called LLM+A(ffordance) where the LLM serves as both the sub-task planner and the motion controller.
arXiv Detail & Related papers (2024-04-17T03:06:32Z) - InCoRo: In-Context Learning for Robotics Control with Feedback Loops [4.702566749969133]
InCoRo is a system that uses a classical robotic feedback loop composed of an LLM controller, a scene understanding unit, and a robot.
We highlight the generalization capabilities of our system and show that InCoRo surpasses the prior art in terms of the success rate.
This research paves the way towards building reliable, efficient, intelligent autonomous systems that adapt to dynamic environments.
arXiv Detail & Related papers (2024-02-07T19:01:11Z) - LanguageMPC: Large Language Models as Decision Makers for Autonomous
Driving [87.1164964709168]
This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios.
Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors even multi-vehicle coordination.
arXiv Detail & Related papers (2023-10-04T17:59:49Z) - Robotic Handling of Compliant Food Objects by Robust Learning from
Demonstration [79.76009817889397]
We propose a robust learning policy based on Learning from Demonstration (LfD) for robotic grasping of food compliant objects.
We present an LfD learning policy that automatically removes inconsistent demonstrations, and estimates the teacher's intended policy.
The proposed approach has a vast range of potential applications in the aforementioned industry sectors.
arXiv Detail & Related papers (2023-09-22T13:30:26Z) - LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks [4.4589894340260585]
This paper presents a novel approach to enhance autonomous robotic manipulation using the Large Language Model (LLM) for logical inference.
The proposed system combines the advantage of LLM with YOLO-based environmental perception to enable robots to autonomously make reasonable decisions.
arXiv Detail & Related papers (2023-08-29T01:54:49Z) - Language to Rewards for Robotic Skill Synthesis [37.21434094015743]
We introduce a new paradigm that harnesses large language models (LLMs) to define reward parameters that can be optimized and accomplish variety of robotic tasks.
Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections to low-level robot actions.
arXiv Detail & Related papers (2023-06-14T17:27:10Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Dexterous Manipulation from Images: Autonomous Real-World RL via Substep
Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks.
Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples.
experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z) - Deep Reinforcement Learning for Contact-Rich Skills Using Compliant
Movement Primitives [0.0]
Further integration of industrial robots is hampered by their limited flexibility, adaptability and decision making skills.
We propose different pruning methods that facilitate convergence and generalization.
We demonstrate that the proposed method can learn insertion skills that are invariant to space, size, shape, and closely related scenarios.
arXiv Detail & Related papers (2020-08-30T17:29:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.