GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
- URL: http://arxiv.org/abs/2304.08840v2
- Date: Wed, 17 May 2023 07:28:28 GMT
- Title: GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
- Authors: Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
- Abstract summary: GoferBot is a novel vision-based semantic HRC system for a real-world assembly task. It seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.
- Score: 33.649596318580215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current transformation towards smart manufacturing has led to a growing
demand for human-robot collaboration (HRC) in the manufacturing process.
Perceiving and understanding the human co-worker's behaviour introduces
challenges for collaborative robots to efficiently and effectively perform
tasks in unstructured and dynamic environments. Integrating recent data-driven
machine vision capabilities into HRC systems is a logical next step in
addressing these challenges. However, off-the-shelf components often struggle in such settings due to their limited generalisation. Real-world evaluation is required to fully assess the maturity and robustness of these approaches. Furthermore, understanding the pure-vision aspects, and their limitations, is a crucial first step before combining multiple modalities. In
this paper, we propose GoferBot, a novel vision-based semantic HRC system for a
real-world assembly task. It is composed of a visual servoing module that
reaches and grasps assembly parts in an unstructured multi-instance and dynamic
environment, an action recognition module that performs human action prediction
for implicit communication, and a visual handover module that uses the
perceptual understanding of human behaviour to produce an intuitive and
efficient collaborative assembly experience. GoferBot is a novel assembly
system that seamlessly integrates all sub-modules by utilising implicit
semantic information purely from visual perception.
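The abstract describes a three-module pipeline: visual servoing for grasping, action recognition for implicit communication, and a handover module driven by the predicted human action. A minimal sketch of how such modules might compose into one perception-to-action cycle is shown below; all class and method names are illustrative assumptions, not taken from the paper's implementation.

```python
# Hypothetical sketch of the three-module pipeline described in the abstract.
# Names and return values are illustrative, not from the GoferBot codebase.

class VisualServoingModule:
    """Reaches and grasps assembly parts from camera observations."""
    def grasp_next_part(self, frame):
        # A real system would run detection plus closed-loop visual servoing.
        return {"part": "chair_leg", "grasped": True}

class ActionRecognitionModule:
    """Predicts the human co-worker's current action for implicit communication."""
    def predict_action(self, frame):
        return "attaching_leg"

class HandoverModule:
    """Uses the predicted human action to decide when to hand over a part."""
    def handover(self, part, human_action):
        ready = human_action in {"reaching", "attaching_leg"}
        return "handover" if ready else "wait"

def collaborative_step(frame, servo, recogniser, handover):
    """One perception-to-action cycle driven purely by visual input."""
    grasp = servo.grasp_next_part(frame)
    action = recogniser.predict_action(frame)
    return handover.handover(grasp["part"], action)

result = collaborative_step(None, VisualServoingModule(),
                            ActionRecognitionModule(), HandoverModule())
print(result)  # "handover"
```

The key design point the abstract emphasises is that all inter-module coordination flows through visual perception alone, with no explicit verbal or gestural commands.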
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI exhibits remarkable improvement by 10% to 64% compared with previous state-of-the-art in real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Extended Reality for Enhanced Human-Robot Collaboration: a Human-in-the-Loop Approach [2.336967926255341]
Human-robot collaboration attempts to tackle these challenges by combining the strength and precision of machines with human ingenuity and perceptual understanding.
We propose an implementation framework for an autonomous, machine learning-based manipulator that incorporates human-in-the-loop principles.
The conceptual framework foresees human involvement directly in the robot learning process, resulting in higher adaptability and task generalization.
arXiv Detail & Related papers (2024-03-21T17:50:22Z)
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
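The decomposition described above, in which a high-level instruction becomes a tree of object-centric manipulation units carrying affordance and safety preferences, can be sketched as a simple data structure. This is a hypothetical illustration of the idea only; the names and example values are assumptions, not from the RoboCodeX paper.

```python
# Hypothetical data-structure sketch of a tree-structured decomposition of a
# high-level instruction into object-centric manipulation units.
from dataclasses import dataclass, field

@dataclass
class ManipulationUnit:
    """Object-centric unit carrying physical preferences."""
    target_object: str
    affordance: str          # e.g. "graspable handle"
    safety_constraint: str   # e.g. "keep mug upright"
    children: list = field(default_factory=list)

def decompose(instruction: str) -> ManipulationUnit:
    # A real system would query a multimodal model; here one example is hard-coded.
    root = ManipulationUnit(instruction, "n/a", "n/a")
    root.children = [
        ManipulationUnit("mug", "graspable handle", "low grip force"),
        ManipulationUnit("shelf", "flat support surface", "keep mug upright"),
    ]
    return root

tree = decompose("put the mug on the shelf")
print([u.target_object for u in tree.children])  # ['mug', 'shelf']
```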
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots [37.952398683031895]
The central idea is to elevate the overall intelligence of the robot.
We propose QUAdruped Robotic Transformer (QUART), a family of VLA models to integrate visual information and instructions from diverse modalities as input.
Our approach leads to performant robotic policies and enables QUART to obtain a range of emergent capabilities.
arXiv Detail & Related papers (2023-12-22T06:15:03Z)
- Robot Skill Generalization via Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models [21.13906762261418]
A long-standing challenge for a robotic manipulation system is adapting and generalizing its acquired motor skills to unseen environments.
We tackle this challenge employing hybrid skill models that integrate imitation and reinforcement paradigms.
We show that our method enables a robot to gain a significant zero-shot generalization to novel environments and to refine skills in the target environments faster than learning from scratch.
arXiv Detail & Related papers (2023-10-23T16:03:23Z)
- Online Learning and Planning in Cognitive Hierarchies [10.28577981317938]
We extend an existing formal framework to model complex integrated reasoning behaviours of robotic systems.
The new framework allows for more flexible modelling of the interactions between different reasoning components.
arXiv Detail & Related papers (2023-10-18T23:53:51Z)
- Unified Human-Scene Interaction via Prompted Chain-of-Contacts [61.87652569413429]
Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality.
This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands.
arXiv Detail & Related papers (2023-09-14T17:59:49Z)
- Incremental procedural and sensorimotor learning in cognitive humanoid robots [52.77024349608834]
This work presents a cognitive agent that can learn procedures incrementally.
We show the cognitive functions required in each substage and how adding new functions helps address tasks previously unsolved by the agent.
Results show that this approach is capable of solving complex tasks incrementally.
arXiv Detail & Related papers (2023-04-30T22:51:31Z)
- Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks.
Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples.
We present experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z)
- BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [108.41464483878683]
We study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks.
We develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions.
When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%.
arXiv Detail & Related papers (2022-02-04T07:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.