GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
- URL: http://arxiv.org/abs/2304.08840v2
- Date: Wed, 17 May 2023 07:28:28 GMT
- Title: GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
- Authors: Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert
Mahony
- Abstract summary: GoferBot is a novel vision-based semantic HRC system for a real-world assembly task.
It seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.
- Score: 33.649596318580215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current transformation towards smart manufacturing has led to a growing
demand for human-robot collaboration (HRC) in the manufacturing process.
Perceiving and understanding the human co-worker's behaviour introduces
challenges for collaborative robots to efficiently and effectively perform
tasks in unstructured and dynamic environments. Integrating recent data-driven
machine vision capabilities into HRC systems is a logical next step in
addressing these challenges. However, in these cases, off-the-shelf components
struggle due to generalisation limitations. Real-world evaluation is required
in order to fully appreciate the maturity and robustness of these approaches.
Furthermore, understanding the pure-vision aspects, and their limitations, is a
crucial first step before combining multiple modalities. In
this paper, we propose GoferBot, a novel vision-based semantic HRC system for a
real-world assembly task. It is composed of a visual servoing module that
reaches and grasps assembly parts in an unstructured multi-instance and dynamic
environment, an action recognition module that performs human action prediction
for implicit communication, and a visual handover module that uses the
perceptual understanding of human behaviour to produce an intuitive and
efficient collaborative assembly experience. GoferBot is a novel assembly
system that seamlessly integrates all sub-modules by utilising implicit
semantic information purely from visual perception.
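
As a rough illustration of the three-module architecture described above (visual servoing, human action recognition, visual handover), the sketch below shows one way such a perceive-predict-act loop could be wired together. All class names, method signatures, and the hand-over trigger are assumptions for illustration only; they are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Grasp:
    part_id: str
    pose: tuple  # assumed (x, y, z, roll, pitch, yaw) in the robot base frame

class VisualServoing:
    """Reaches and grasps assembly parts in a multi-instance, dynamic scene (stub)."""
    def locate_and_grasp(self, frame) -> Optional[Grasp]:
        # A real module would run detection plus closed-loop visual servo control.
        return Grasp(part_id="chair_leg", pose=(0.4, 0.1, 0.05, 0.0, 0.0, 0.0))

class ActionRecognition:
    """Predicts the human co-worker's current assembly action (stub)."""
    def predict(self, frame) -> str:
        # A real module would classify a short clip of human motion.
        return "reaching_for_next_part"

class VisualHandover:
    """Hands the grasped part over when the predicted action calls for it (stub)."""
    def handover(self, grasp: Grasp) -> None:
        print(f"handing over {grasp.part_id} at {grasp.pose}")

def collaboration_step(frame, servo, recogniser, handover) -> None:
    """One cycle of the assumed perceive -> predict -> act loop."""
    if recogniser.predict(frame) == "reaching_for_next_part":  # implicit request for a part
        grasp = servo.locate_and_grasp(frame)
        if grasp is not None:
            handover.handover(grasp)
    # Otherwise the robot waits while the human finishes the current step.

collaboration_step(None, VisualServoing(), ActionRecognition(), VisualHandover())
```

The design point the abstract emphasises is that the hand-over decision is driven purely by visual prediction of the human's action, with no explicit commands exchanged between the co-workers.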
Related papers
- Redefining Robot Generalization Through Interactive Intelligence [0.0]
We argue that robot foundation models must evolve to an interactive multi-agent perspective in order to handle the complexities of real-time human-robot co-adaptation.
By moving beyond single-agent designs, our position emphasizes how foundation models in robotics can achieve a more robust, personalized, and anticipatory level of performance.
arXiv Detail & Related papers (2025-02-09T17:13:27Z)
- RefHCM: A Unified Model for Referring Perceptions in Human-Centric Scenarios [60.772871735598706]
RefHCM (Referring Human-Centric Model) is a framework to integrate a wide range of human-centric referring tasks.
RefHCM employs sequence mergers to convert raw multimodal data -- including images, text, coordinates, and parsing maps -- into semantic tokens.
This work represents the first attempt to address referring human perceptions with a general-purpose framework.
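
The "sequence merger" described here amounts to flattening heterogeneous inputs into one token sequence for a single model. The sketch below is only an illustrative flattening with made-up modality markers, not RefHCM's actual tokenisation.

```python
from typing import Any, Dict, List

def merge_to_tokens(sample: Dict[str, Any]) -> List[str]:
    """Illustrative flattening of multimodal inputs into a single token sequence."""
    tokens: List[str] = []
    for modality, value in sample.items():
        tokens.append(f"<{modality}>")           # modality marker (assumed convention)
        if isinstance(value, (list, tuple)):     # e.g. coordinates or parsing-map ids
            tokens.extend(str(v) for v in value)
        else:                                    # e.g. text, or an image placeholder id
            tokens.extend(str(value).split())
    return tokens

print(merge_to_tokens({"text": "the person in red", "coords": [12, 40, 88, 200]}))
```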
arXiv Detail & Related papers (2024-12-19T08:51:57Z)
- One to rule them all: natural language to bind communication, perception and action [0.9302364070735682]
This paper presents an advanced architecture for robotic action planning that integrates communication, perception, and planning with Large Language Models (LLMs).
The Planner Module is the core of the system, where LLMs embedded in a modified ReAct framework interpret and carry out user commands.
The modified ReAct framework further enhances the execution space by providing real-time environmental perception and the outcomes of physical actions.
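
A minimal sketch of such a ReAct-style plan-act-observe loop: a language model proposes an action, the action is executed, and the resulting observation is appended to the next prompt. The `query_llm` and `execute` functions are hypothetical stubs rather than the paper's interfaces.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; a real system would query an actual language model."""
    return "Action: pick(red_cube)" if "Observation" not in prompt else "Action: finish()"

def execute(action: str) -> str:
    """Hypothetical executor; a real system would command the robot and read sensors."""
    return f"succeeded: {action}"

def react_loop(user_command: str, max_steps: int = 5) -> None:
    """ReAct-style cycle: the LLM reasons over the command plus accumulated observations."""
    prompt = f"Command: {user_command}\n"
    for _ in range(max_steps):
        reply = query_llm(prompt)                 # LLM proposes the next action
        action = reply.removeprefix("Action: ")
        if action.startswith("finish"):
            break
        observation = execute(action)             # real-time outcome of the physical action
        prompt += f"{reply}\nObservation: {observation}\n"

react_loop("place the red cube on the tray")
```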
arXiv Detail & Related papers (2024-11-22T16:05:54Z)
- HARMONIC: A Framework for Explanatory Cognitive Robots [0.0]
We present HARMONIC, a framework for implementing cognitive robots.
The framework supports interoperability between a strategic (cognitive) layer for high-level decision-making and a tactical (robot) layer for low-level control and execution.
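
As a sketch of what such a two-layer split might look like in code, under the assumption of a simple plan-then-execute interface that the summary does not actually specify:

```python
from typing import List

class StrategicLayer:
    """Cognitive layer: high-level decision-making (hypothetical stub)."""
    def decide(self, goal: str) -> List[str]:
        # A real cognitive layer would reason over goals, explanations, and context.
        return [f"move_to({goal})", f"grasp({goal})"]

class TacticalLayer:
    """Robot layer: low-level control and execution (hypothetical stub)."""
    def execute(self, step: str) -> bool:
        print(f"executing {step}")   # a real layer would send motor commands
        return True

def run(goal: str) -> None:
    """Assumed interface: the strategic layer plans, the tactical layer executes."""
    strategic, tactical = StrategicLayer(), TacticalLayer()
    for step in strategic.decide(goal):
        if not tactical.execute(step):
            break   # a failure would be reported back for re-planning

run("coffee_mug")
```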
arXiv Detail & Related papers (2024-09-26T16:42:13Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI yields remarkable improvements of 10% to 64% over the previous state of the art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [102.1876259853457]
We propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX.
RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints.
To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning.
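
To make the idea of object-centric manipulation units carrying physical preferences concrete, here is a minimal data-structure sketch; the field names and the hard-coded example tree are assumptions for illustration only, not RoboCodeX's actual output format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ManipulationUnit:
    """One object-centric unit of a decomposed instruction."""
    target_object: str
    action: str
    affordance: str                 # e.g. which part of the object can be grasped
    safety_constraints: List[str]   # e.g. force limits, keep-out regions
    children: List["ManipulationUnit"] = field(default_factory=list)

def decompose(instruction: str) -> ManipulationUnit:
    """Hypothetical decomposition of a high-level instruction into a unit tree."""
    # A real system would use a multimodal model; this fixed tree only shows the structure.
    return ManipulationUnit(
        target_object="mug", action="relocate",
        affordance="handle", safety_constraints=["low grip force"],
        children=[
            ManipulationUnit("mug", "grasp", "handle", ["low grip force"]),
            ManipulationUnit("mug", "place", "base", ["avoid laptop region"]),
        ],
    )

tree = decompose("put the mug on the shelf")
print(tree.action, [c.action for c in tree.children])
```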
arXiv Detail & Related papers (2024-02-25T15:31:43Z)
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots [37.952398683031895]
The central idea is to elevate the overall intelligence of the robot.
We propose QUAdruped Robotic Transformer (QUART), a family of VLA models to integrate visual information and instructions from diverse modalities as input.
Our approach leads to performant robotic policies and enables QUART to obtain a range of emergent capabilities.
arXiv Detail & Related papers (2023-12-22T06:15:03Z)
- Robot Skill Generalization via Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models [21.13906762261418]
A long-standing challenge for a robotic manipulation system is adapting and generalizing its acquired motor skills to unseen environments.
We tackle this challenge by employing hybrid skill models that integrate imitation and reinforcement paradigms.
We show that our method enables a robot to gain a significant zero-shot generalization to novel environments and to refine skills in the target environments faster than learning from scratch.
arXiv Detail & Related papers (2023-10-23T16:03:23Z)
- Unified Human-Scene Interaction via Prompted Chain-of-Contacts [61.87652569413429]
Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality.
This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands.
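
The "Chain-of-Contacts" phrasing suggests that an interaction is expressed as an ordered list of contact steps between body parts and scene objects; the sketch below is only a guess at such a representation, with invented field names and a hard-coded example chain.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ContactStep:
    body_part: str     # e.g. "right_hand"
    scene_object: str  # e.g. "chair_seat"
    contact: bool      # whether contact should be made or released

def parse_command(command: str) -> List[ContactStep]:
    """Hypothetical translation of a language command into a chain of contacts."""
    # A real system would use a language model; this fixed chain only shows the format.
    return [
        ContactStep("pelvis", "chair_seat", True),        # sit down
        ContactStep("right_hand", "table_edge", True),    # rest a hand on the table
    ]

for step in parse_command("sit on the chair and rest a hand on the table"):
    print(step)
```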
arXiv Detail & Related papers (2023-09-14T17:59:49Z)
- Incremental procedural and sensorimotor learning in cognitive humanoid robots [52.77024349608834]
This work presents a cognitive agent that can learn procedures incrementally.
We show the cognitive functions required in each substage and how adding new functions helps address tasks previously unsolved by the agent.
Results show that this approach is capable of solving complex tasks incrementally.
arXiv Detail & Related papers (2023-04-30T22:51:31Z)
- Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks.
Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples.
We present experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.