AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments
- URL: http://arxiv.org/abs/2407.09147v1
- Date: Fri, 12 Jul 2024 10:30:45 GMT
- Title: AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments
- Authors: Tomislav Duricic, Peter Müllner, Nicole Weidinger, Neven ElSayed, Dominik Kowald, Eduardo Veas
- Abstract summary: We demonstrate an AI-powered immersive assistance system that supports users in performing complex tasks in industrial environments.
Our system leverages a VR environment that resembles a juice mixer setup.
This demonstration showcases the potential of our AI-powered assistant to reduce cognitive load, increase productivity, and enhance safety in industrial environments.
- Score: 0.11545092788508222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many industrial sectors rely on well-trained employees who are able to operate complex machinery. In this work, we demonstrate an AI-powered immersive assistance system that supports users in performing complex tasks in industrial environments. Specifically, our system leverages a VR environment that resembles a juice mixer setup. This digital twin of a physical setup simulates complex industrial machinery used to mix preparations or liquids (e.g., similar to the pharmaceutical industry) and includes various containers, sensors, pumps, and flow controllers. This setup demonstrates our system's capabilities in a controlled environment while acting as a proof-of-concept for broader industrial applications. The core components of our multimodal AI assistant are a large language model and a speech-to-text model that process a video and audio recording of an expert performing the task in a VR environment. The video and transcribed speech extracted from the expert's recording enable the assistant to provide step-by-step guidance that supports users in executing complex tasks. This demonstration showcases the potential of our AI-powered assistant to reduce cognitive load, increase productivity, and enhance safety in industrial environments.
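The pipeline described in the abstract (speech-to-text transcription of an expert's recording, followed by an LLM that turns the narration into step-by-step guidance) can be sketched as follows. This is a minimal, purely illustrative sketch: the function names, data format, and sample utterances are assumptions, and the model calls are stand-in stubs rather than the authors' implementation.

```python
# Hypothetical sketch of the described assistance pipeline (not the
# authors' code): a speech-to-text stage transcribes an expert's narrated
# VR recording, and an LLM stage rewrites the narration as numbered steps.

def transcribe_audio(audio_segments):
    """Stand-in for a speech-to-text model: one utterance per segment."""
    return [seg["speech"] for seg in audio_segments]

def generate_guidance(utterances):
    """Stand-in for an LLM prompt that turns narration into numbered steps."""
    return [f"Step {i}: {u}" for i, u in enumerate(utterances, start=1)]

# Assumed recording format: timestamped speech segments from the expert.
recording = [
    {"t": 0.0, "speech": "Open the inlet valve on container A."},
    {"t": 4.2, "speech": "Start pump P1 and watch the flow controller."},
    {"t": 9.8, "speech": "Close the valve once the mixer reaches the target level."},
]

steps = generate_guidance(transcribe_audio(recording))
for s in steps:
    print(s)
```

In a real deployment the two stubs would be replaced by calls to an actual speech-to-text model and an LLM, but the data flow (recording in, ordered guidance out) stays the same.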
Related papers
- Bridging Industrial Expertise and XR with LLM-Powered Conversational Agents [2.526333884960815]
This paper introduces a novel integration of Retrieval-Augmented Generation (RAG) enhanced Large Language Models (LLMs) with Extended Reality (XR).
The proposed system embeds domain-specific industrial knowledge into XR environments through a natural language interface, enabling hands-free, context-aware expert guidance for workers.
arXiv Detail & Related papers (2025-04-07T22:02:19Z)
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We build a self-contained environment with data that mimics a small software company environment.
We find that with the most competitive agent, 24% of the tasks can be completed autonomously.
This paints a nuanced picture of task automation with LM agents.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
- RAMPA: Robotic Augmented Reality for Machine Programming and Automation [4.963604518596734]
This paper introduces Robotic Augmented Reality for Machine Programming (RAMPA)
RAMPA is a system that utilizes the capabilities of state-of-the-art and commercially available AR headsets, e.g., Meta Quest 3.
Our approach enables in-situ data recording, visualization, and fine-tuning of skill demonstrations directly within the user's physical environment.
arXiv Detail & Related papers (2024-10-17T10:21:28Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback [58.04529328728999]
Embodied vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning, yet turning that reasoning into concrete actions in an environment remains a gap.
To bridge this gap, we introduce Octopus, an embodied vision-language programmer that uses executable code generation as a medium to connect planning and manipulation.
Octopus is designed to 1) proficiently comprehend an agent's visual and textual task objectives, 2) formulate intricate action sequences, and 3) generate executable code.
arXiv Detail & Related papers (2023-10-12T17:59:58Z)
- Towards Building AI-CPS with NVIDIA Isaac Sim: An Industrial Benchmark and Case Study for Robotics Manipulation [18.392301524812645]
As representative cyber-physical systems (CPS), robotic manipulators have been widely adopted in academic research and industrial processes.
Recent studies in robotics manipulation have started employing artificial intelligence (AI) approaches as controllers to achieve better adaptability and performance.
We propose a public industrial benchmark for robotics manipulation in this paper.
arXiv Detail & Related papers (2023-07-31T18:21:45Z)
- Virtual Reality via Object Poses and Active Learning: Realizing Telepresence Robots with Aerial Manipulation Capabilities [39.29763956979895]
This article presents a novel telepresence system for advancing aerial manipulation in dynamic and unstructured environments.
The proposed system not only features a haptic device, but also a virtual reality (VR) interface that provides real-time 3D displays of the robot's workspace.
We show over 70 robust executions of pick-and-place, force application, and peg-in-hole tasks with the DLR Cable-Suspended Aerial Manipulator (SAM).
arXiv Detail & Related papers (2022-10-18T08:42:30Z)
- COCOI: Contact-aware Online Context Inference for Generalizable Non-planar Pushing [87.7257446869134]
General contact-rich manipulation problems are long-standing challenges in robotics.
Deep reinforcement learning has shown great potential in solving robot manipulation tasks.
We propose COCOI, a deep RL method that encodes a context embedding of dynamics properties online.
arXiv Detail & Related papers (2020-11-23T08:20:21Z)
- Validate and Enable Machine Learning in Industrial AI [47.20869253934116]
Industrial AI promises more efficient future industrial control systems.
The Petuum Optimum system is used as an example to showcase the challenges in making and testing AI models.
arXiv Detail & Related papers (2020-10-30T20:33:05Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.