VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
- URL: http://arxiv.org/abs/2405.11537v3
- Date: Sat, 3 Aug 2024 10:19:54 GMT
- Title: VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
- Authors: Mikhail Konenkov, Artem Lykov, Daria Trinitatova, Dzmitry Tsetserukou
- Abstract summary: This study introduces a pioneering approach utilizing Visual Language Models within VR environments to enhance user interaction and task efficiency.
Our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions.
- Score: 2.5022287664959446
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advent of immersive Virtual Reality applications has transformed various domains, yet their integration with advanced artificial intelligence technologies like Visual Language Models remains underexplored. This study introduces a pioneering approach utilizing VLMs within VR environments to enhance user interaction and task efficiency. Leveraging the Unity engine and a custom-developed VLM, our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions. The incorporation of speech-to-text and text-to-speech technologies allows for seamless communication between the user and the VLM, enabling the system to guide users through complex tasks effectively. Preliminary experimental results indicate that utilizing VLMs not only reduces task completion times but also improves user comfort and task engagement compared to traditional VR interaction methods.
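The abstract describes a closed interaction loop: the user's speech is transcribed, the transcript plus the current view is sent to the VLM, and the reply is spoken back. Below is a minimal Python sketch of such a loop under stated assumptions; every interface name (stt, vlm, tts, headset, and their methods) is a placeholder, not the authors' actual Unity/VLM API.

```python
# Hypothetical sketch of the interaction loop described in the abstract:
# speech-to-text -> VLM (current view + transcript) -> text-to-speech.
# All object and method names are placeholders, not the paper's real API.

def assistance_loop(stt, vlm, tts, headset):
    """Guide the user through a task without on-screen text instructions."""
    while True:
        audio = headset.record_utterance()        # user speaks in VR
        user_text = stt.transcribe(audio)         # speech-to-text
        frame = headset.capture_frame()           # current first-person view
        reply = vlm.generate(image=frame, prompt=user_text)  # custom VLM
        headset.play_audio(tts.synthesize(reply))  # spoken guidance back
```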
Related papers
- Enhancing Smart Environments with Context-Aware Chatbots using Large Language Models [1.6672326114795073]
This work presents a novel architecture for context-aware interactions within smart environments, leveraging Large Language Models (LLMs) to enhance user experiences.
Our system integrates user location data obtained through UWB tags and sensor-equipped smart homes with real-time human activity recognition (HAR) to provide a comprehensive understanding of user context.
The results highlight the significant benefits of integrating LLMs with real-time activity and location data to deliver personalised and contextually relevant user experiences.
arXiv Detail & Related papers (2025-02-20T11:46:51Z)
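As a concrete illustration of the context fusion described in the entry above, here is a minimal, hypothetical sketch of folding UWB-derived location and HAR output into an LLM prompt; the field names and prompt wording are invented, not the paper's actual schema.

```python
# Hypothetical sketch: fuse UWB location and human-activity-recognition
# (HAR) output into the LLM prompt. Fields and wording are illustrative.
from dataclasses import dataclass

@dataclass
class UserContext:
    room: str        # resolved from the UWB tag position
    activity: str    # from the real-time HAR model

def build_prompt(ctx: UserContext, user_message: str) -> str:
    return (
        f"The user is in the {ctx.room} and is currently {ctx.activity}. "
        f"Answer with this context in mind.\nUser: {user_message}"
    )

print(build_prompt(UserContext("kitchen", "cooking"), "Set a timer for the oven."))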
- Can You Move These Over There? An LLM-based VR Mover for Supporting Object Manipulation [12.569646616546235]
We propose VR Mover, an LLM-empowered solution that understands and interprets the user's vocal instructions to support object manipulation.
Our user study demonstrates that VR Mover improves usability, overall experience, and performance in multi-object manipulation.
arXiv Detail & Related papers (2025-02-04T10:27:40Z)
- Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality [20.669785157017486]
We explore the possibility of leveraging large language models to assist multi-object selection tasks in virtual reality via a multimodal speech and raycast interaction technique.
Results indicate that the introduced technique, AssistVR, outperforms the baseline technique when there are multiple target objects.
arXiv Detail & Related papers (2024-10-28T14:56:51Z)
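A toy sketch of the speech-plus-raycast idea behind AssistVR: the ray narrows the candidate set, and the utterance disambiguates among the remainder. AssistVR delegates the language step to an LLM; a simple keyword match stands in for it here, and all data structures are invented.

```python
# Hypothetical sketch of multimodal selection: raycast filters candidates,
# the spoken phrase picks among them. A keyword match replaces the LLM step.
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    distance_to_ray: float  # perpendicular distance from the selection ray

def select_targets(objects, spoken_phrase, ray_radius=0.5):
    candidates = [o for o in objects if o.distance_to_ray <= ray_radius]
    words = spoken_phrase.lower().split()
    return [o for o in candidates if any(w in o.name.lower() for w in words)]

scene = [SceneObject("red cube", 0.2), SceneObject("blue cube", 0.3),
         SceneObject("red sphere", 1.4)]
print([o.name for o in select_targets(scene, "grab the red ones")])  # ['red cube']
```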
- Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z)
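One simple way to realize "gaze as a proxy for attention" is to weight visual features by a gaze-centred heatmap before they reach the VLM. The sketch below illustrates only this general idea; Voila-A's actual alignment mechanism differs, and all shapes and names are assumptions.

```python
# Hypothetical sketch: weight image-patch features by a Gaussian heatmap
# centred on the gaze point. Shapes and sigma are illustrative only.
import numpy as np

def gaze_heatmap(h, w, gaze_xy, sigma=0.1):
    ys, xs = np.mgrid[0:h, 0:w] / max(h, w)   # normalised grid coordinates
    gx, gy = gaze_xy
    d2 = (xs - gx) ** 2 + (ys - gy) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def weight_patches(patch_features, gaze_xy):
    h, w, _ = patch_features.shape            # (h, w, d) grid of features
    weights = gaze_heatmap(h, w, gaze_xy)[..., None]
    return patch_features * weights

feats = np.random.rand(14, 14, 768)
print(weight_patches(feats, gaze_xy=(0.5, 0.5)).shape)  # (14, 14, 768)
```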
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing [99.80742991922992]
The system can have multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses.
LLaVA-Interactive goes beyond language prompts: visual prompts are supported to align human intent during the interaction.
arXiv Detail & Related papers (2023-11-01T15:13:43Z)
- Voice2Action: Language Models as Agent for Efficient Real-Time Interaction in Virtual Reality [1.160324357508053]
Large Language Models (LLMs) are trained to follow natural language instructions with only a handful of examples.
We propose Voice2Action, a framework that hierarchically analyzes customized voice signals and textual commands through action and entity extraction.
Experiment results in an urban engineering VR environment with synthetic instruction data show that Voice2Action can perform more efficiently and accurately than approaches without optimizations.
arXiv Detail & Related papers (2023-09-29T19:06:52Z)
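A minimal sketch of the hierarchical action/entity extraction Voice2Action performs: first identify the action, then the entities it applies to. Voice2Action uses LLM stages for both steps; a rule-based stand-in is shown here so the example stays self-contained, and the vocabulary and entities are invented.

```python
# Hypothetical sketch of two-stage extraction: action first, then entities.
# A rule-based stand-in replaces Voice2Action's LLM stages.
ACTIONS = {"move": ["move", "drag"], "delete": ["delete", "remove"],
           "create": ["create", "spawn"]}

def extract_action(command: str):
    tokens = command.lower().split()
    for action, verbs in ACTIONS.items():
        if any(v in tokens for v in verbs):
            return action
    return None

def extract_entities(command: str, known_entities):
    return [e for e in known_entities if e in command.lower()]

cmd = "move the streetlight next to the crosswalk"
print(extract_action(cmd), extract_entities(cmd, ["streetlight", "crosswalk"]))
# -> move ['streetlight', 'crosswalk']
```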
- Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration [1.392250707100996]
This paper presents a systematic framework for real-to-virtual adaptation using a limited-size virtual dataset.
Hand gesture recognition, a topic of extensive research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets.
arXiv Detail & Related papers (2023-07-21T03:24:55Z)
- Towards Ubiquitous Semantic Metaverse: Challenges, Approaches, and Opportunities [68.03971716740823]
In recent years, the ubiquitous semantic Metaverse has been studied as a way to revolutionize immersive cyber-virtual experiences for augmented reality (AR) and virtual reality (VR) users.
This survey focuses on the representation and intelligence for the four fundamental system components of the ubiquitous semantic Metaverse.
arXiv Detail & Related papers (2023-07-13T11:14:46Z)
- Force-Aware Interface via Electromyography for Natural VR/AR Interaction [69.1332992637271]
We design a learning-based neural interface for natural and intuitive force inputs in VR/AR.
We show that our interface can decode finger-wise forces in real-time with 3.3% mean error, and generalize to new users with little calibration.
We envision our findings to push forward research towards more realistic physicality in future VR/AR.
arXiv Detail & Related papers (2022-10-03T20:51:25Z)
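To make the decoding problem above concrete, the sketch below frames the EMG interface as a regression from windows of multi-channel EMG samples to per-finger forces. The paper trains a neural decoder; a least-squares linear decoder on random data is used here purely to show the input/output shapes, all of which are assumptions.

```python
# Hypothetical sketch: map a window of multi-channel EMG samples to
# per-finger force levels. A linear decoder stands in for the paper's
# learned neural decoder; data and dimensions are illustrative.
import numpy as np

n_windows, n_channels, window_len, n_fingers = 256, 8, 50, 5
rng = np.random.default_rng(0)

X = rng.standard_normal((n_windows, n_channels * window_len))  # flattened EMG
Y = rng.standard_normal((n_windows, n_fingers))                # target forces

W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # fit the linear decoder
pred = X @ W                                # per-finger force estimates
print(pred.shape)  # (256, 5)
```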
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object-specific grab gestures, allowing virtual objects to be grabbed as they are in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom-tailored hand gestures.
The study showed that users perceived gestures created with the proposed approach as a more natural input modality than the alternatives.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
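A minimal sketch of the template matching named in the entry above: each authored grab gesture is stored as a vector of joint angles, and a live hand pose matches the nearest template within a threshold. The representation and threshold are illustrative assumptions, not the tool's actual ones.

```python
# Hypothetical sketch: template matching over joint-angle vectors.
# Poses, templates, and the threshold are illustrative only.
import math

def distance(pose_a, pose_b):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pose_a, pose_b)))

def match_gesture(live_pose, templates, threshold=0.3):
    # templates: {object_name: authored joint-angle vector}
    best = min(templates, key=lambda name: distance(live_pose, templates[name]))
    return best if distance(live_pose, templates[best]) <= threshold else None

templates = {"mug": [0.9, 0.8, 0.7, 0.7, 0.5], "key": [0.2, 0.9, 0.1, 0.1, 0.1]}
print(match_gesture([0.85, 0.8, 0.75, 0.65, 0.5], templates))  # -> mug
```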
- VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behavior of interaction-based models.
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
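A rough sketch of the "virtual interaction" idea: the dual encoder's simulated attention over independently encoded sequences is regularised toward a cross-encoder teacher's attention map, so the representation-based model mimics interaction-based behaviour without paying its inference cost. The loss and shapes below are illustrative assumptions, not VIRT's exact formulation.

```python
# Hypothetical sketch: align a dual encoder's simulated ("virtual")
# attention with a cross-encoder teacher's attention via an MSE loss.
import torch
import torch.nn.functional as F

def virtual_interaction_loss(query_repr, doc_repr, teacher_attn):
    # query_repr: (Lq, d), doc_repr: (Ld, d) from independent encoders
    sim = query_repr @ doc_repr.T / query_repr.shape[-1] ** 0.5
    student_attn = sim.softmax(dim=-1)   # simulated cross-attention
    return F.mse_loss(student_attn, teacher_attn)

Lq, Ld, d = 6, 8, 32
loss = virtual_interaction_loss(torch.randn(Lq, d), torch.randn(Ld, d),
                                torch.rand(Lq, Ld).softmax(dim=-1))
print(loss.item())
```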
This list is automatically generated from the titles and abstracts of the papers on this site.