VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
- URL: http://arxiv.org/abs/2405.11537v3
- Date: Sat, 3 Aug 2024 10:19:54 GMT
- Title: VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
- Authors: Mikhail Konenkov, Artem Lykov, Daria Trinitatova, Dzmitry Tsetserukou
- Abstract summary: This study introduces a pioneering approach utilizing Visual Language Models within VR environments to enhance user interaction and task efficiency.
Our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions.
- Score: 2.5022287664959446
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advent of immersive Virtual Reality applications has transformed various domains, yet their integration with advanced artificial intelligence technologies like Visual Language Models remains underexplored. This study introduces a pioneering approach utilizing VLMs within VR environments to enhance user interaction and task efficiency. Leveraging the Unity engine and a custom-developed VLM, our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions. The incorporation of speech-to-text and text-to-speech technologies allows for seamless communication between the user and the VLM, enabling the system to guide users through complex tasks effectively. Preliminary experimental results indicate that utilizing VLMs not only reduces task completion times but also improves user comfort and task engagement compared to traditional VR interaction methods.
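The listing includes no code, but the interaction loop the abstract describes (speech-to-text, VLM reasoning over the user's current view, text-to-speech) can be sketched as below. Every name in this sketch is a hypothetical placeholder; the authors' custom VLM and its Unity bindings are not described here.

```python
# Hedged sketch of the described loop; all names are placeholders, not the
# authors' code. A real system would capture frames from Unity and call
# concrete STT/VLM/TTS backends.
from dataclasses import dataclass

@dataclass
class VRFrame:
    """One rendered view of the VR scene, e.g. captured from the Unity camera."""
    rgb: bytes  # encoded image of what the user currently sees

def speech_to_text(audio: bytes) -> str:
    """Placeholder STT; a real system might call Whisper or a cloud API."""
    raise NotImplementedError

def vlm_respond(frame: VRFrame, utterance: str, history: list[str]) -> str:
    """Placeholder VLM call that grounds the request in the current view."""
    raise NotImplementedError

def text_to_speech(text: str) -> bytes:
    """Placeholder TTS; returns audio to play back in the headset."""
    raise NotImplementedError

def interaction_step(audio: bytes, frame: VRFrame, history: list[str]) -> bytes:
    """One guidance turn: hear the user, look at the scene, answer aloud."""
    utterance = speech_to_text(audio)
    reply = vlm_respond(frame, utterance, history)
    history += [f"user: {utterance}", f"assistant: {reply}"]
    return text_to_speech(reply)
```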
Related papers
- Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality [20.669785157017486]
We explore the possibility of leveraging large language models to assist multi-object selection tasks in virtual reality via a multimodal speech and raycast interaction technique.
Results indicate that the introduced technique, AssistVR, outperforms the baseline technique when there are multiple target objects.
arXiv Detail & Related papers (2024-10-28T14:56:51Z)
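A minimal sketch of how an LLM might assist multi-object selection as in AssistVR above: the raycast proposes nearby candidates, and the model resolves the spoken reference to object ids. The prompt format, candidate schema, and `llm` callable are assumptions, not the paper's implementation.

```python
import json
from typing import Callable

def select_objects(utterance: str, candidates: list[dict],
                   llm: Callable[[str], str]) -> list[str]:
    """Resolve a spoken reference against raycast candidates.

    candidates: e.g. [{"id": "chair_2", "label": "red chair", "dist": 0.4}]
    llm: any prompt-in, completion-out callable (assumed to return JSON).
    """
    prompt = (
        "Objects near the user's ray:\n"
        + json.dumps(candidates, indent=2)
        + f'\nThe user said: "{utterance}"\n'
        "Return a JSON list with the ids of every object the user refers to."
    )
    return json.loads(llm(prompt))
```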
- Tremor Reduction for Accessible Ray Based Interaction in VR Applications [0.0]
Many traditional 2D interface interaction methods have been directly converted to work in a VR space with little alteration to the input mechanism.
In this paper we propose the use of a low-pass filter to normalize user input noise, alleviating fine motor requirements during ray-based interaction.
arXiv Detail & Related papers (2024-05-12T17:07:16Z)
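A one-pole exponential moving average is one common way to realize the low-pass filter proposed above; the paper may use a different filter design, and the `alpha` value here is illustrative only.

```python
# Minimal one-pole (exponential moving average) low-pass filter for smoothing
# noisy ray-based input; alpha is an illustrative assumption, not the paper's.
class LowPassFilter:
    def __init__(self, alpha: float = 0.15):
        # Smaller alpha -> stronger smoothing (more tremor removed, more lag).
        self.alpha = alpha
        self.state: tuple[float, float, float] | None = None

    def update(self, sample: tuple[float, float, float]) -> tuple[float, float, float]:
        """Smooth one 3D ray-direction (or controller-pose) sample."""
        if self.state is None:
            self.state = sample
        else:
            self.state = tuple(
                self.alpha * s + (1.0 - self.alpha) * p
                for s, p in zip(sample, self.state)
            )
        return self.state
```

Tuning `alpha` trades tremor suppression against pointer lag, the usual compromise for such filters.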
- VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality [39.53150683721031]
Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction.
The components of our Virtual Reality system are designed for high efficiency and effectiveness.
arXiv Detail & Related papers (2024-01-30T01:28:36Z)
- Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z)
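A minimal sketch of the underlying idea of gaze alignment: reweighting visual patch features by a gaze heatmap before they enter the VLM. Voila-A's actual alignment mechanism is more involved; the shapes and the `strength` parameter are assumptions.

```python
import numpy as np

def gaze_weighted_patches(patch_feats: np.ndarray, gaze_heatmap: np.ndarray,
                          strength: float = 1.0) -> np.ndarray:
    """Boost patch features in proportion to where the user is looking.

    patch_feats: (num_patches, dim); gaze_heatmap: (num_patches,), non-negative.
    """
    # Normalize so the most-fixated patch gets weight 1 + strength.
    weights = 1.0 + strength * gaze_heatmap / (gaze_heatmap.max() + 1e-8)
    return patch_feats * weights[:, None]
```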
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing [99.80742991922992]
The system can have multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses.
LLaVA-Interactive goes beyond language prompt, where visual prompt is enabled to align human intents in the interaction.
arXiv Detail & Related papers (2023-11-01T15:13:43Z)
- Voice2Action: Language Models as Agent for Efficient Real-Time Interaction in Virtual Reality [1.160324357508053]
Large Language Models (LLMs) are trained to follow natural language instructions with only a handful of examples.
We propose Voice2Action, a framework that hierarchically analyzes customized voice signals and textual commands through action and entity extraction.
Experiment results in an urban engineering VR environment with synthetic instruction data show that Voice2Action can perform more efficiently and accurately than approaches without optimizations.
arXiv Detail & Related papers (2023-09-29T19:06:52Z)
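A toy sketch of hierarchical action and entity extraction in the spirit of Voice2Action: classify the action verb first, then pull out the entity it applies to. The action vocabulary and regular expressions are invented for illustration.

```python
import re

# Invented action vocabulary for illustration; the paper's taxonomy differs.
ACTIONS = {
    "move": r"\b(move|drag|put)\b",
    "delete": r"\b(delete|remove)\b",
    "create": r"\b(create|spawn|add)\b",
}

def parse_command(text: str) -> dict:
    """e.g. 'delete the red building' -> {'action': 'delete', 'entity': 'red building'}"""
    text = text.lower()
    for action, pattern in ACTIONS.items():
        match = re.search(pattern, text)
        if match:
            # Entity extraction: the words after the verb, minus filler words.
            rest = re.sub(r"\b(the|a|an|to|please)\b", " ", text[match.end():])
            return {"action": action, "entity": " ".join(rest.split())}
    return {"action": None, "entity": None}
```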
- Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration [1.392250707100996]
This paper presents a systematic framework for the real to virtual adaptation using limited size of virtual dataset.
Hand gesture recognition, which has been a topic of much research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets.
arXiv Detail & Related papers (2023-07-21T03:24:55Z)
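One plausible reading of real-to-virtual adaptation with a limited virtual dataset is transfer learning: freeze a backbone trained on real-world data and fine-tune a small head on virtual samples. This is a generic sketch, not the paper's framework; `model`, `head`, and `virtual_loader` are placeholders.

```python
import torch
import torch.nn as nn

def adapt_to_virtual(model: nn.Module, head: nn.Module,
                     virtual_loader, epochs: int = 5) -> nn.Module:
    """Fine-tune only the head on a small labelled virtual dataset."""
    for p in model.parameters():       # freeze the real-world backbone
        p.requires_grad = False
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in virtual_loader:    # (input, label) batches of virtual data
            opt.zero_grad()
            loss = loss_fn(head(model(x)), y)
            loss.backward()
            opt.step()
    return nn.Sequential(model, head)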
- Towards Ubiquitous Semantic Metaverse: Challenges, Approaches, and Opportunities [68.03971716740823]
In recent years, ubiquitous semantic Metaverse has been studied to revolutionize immersive cyber-virtual experiences for augmented reality (AR) and virtual reality (VR) users.
This survey focuses on the representation and intelligence for the four fundamental system components in ubiquitous Metaverse.
arXiv Detail & Related papers (2023-07-13T11:14:46Z)
- Force-Aware Interface via Electromyography for Natural VR/AR Interaction [69.1332992637271]
We design a learning-based neural interface for natural and intuitive force inputs in VR/AR.
We show that our interface can decode finger-wise forces in real-time with 3.3% mean error, and generalize to new users with little calibration.
We envision our findings to push forward research towards more realistic physicality in future VR/AR.
arXiv Detail & Related papers (2022-10-03T20:51:25Z)
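A minimal sketch of the decoding task described above: mapping a window of multi-channel EMG to per-finger force estimates. Channel count, window length, and the architecture are illustrative assumptions; the paper's model is not reproduced here.

```python
import torch
import torch.nn as nn

class EMGForceDecoder(nn.Module):
    """Toy regressor from an EMG window to per-finger forces (dims assumed)."""

    def __init__(self, channels: int = 8, window: int = 64, fingers: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                      # (batch, channels, window) -> flat
            nn.Linear(channels * window, 128),
            nn.ReLU(),
            nn.Linear(128, fingers),           # one force estimate per finger
            nn.Softplus(),                     # forces are non-negative
        )

    def forward(self, emg: torch.Tensor) -> torch.Tensor:
        return self.net(emg)
```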
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object-specific grab gestures, allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom tailored hand gestures.
The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
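A minimal template-matching sketch in the spirit of the authoring tool above: a captured hand pose is compared to stored gesture templates by mean joint distance. The wrist-relative representation and threshold value are assumptions.

```python
import numpy as np

def match_gesture(pose: np.ndarray, templates: dict[str, np.ndarray],
                  threshold: float = 0.03) -> str | None:
    """Return the closest template name, or None if nothing is close enough.

    pose, templates[name]: (num_joints, 3) arrays in a wrist-relative frame.
    """
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        # Mean Euclidean distance across corresponding joints.
        dist = float(np.linalg.norm(pose - template, axis=1).mean())
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None
```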
- VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behavior of interaction-based models.
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
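A rough sketch of the virtual-interaction idea: train a dual encoder so that attention maps computed from its independent encodings mimic the cross-attention of an interaction-based (cross-encoder) teacher. The loss below is a simplified stand-in for the paper's objective.

```python
import torch
import torch.nn.functional as F

def virtual_interaction_loss(q_states: torch.Tensor, d_states: torch.Tensor,
                             teacher_attn: torch.Tensor) -> torch.Tensor:
    """Match student "virtual" attention to a cross-encoder teacher's attention.

    q_states: (Lq, dim) query token states from a separate encoder.
    d_states: (Ld, dim) document token states from a separate encoder.
    teacher_attn: (Lq, Ld) cross-attention from an interaction-based teacher.
    """
    scale = q_states.size(-1) ** 0.5
    student_attn = F.softmax(q_states @ d_states.T / scale, dim=-1)
    return F.mse_loss(student_attn, teacher_attn)
```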