VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
- URL: http://arxiv.org/abs/2405.11537v3
- Date: Sat, 3 Aug 2024 10:19:54 GMT
- Title: VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications
- Authors: Mikhail Konenkov, Artem Lykov, Daria Trinitatova, Dzmitry Tsetserukou
- Abstract summary: This study introduces a pioneering approach utilizing Visual Language Models within VR environments to enhance user interaction and task efficiency.
Our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions.
- Score: 2.5022287664959446
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advent of immersive Virtual Reality applications has transformed various domains, yet their integration with advanced artificial intelligence technologies like Visual Language Models remains underexplored. This study introduces a pioneering approach utilizing VLMs within VR environments to enhance user interaction and task efficiency. Leveraging the Unity engine and a custom-developed VLM, our system facilitates real-time, intuitive user interactions through natural language processing, without relying on visual text instructions. The incorporation of speech-to-text and text-to-speech technologies allows for seamless communication between the user and the VLM, enabling the system to guide users through complex tasks effectively. Preliminary experimental results indicate that utilizing VLMs not only reduces task completion times but also improves user comfort and task engagement compared to traditional VR interaction methods.
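The listing includes no code, but the interaction loop the abstract describes (speech-to-text, VLM reasoning over the user's current view, text-to-speech) can be sketched as below. Every name in this sketch is a hypothetical placeholder; the authors' custom VLM and its Unity bindings are not described here.

```python
# Hedged sketch of the described loop; all names are placeholders, not the
# authors' code. A real system would capture frames from Unity and call
# concrete STT/VLM/TTS backends.
from dataclasses import dataclass

@dataclass
class VRFrame:
    """One rendered view of the VR scene, e.g. captured from the Unity camera."""
    rgb: bytes  # encoded image of what the user currently sees

def speech_to_text(audio: bytes) -> str:
    """Placeholder STT; a real system might call Whisper or a cloud API."""
    raise NotImplementedError

def vlm_respond(frame: VRFrame, utterance: str, history: list[str]) -> str:
    """Placeholder VLM call that grounds the request in the current view."""
    raise NotImplementedError

def text_to_speech(text: str) -> bytes:
    """Placeholder TTS; returns audio to play back in the headset."""
    raise NotImplementedError

def interaction_step(audio: bytes, frame: VRFrame, history: list[str]) -> bytes:
    """One guidance turn: hear the user, look at the scene, answer aloud."""
    utterance = speech_to_text(audio)
    reply = vlm_respond(frame, utterance, history)
    history += [f"user: {utterance}", f"assistant: {reply}"]
    return text_to_speech(reply)
```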
Related papers
- Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality [20.669785157017486]
We explore the possibility of leveraging large language models to assist multi-object selection tasks in virtual reality via a multimodal speech and raycast interaction technique.
Results indicate that the introduced technique, AssistVR, outperforms the baseline technique when there are multiple target objects.
arXiv Detail & Related papers (2024-10-28T14:56:51Z)
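A minimal sketch of how an LLM might assist multi-object selection as in AssistVR above: the raycast proposes nearby candidates, and the model resolves the spoken reference to object ids. The prompt format, candidate schema, and `llm` callable are assumptions, not the paper's implementation.

```python
import json
from typing import Callable

def select_objects(utterance: str, candidates: list[dict],
                   llm: Callable[[str], str]) -> list[str]:
    """Resolve a spoken reference against raycast candidates.

    candidates: e.g. [{"id": "chair_2", "label": "red chair", "dist": 0.4}]
    llm: any prompt-in, completion-out callable (assumed to return JSON).
    """
    prompt = (
        "Objects near the user's ray:\n"
        + json.dumps(candidates, indent=2)
        + f'\nThe user said: "{utterance}"\n'
        "Return a JSON list with the ids of every object the user refers to."
    )
    return json.loads(llm(prompt))
```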
- Tremor Reduction for Accessible Ray Based Interaction in VR Applications [0.0]
Many traditional 2D interface interaction methods have been directly converted to work in a VR space with little alteration to the input mechanism.
In this paper we propose the use of a low-pass filter to normalize user input noise, alleviating fine motor requirements during ray-based interaction.
arXiv Detail & Related papers (2024-05-12T17:07:16Z)
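A one-pole exponential moving average is one common way to realize the low-pass filter proposed above; the paper may use a different filter design, and the `alpha` value here is illustrative only.

```python
# Minimal one-pole (exponential moving average) low-pass filter for smoothing
# noisy ray-based input; alpha is an illustrative assumption, not the paper's.
class LowPassFilter:
    def __init__(self, alpha: float = 0.15):
        # Smaller alpha -> stronger smoothing (more tremor removed, more lag).
        self.alpha = alpha
        self.state: tuple[float, float, float] | None = None

    def update(self, sample: tuple[float, float, float]) -> tuple[float, float, float]:
        """Smooth one 3D ray-direction (or controller-pose) sample."""
        if self.state is None:
            self.state = sample
        else:
            self.state = tuple(
                self.alpha * s + (1.0 - self.alpha) * p
                for s, p in zip(sample, self.state)
            )
        return self.state
```

Tuning `alpha` trades tremor suppression against pointer lag, the usual compromise for such filters.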
- VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality [39.53150683721031]
Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction.
The components of our Virtual Reality system are designed for high efficiency and effectiveness.
arXiv Detail & Related papers (2024-01-30T01:28:36Z)
- Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z)
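A minimal sketch of the underlying idea of gaze alignment: reweighting visual patch features by a gaze heatmap before they enter the VLM. Voila-A's actual alignment mechanism is more involved; the shapes and the `strength` parameter are assumptions.

```python
import numpy as np

def gaze_weighted_patches(patch_feats: np.ndarray, gaze_heatmap: np.ndarray,
                          strength: float = 1.0) -> np.ndarray:
    """Boost patch features in proportion to where the user is looking.

    patch_feats: (num_patches, dim); gaze_heatmap: (num_patches,), non-negative.
    """
    # Normalize so the most-fixated patch gets weight 1 + strength.
    weights = 1.0 + strength * gaze_heatmap / (gaze_heatmap.max() + 1e-8)
    return patch_feats * weights[:, None]
```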
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing [99.80742991922992]
The system can have multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses.
LLaVA-Interactive goes beyond language prompt, where visual prompt is enabled to align human intents in the interaction.
arXiv Detail & Related papers (2023-11-01T15:13:43Z)
- Voice2Action: Language Models as Agent for Efficient Real-Time Interaction in Virtual Reality [1.160324357508053]
Large Language Models (LLMs) are trained to follow natural language instructions with only a handful of examples.
We propose Voice2Action, a framework that hierarchically analyzes customized voice signals and textual commands through action and entity extraction.
Experiment results in an urban engineering VR environment with synthetic instruction data show that Voice2Action can perform more efficiently and accurately than approaches without optimizations.
arXiv Detail & Related papers (2023-09-29T19:06:52Z)
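A toy sketch of hierarchical action and entity extraction in the spirit of Voice2Action: classify the action verb first, then pull out the entity it applies to. The action vocabulary and regular expressions are invented for illustration.

```python
import re

# Invented action vocabulary for illustration; the paper's taxonomy differs.
ACTIONS = {
    "move": r"\b(move|drag|put)\b",
    "delete": r"\b(delete|remove)\b",
    "create": r"\b(create|spawn|add)\b",
}

def parse_command(text: str) -> dict:
    """e.g. 'delete the red building' -> {'action': 'delete', 'entity': 'red building'}"""
    text = text.lower()
    for action, pattern in ACTIONS.items():
        match = re.search(pattern, text)
        if match:
            # Entity extraction: the words after the verb, minus filler words.
            rest = re.sub(r"\b(the|a|an|to|please)\b", " ", text[match.end():])
            return {"action": action, "entity": " ".join(rest.split())}
    return {"action": None, "entity": None}
```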
- Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration [1.392250707100996]
This paper presents a systematic framework for the real to virtual adaptation using limited size of virtual dataset.
Hand gesture recognition, which has been a topic of much research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets.
arXiv Detail & Related papers (2023-07-21T03:24:55Z)
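One plausible reading of real-to-virtual adaptation with a limited virtual dataset is transfer learning: freeze a backbone trained on real-world data and fine-tune a small head on virtual samples. This is a generic sketch, not the paper's framework; `model`, `head`, and `virtual_loader` are placeholders.

```python
import torch
import torch.nn as nn

def adapt_to_virtual(model: nn.Module, head: nn.Module,
                     virtual_loader, epochs: int = 5) -> nn.Module:
    """Fine-tune only the head on a small labelled virtual dataset."""
    for p in model.parameters():       # freeze the real-world backbone
        p.requires_grad = False
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in virtual_loader:    # (input, label) batches of virtual data
            opt.zero_grad()
            loss = loss_fn(head(model(x)), y)
            loss.backward()
            opt.step()
    return nn.Sequential(model, head)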
- Towards Ubiquitous Semantic Metaverse: Challenges, Approaches, and Opportunities [68.03971716740823]
In recent years, ubiquitous semantic Metaverse has been studied to revolutionize immersive cyber-virtual experiences for augmented reality (AR) and virtual reality (VR) users.
This survey focuses on the representation and intelligence for the four fundamental system components in ubiquitous Metaverse.
arXiv Detail & Related papers (2023-07-13T11:14:46Z)
- Force-Aware Interface via Electromyography for Natural VR/AR Interaction [69.1332992637271]
We design a learning-based neural interface for natural and intuitive force inputs in VR/AR.
We show that our interface can decode finger-wise forces in real-time with 3.3% mean error, and generalize to new users with little calibration.
We envision our findings to push forward research towards more realistic physicality in future VR/AR.
arXiv Detail & Related papers (2022-10-03T20:51:25Z)
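A minimal sketch of the decoding task described above: mapping a window of multi-channel EMG to per-finger force estimates. Channel count, window length, and the architecture are illustrative assumptions; the paper's model is not reproduced here.

```python
import torch
import torch.nn as nn

class EMGForceDecoder(nn.Module):
    """Toy regressor from an EMG window to per-finger forces (dims assumed)."""

    def __init__(self, channels: int = 8, window: int = 64, fingers: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                      # (batch, channels, window) -> flat
            nn.Linear(channels * window, 128),
            nn.ReLU(),
            nn.Linear(128, fingers),           # one force estimate per finger
            nn.Softplus(),                     # forces are non-negative
        )

    def forward(self, emg: torch.Tensor) -> torch.Tensor:
        return self.net(emg)
```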
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object-specific grab gestures, allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom tailored hand gestures.
The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
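A minimal template-matching sketch in the spirit of the authoring tool above: a captured hand pose is compared to stored gesture templates by mean joint distance. The wrist-relative representation and threshold value are assumptions.

```python
import numpy as np

def match_gesture(pose: np.ndarray, templates: dict[str, np.ndarray],
                  threshold: float = 0.03) -> str | None:
    """Return the closest template name, or None if nothing is close enough.

    pose, templates[name]: (num_joints, 3) arrays in a wrist-relative frame.
    """
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        # Mean Euclidean distance across corresponding joints.
        dist = float(np.linalg.norm(pose - template, axis=1).mean())
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None
```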
- VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behavior of interaction-based models.
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
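A rough sketch of the virtual-interaction idea: train a dual encoder so that attention maps computed from its independent encodings mimic the cross-attention of an interaction-based (cross-encoder) teacher. The loss below is a simplified stand-in for the paper's objective.

```python
import torch
import torch.nn.functional as F

def virtual_interaction_loss(q_states: torch.Tensor, d_states: torch.Tensor,
                             teacher_attn: torch.Tensor) -> torch.Tensor:
    """Match student "virtual" attention to a cross-encoder teacher's attention.

    q_states: (Lq, dim) query token states from a separate encoder.
    d_states: (Ld, dim) document token states from a separate encoder.
    teacher_attn: (Lq, Ld) cross-attention from an interaction-based teacher.
    """
    scale = q_states.size(-1) ** 0.5
    student_attn = F.softmax(q_states @ d_states.T / scale, dim=-1)
    return F.mse_loss(student_attn, teacher_attn)
```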