RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments
- URL: http://arxiv.org/abs/2504.08256v2
- Date: Mon, 14 Apr 2025 01:31:40 GMT
- Title: RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments
- Authors: Shiyi Ding, Ying Chen
- Abstract summary: RAG-VR is the first 3D question-answering system for VR that incorporates retrieval-augmented generation (RAG). RAG-VR improves answer accuracy by 17.9%-41.8% and reduces end-to-end latency by 34.5%-47.3% compared with two baseline systems.
- Score: 3.2120448116996103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) provide new opportunities for context understanding in virtual reality (VR). However, VR contexts are often highly localized and personalized, limiting the effectiveness of general-purpose LLMs. To address this challenge, we present RAG-VR, the first 3D question-answering system for VR that incorporates retrieval-augmented generation (RAG), which augments an LLM with external knowledge retrieved from a localized knowledge database to improve the answer quality. RAG-VR includes a pipeline for extracting comprehensive knowledge about virtual environments and user conditions for accurate answer generation. To ensure efficient retrieval, RAG-VR offloads the retrieval process to a nearby edge server and uses only essential information during retrieval. Moreover, we train the retriever to effectively distinguish among relevant, irrelevant, and hard-to-differentiate information in relation to questions. RAG-VR improves answer accuracy by 17.9%-41.8% and reduces end-to-end latency by 34.5%-47.3% compared with two baseline systems.
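The pipeline the abstract describes (scene and user knowledge extracted into compact records, retrieval offloaded to an edge server, and only the top-ranked records passed to the LLM) can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: the hashed bag-of-words `embed` function, the record format, and the top-k cutoff are placeholders standing in for RAG-VR's trained retriever and knowledge database.

```python
# Minimal sketch of edge-side retrieval for a VR question-answering pipeline.
# NOT the RAG-VR code: embed(), the record format, and k are hypothetical.
import numpy as np

def embed(texts):
    """Placeholder embedding: hashed bag-of-words vectors, for illustration only.
    A real system would use a trained text encoder."""
    dim = 64
    vecs = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            vecs[i, hash(tok) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-8)

def retrieve_top_k(question, records, k=3):
    """Rank scene records by cosine similarity to the question and keep only
    the k best, so the LLM prompt stays small."""
    q = embed([question])[0]
    r = embed(records)
    scores = r @ q
    order = np.argsort(-scores)[:k]
    return [(records[i], float(scores[i])) for i in order]

# Hypothetical knowledge records extracted from the virtual environment.
records = [
    "red key: on the wooden table in the study room",
    "blue door: locked, requires the red key",
    "user position: standing near the fireplace in the living room",
]
for rec, score in retrieve_top_k("Where can I find the key for the blue door?", records):
    print(f"{score:.3f}  {rec}")
```

In the paper's design the retriever is additionally trained to distinguish relevant, irrelevant, and hard-to-differentiate records with respect to the question; the toy similarity ranking above does not capture that training step.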
Related papers
- Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization [97.72503890388866]
We propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization. SR-RAG enables an LLM to dynamically decide between external retrieval and verbalizing its own parametric knowledge. We introduce dynamic knowledge source inference via nearest neighbor search to improve the accuracy of knowledge source decisions.
arXiv Detail & Related papers (2025-04-01T17:59:30Z) - Adaptive Score Alignment Learning for Continual Perceptual Quality Assessment of 360-Degree Videos in Virtual Reality [20.511561848185444]
We propose a novel approach for assessing the perceptual quality of VR videos, Adaptive Score Alignment Learning (ASAL). ASAL integrates correlation loss with error loss to enhance alignment with human subjective ratings and precision in predicting perceptual quality. We establish a comprehensive benchmark for VR-VQA and its CL counterpart, introducing new data splits and evaluation metrics.
arXiv Detail & Related papers (2025-02-27T00:29:04Z) - Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [92.57125498367907]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs). We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z) - DeepNote: Note-Centric Deep Retrieval-Augmented Generation [72.70046559930555]
Retrieval-Augmented Generation (RAG) mitigates factual errors and hallucinations in Large Language Models (LLMs) for question-answering (QA). We develop DeepNote, an adaptive RAG framework that achieves in-depth and robust exploration of knowledge sources through note-centric adaptive retrieval.
arXiv Detail & Related papers (2024-10-11T14:03:29Z) - Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs [54.054334823194615]
We consider Reverse Image Retrieval (RIR) augmented generation, a simple yet effective strategy to augment MLLMs with web-scale reverse image search results.
RIR robustly improves knowledge-intensive visual question answering (VQA) of GPT-4V by 37-43%, GPT-4 Turbo by 25-27%, and GPT-4o by 18-20% in terms of open-ended VQA evaluation metrics.
arXiv Detail & Related papers (2024-05-29T04:00:41Z) - Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning [0.0]
This research contributes significantly to the Thelxinoë framework, aiming to enhance VR experiences by integrating multiple sensor data for realistic and emotionally resonant touch interactions.
Our findings open new avenues for developing more immersive and interactive VR environments, paving the way for future advancements in virtual touch technology.
arXiv Detail & Related papers (2024-03-27T21:14:17Z) - Retrieval-Augmented Generation for Large Language Models: A Survey [17.82361213043507]
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination.
Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases.
arXiv Detail & Related papers (2023-12-18T07:47:33Z) - Benchmark Dataset and Effective Inter-Frame Alignment for Real-World Video Super-Resolution [65.20905703823965]
Video super-resolution (VSR), which aims to reconstruct a high-resolution (HR) video from its low-resolution (LR) counterpart, has made tremendous progress in recent years.
It remains challenging to deploy existing VSR methods to real-world data with complex degradations.
EAVSR uses the proposed multi-layer adaptive spatial transform network (MultiAdaSTN) to refine the offsets provided by the pre-trained optical flow estimation network.
arXiv Detail & Related papers (2022-12-10T17:41:46Z) - WiserVR: Semantic Communication Enabled Wireless Virtual Reality Delivery [12.158124978097982]
We propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360° video frames to VR users.
Deep learning-based multiple modules are well-devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery.
arXiv Detail & Related papers (2022-11-02T16:22:41Z) - A Review of Emerging Research Directions in Abstract Visual Reasoning [0.0]
We propose a taxonomy to categorise the tasks along 5 dimensions: input shapes, hidden rules, target task, cognitive function, and main challenge.
The perspective taken in this survey allows characterising problems with respect to their shared and distinct properties and provides a unified view of the existing approaches for solving the tasks.
One of these directions stems from the observation that in the machine learning literature different tasks are considered in isolation, which is in stark contrast with the way the tasks are used to measure human intelligence.
arXiv Detail & Related papers (2022-02-21T14:58:02Z) - Feeling of Presence Maximization: mmWave-Enabled Virtual Reality Meets Deep Reinforcement Learning [76.46530937296066]
This paper investigates the problem of providing ultra-reliable and energy-efficient virtual reality (VR) experiences for wireless mobile users.
To ensure reliable ultra-high-definition (UHD) video frame delivery to mobile users, a coordinated multipoint (CoMP) transmission technique and millimeter wave (mmWave) communications are exploited.
arXiv Detail & Related papers (2021-06-03T08:35:10Z) - Meta-Reinforcement Learning for Reliable Communication in THz/VLC Wireless VR Networks [157.42035777757292]
The problem of enhancing the quality of virtual reality (VR) services is studied for an indoor terahertz (THz)/visible light communication (VLC) wireless network.
Small base stations (SBSs) transmit high-quality VR images to VR users over THz bands and light-emitting diodes (LEDs) provide accurate indoor positioning services.
To control the energy consumption of the studied THz/VLC wireless VR network, VLC access points (VAPs) must be selectively turned on.
arXiv Detail & Related papers (2021-01-29T15:57:25Z)