ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments
- URL: http://arxiv.org/abs/2502.10046v1
- Date: Fri, 14 Feb 2025 09:46:43 GMT
- Title: ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments
- Authors: Juyeong Hwang, Seong-Eun Hong, Hyeongyeop Kang
- Abstract summary: We propose ViRAC, which exploits the common-sense knowledge and reasoning capabilities of large-scale models.
ViRAC produces more natural and context-aware head rotations than recent state-of-the-art techniques.
- Abstract: Creating lifelike virtual agents capable of interacting with their environments is a longstanding goal in computer graphics. This paper addresses the challenge of generating natural head rotations, a critical aspect of believable agent behavior for visual information gathering and dynamic responses to environmental cues. Although earlier methods have made significant strides, many rely on data-driven or saliency-based approaches, which often underperform in diverse settings and fail to capture deeper cognitive factors such as risk assessment, information seeking, and contextual prioritization. Consequently, generated behaviors can appear rigid or overlook critical scene elements, thereby diminishing the sense of realism. In this paper, we propose ViRAC, a Vision-Reasoning Agent Head Movement Control framework, which exploits the common-sense knowledge and reasoning capabilities of large-scale models, including Vision-Language Models (VLMs) and Large Language Models (LLMs). Rather than explicitly modeling every cognitive mechanism, ViRAC leverages the biases and patterns internalized by these models from extensive training, thus emulating human-like perceptual processes without hand-tuned heuristics. Experimental results in multiple scenarios reveal that ViRAC produces more natural and context-aware head rotations than recent state-of-the-art techniques. Quantitative evaluations show a closer alignment with real human head-movement data, while user studies confirm improved realism and cognitive plausibility.
Related papers
- GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation [9.593020996636932]
We introduce a large-scale part-centric dataset for articulated object manipulation.
We integrate it with several state-of-the-art methods for depth estimation and interaction pose prediction.
Our experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction.
arXiv Detail & Related papers (2024-11-27T12:11:23Z) - Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training [14.450673163785094]
Context-Aware Emotion Recognition (CAER) provides valuable semantic cues for recognizing the emotions of target persons.
Current approaches invariably focus on designing sophisticated structures to extract perceptually critical representations from contexts.
We present a Contextual Causal Intervention Module (CCIM) to de-confound the confounder.
arXiv Detail & Related papers (2024-07-06T05:29:02Z) - Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications [0.21051221444478305]
How to improve the ability of scene representation is a key issue in vision-oriented decision-making applications.
We propose an intrinsic dynamics-driven representation learning method with sequence models in visual reinforcement learning.
arXiv Detail & Related papers (2024-05-30T06:31:03Z) - HAZARD Challenge: Embodied Decision Making in Dynamically Changing
Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z) - Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z) - ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z) - Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning [34.00107506891627]
We focus on naturally and controllably generating realistic and diverse HSIs from textual descriptions.
We propose Narrator, a novel relationship reasoning-based generative approach.
Our experiments and perceptual studies show that Narrator can controllably generate diverse interactions and significantly outperform existing works.
arXiv Detail & Related papers (2023-03-16T15:44:15Z) - Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z) - Causal Navigation by Continuous-time Neural Networks [108.84958284162857]
We propose a theoretical and experimental framework for learning causal representations using continuous-time neural networks.
We evaluate our method in the context of visual-control learning of drones over a series of complex tasks.
arXiv Detail & Related papers (2021-06-15T17:45:32Z) - On the Sensory Commutativity of Action Sequences for Embodied Agents [2.320417845168326]
We study perception for embodied agents under the mathematical formalism of group theory.
We introduce the Sensory Commutativity Probability criterion which measures how much an agent's degree of freedom affects the environment.
We empirically illustrate how SCP and the commutative properties of action sequences can be used to learn about objects in the environment.
arXiv Detail & Related papers (2020-02-13T16:58:23Z)