Personality-Driven Gaze Animation with Conditional Generative
Adversarial Networks
- URL: http://arxiv.org/abs/2012.02224v1
- Date: Wed, 11 Nov 2020 00:31:45 GMT
- Title: Personality-Driven Gaze Animation with Conditional Generative
Adversarial Networks
- Authors: Funda Durupinar
- Abstract summary: We train the model using eye-tracking data and personality traits of 42 participants performing an everyday task.
Given the values of Big-Five personality traits, our model generates time series data consisting of gaze target, blinking times, and pupil dimensions.
We use the generated data to synthesize the gaze motion of virtual agents on a game engine.
- Score: 0.24366811507669117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a generative adversarial learning approach to synthesize gaze
behavior of a given personality. We train the model using an existing data set
that comprises eye-tracking data and personality traits of 42 participants
performing an everyday task. Given the values of Big-Five personality traits
(openness, conscientiousness, extroversion, agreeableness, and neuroticism),
our model generates time series data consisting of gaze target, blinking times,
and pupil dimensions. We use the generated data to synthesize the gaze motion
of virtual agents on a game engine.
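The abstract does not describe the model architecture, so the following is only a minimal sketch, assuming a recurrent conditional GAN in PyTorch: a generator maps a noise sequence plus the five Big-Five trait values to a time series of gaze-target coordinates, a blink flag, and pupil size, and a discriminator scores sequences conditioned on the same traits. All names, dimensions, and the per-step feature layout (GazeGenerator, GazeDiscriminator, NOISE_DIM, SEQ_LEN, FEATURES) are illustrative assumptions, not details taken from the paper.

```python
# Minimal PyTorch sketch of a conditional GAN for personality-driven gaze
# synthesis. Architecture, feature layout, and hyperparameters are assumptions.
import torch
import torch.nn as nn

TRAITS = 5       # Big-Five: openness, conscientiousness, extroversion,
                 # agreeableness, neuroticism; assumed scaled to [0, 1]
NOISE_DIM = 32   # latent noise dimension (assumed)
SEQ_LEN = 100    # number of generated time steps (assumed)
FEATURES = 4     # per-step output: gaze target (x, y), blink flag, pupil size

class GazeGenerator(nn.Module):
    """Maps (noise sequence, Big-Five traits) to a gaze time series."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(NOISE_DIM + TRAITS, 128, batch_first=True)
        self.head = nn.Linear(128, FEATURES)

    def forward(self, noise, traits):
        # Repeat the static personality condition at every time step.
        cond = traits.unsqueeze(1).expand(-1, SEQ_LEN, -1)
        h, _ = self.rnn(torch.cat([noise, cond], dim=-1))
        return self.head(h)                      # (batch, SEQ_LEN, FEATURES)

class GazeDiscriminator(nn.Module):
    """Scores whether a gaze sequence is real, given the same traits."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(FEATURES + TRAITS, 128, batch_first=True)
        self.out = nn.Linear(128, 1)

    def forward(self, seq, traits):
        cond = traits.unsqueeze(1).expand(-1, seq.size(1), -1)
        _, h = self.rnn(torch.cat([seq, cond], dim=-1))
        return self.out(h[-1])                   # raw real/fake logit

# Example: sample gaze behavior for a highly extroverted, low-neuroticism agent.
gen = GazeGenerator()
traits = torch.tensor([[0.6, 0.5, 0.9, 0.7, 0.2]])   # OCEAN values (assumed scale)
noise = torch.randn(1, SEQ_LEN, NOISE_DIM)
gaze_sequence = gen(noise, traits)               # feed to a game-engine agent
```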
Related papers
- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos [110.98100817695307]
We introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our work enables several important applications based on generative world models, including live teleoperation, policy evaluation, and model-based planning.
arXiv Detail & Related papers (2026-02-06T18:49:43Z) - Characterizing Personality from Eye-Tracking: The Role of Gaze and Its Absence in Interactive Search Environments [8.094997233445584]
This study aims to characterize personality traits through a multimodal time-series model. We rely on raw gaze data from an eye tracker, minimizing preprocessing. We trained models to predict personality traits using gaze signals.
arXiv Detail & Related papers (2026-01-13T07:24:39Z) - Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset [113.25650486482762]
We introduce the Seamless Interaction dataset, a large-scale collection of over 4,000 hours of face-to-face interaction footage. This dataset enables the development of AI technologies that understand dyadic embodied dynamics. We develop a suite of models that utilize the dataset to generate dyadic motion gestures and facial expressions aligned with human speech.
arXiv Detail & Related papers (2025-06-27T18:09:49Z) - Gen-C: Populating Virtual Worlds with Generative Crowds [1.5293427903448022]
We introduce Gen-C, a generative model to automate the task of authoring high-level crowd behaviors.
Gen-C bypasses the labor-intensive and challenging task of collecting and annotating real crowd video data.
We demonstrate the effectiveness of our approach in two scenarios, a University Campus and a Train Station.
arXiv Detail & Related papers (2025-04-02T17:33:53Z) - GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control [122.65089441381741]
We present GEM, a Generalizable Ego-vision Multimodal world model.
It predicts future frames using a reference frame, sparse features, human poses, and ego-trajectories.
Our dataset comprises 4000+ hours of multimodal data across domains such as autonomous driving, egocentric human activities, and drone flights.
arXiv Detail & Related papers (2024-12-15T14:21:19Z) - PersonalityScanner: Exploring the Validity of Personality Assessment Based on Multimodal Signals in Virtual Reality [44.15145632980038]
PersonalityScanner is a VR simulator to stimulate cognitive processes and simulate daily behaviors.
We collect a synchronized multimodal dataset with ten modalities, including first/third-person video, audio, text, eye tracking, facial micro-expression, pose, depth data, logs, and inertial measurement unit (IMU) data.
arXiv Detail & Related papers (2024-07-29T06:17:41Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured human-scene interaction (HSI) dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - MIDAS: Deep learning human action intention prediction from natural eye
movement patterns [6.557082555839739]
We present an entirely data-driven approach to decode human intention for object manipulation tasks based solely on natural gaze cues.
Our results show that we can decode human intention of motion purely from natural gaze cues and object relative position, with 91.9% accuracy (a minimal, hypothetical classifier sketch in this spirit appears after this list).
arXiv Detail & Related papers (2022-01-22T21:52:42Z) - Facial Emotion Recognition using Deep Residual Networks in Real-World
Environments [5.834678345946704]
We propose a facial feature extractor model trained on a massive, in-the-wild video dataset.
The dataset consists of a million labelled frames and 2,616 thousand subjects.
As temporal information is important to the emotion recognition domain, we utilise LSTM cells to capture the temporal dynamics in the data.
arXiv Detail & Related papers (2021-11-04T10:08:22Z) - MAAD: A Model and Dataset for "Attended Awareness" in Driving [10.463152664328025]
We propose a model to estimate a person's attended awareness of their environment.
Our model takes as input scene information in the form of a video and noisy gaze estimates.
We capture a new dataset with a high-precision gaze tracker including 24.5 hours of gaze sequences from 23 subjects attending to videos of driving scenes.
arXiv Detail & Related papers (2021-10-16T16:36:10Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z) - iGibson, a Simulation Environment for Interactive Tasks in Large
Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
We show that iGibson's features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z) - Hindsight for Foresight: Unsupervised Structured Dynamics Models from
Physical Interaction [24.72947291987545]
A key challenge for an agent learning to interact with the world is reasoning about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z)
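As a companion to the MIDAS entry above, here is a minimal, hypothetical sketch of decoding manipulation intention from gaze cues and object-relative positions. It is not the MIDAS architecture, which the listing does not describe; the feature layout, the RandomForestClassifier choice, and the synthetic data are all assumptions for illustration.

```python
# Illustrative sketch: predict which of several candidate objects a user
# intends to manipulate, from gaze features and object offsets relative to
# the gaze point. All features and data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

n_trials = 200
# Hypothetical per-trial features: mean gaze direction (x, y) and fixation
# duration, plus (dx, dy) offsets of 3 candidate objects from the gaze point.
gaze_features = rng.normal(size=(n_trials, 3))
object_offsets = rng.normal(size=(n_trials, 6))
X = np.hstack([gaze_features, object_offsets])
y = rng.integers(0, 3, size=n_trials)            # intended object index (synthetic)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```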