Using Features at Multiple Temporal and Spatial Resolutions to Predict Human Behavior in Real Time
- URL: http://arxiv.org/abs/2211.06721v1
- Date: Sat, 12 Nov 2022 18:41:33 GMT
- Title: Using Features at Multiple Temporal and Spatial Resolutions to Predict Human Behavior in Real Time
- Authors: Liang Zhang, Justin Lieffers, Adarsh Pyarelal
- Abstract summary: We present an approach for integrating high and low-resolution spatial and temporal information to predict human behavior in real time.
Our model composes neural networks for high and low-resolution feature extraction with a neural network for behavior prediction, with all three networks trained simultaneously.
- Score: 2.955419572714387
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: When performing complex tasks, humans naturally reason at multiple temporal
and spatial resolutions simultaneously. We contend that for an artificially
intelligent agent to effectively model human teammates, i.e., demonstrate
computational theory of mind (ToM), it should do the same. In this paper, we
present an approach for integrating high and low-resolution spatial and
temporal information to predict human behavior in real time and evaluate it on
data collected from human subjects performing simulated urban search and rescue
(USAR) missions in a Minecraft-based environment. Our model composes neural
networks for high and low-resolution feature extraction with a neural network
for behavior prediction, with all three networks trained simultaneously. The
high-resolution extractor encodes dynamically changing goals robustly by taking
as input the Manhattan distance difference between the humans' Minecraft
avatars and candidate goals in the environment for the latest few actions,
computed from a high-resolution gridworld representation. In contrast, the
low-resolution extractor encodes participants' historical behavior using a
historical state matrix computed from a low-resolution graph representation.
Through supervised learning, our model acquires a robust prior for human
behavior prediction, and can effectively deal with long-term observations. Our
experimental results demonstrate that our method significantly improves
prediction accuracy compared to approaches that only use high-resolution
information.
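For concreteness, the sketch below illustrates one way the described composition could be implemented. It is an assumption-laden illustration, not the authors' code: the PyTorch framework, the helper `manhattan_delta_features`, the `BehaviorPredictor` module, all layer sizes, and the classification-style objective are hypothetical choices made only to show how the high-resolution features, the low-resolution historical features, and the prediction head can be trained jointly.

```python
# Minimal illustrative sketch (not the authors' released implementation).
# Feature definitions, network sizes, and the objective are assumptions.
import torch
import torch.nn as nn


def manhattan_delta_features(avatar_xy: torch.Tensor, goals_xy: torch.Tensor) -> torch.Tensor:
    """High-resolution features: change in Manhattan distance from the avatar
    to each candidate goal over the latest few actions.

    avatar_xy: (k+1, 2) gridworld positions covering the last k actions
    goals_xy:  (G, 2) candidate goal positions
    returns:   (k, G) per-action distance differences (negative = moving closer)
    """
    dists = (avatar_xy[:, None, :] - goals_xy[None, :, :]).abs().sum(dim=-1)  # (k+1, G)
    return dists[1:] - dists[:-1]


class BehaviorPredictor(nn.Module):
    """Composes a high-resolution extractor, a low-resolution extractor, and a
    prediction head; all three are optimized simultaneously."""

    def __init__(self, k: int, n_goals: int, low_res_dim: int, hidden: int = 64):
        super().__init__()
        self.high_res = nn.Sequential(nn.Flatten(), nn.Linear(k * n_goals, hidden), nn.ReLU())
        self.low_res = nn.Sequential(nn.Linear(low_res_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_goals)  # e.g., predict the next goal

    def forward(self, high_feats: torch.Tensor, low_feats: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.high_res(high_feats), self.low_res(low_feats)], dim=-1)
        return self.head(z)


if __name__ == "__main__":
    k, n_goals, low_res_dim = 5, 8, 32                 # assumed sizes
    model = BehaviorPredictor(k, n_goals, low_res_dim)
    high = torch.randn(4, k, n_goals)                  # batch of Manhattan-delta features
    low = torch.randn(4, low_res_dim)                  # batch of historical-state features
    labels = torch.randint(0, n_goals, (4,))
    loss = nn.functional.cross_entropy(model(high, low), labels)
    loss.backward()                                    # gradients reach all three networks
```

Because the loss backpropagates through the concatenated features, the two extractors and the prediction head receive gradients from the same objective, mirroring the simultaneous training described above.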
Related papers
- RTify: Aligning Deep Neural Networks with Human Behavioral Decisions [10.510746720313303]
Current neural network models of primate vision focus on replicating overall levels of behavioral accuracy.
We introduce a novel computational framework to model the dynamics of human behavioral choices by learning to align the temporal dynamics of a recurrent neural network to human reaction times (RTs).
We show that the approximation can be used to optimize an "ideal-observer" RNN model to achieve an optimal tradeoff between speed and accuracy without human data.
arXiv Detail & Related papers (2024-11-06T03:04:05Z)
- StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset [56.71580976007712]
We propose to use the Human-Object Offset between anchors, which are densely sampled from the surfaces of the human mesh and the object mesh, to represent the human-object spatial relation.
Based on this representation, we propose Stacked Normalizing Flow (StackFLOW) to infer the posterior distribution of human-object spatial relations from the image.
During the optimization stage, we finetune the human body pose and object 6D pose by maximizing the likelihood of samples.
arXiv Detail & Related papers (2024-07-30T04:57:21Z)
- Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z)
- AnyPose: Anytime 3D Human Pose Forecasting via Neural Ordinary Differential Equations [2.7195102129095003]
AnyPose is a lightweight continuous-time neural architecture that models human behavior dynamics with neural ordinary differential equations.
Our results demonstrate that AnyPose exhibits high-performance accuracy in predicting future poses and takes significantly lower computational time than traditional methods.
arXiv Detail & Related papers (2023-09-09T16:59:57Z)
- Multi-Timescale Modeling of Human Behavior [0.18199355648379031]
We propose an LSTM network architecture that processes behavioral information at multiple timescales to predict future behavior.
We evaluate our architecture on data collected in an urban search and rescue scenario simulated in a virtual Minecraft-based testbed.
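A minimal, purely illustrative sketch of one way behavioral features might be processed at two timescales follows; it is not taken from that paper, and the two LSTMs, the strided subsampling, and all sizes are assumptions.

```python
# Illustrative sketch only: two LSTMs read the same behavior stream at a fine
# and a coarse (subsampled) timescale; sizes and stride are assumptions.
import torch
import torch.nn as nn


class MultiTimescaleLSTM(nn.Module):
    def __init__(self, feat_dim: int, hidden: int, n_actions: int, stride: int = 10):
        super().__init__()
        self.stride = stride
        self.fast = nn.LSTM(feat_dim, hidden, batch_first=True)  # per-step dynamics
        self.slow = nn.LSTM(feat_dim, hidden, batch_first=True)  # long-horizon context
        self.head = nn.Linear(2 * hidden, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (batch, T, feat_dim)
        _, (h_fast, _) = self.fast(x)
        _, (h_slow, _) = self.slow(x[:, ::self.stride, :])
        return self.head(torch.cat([h_fast[-1], h_slow[-1]], dim=-1))


logits = MultiTimescaleLSTM(feat_dim=16, hidden=32, n_actions=5)(torch.randn(2, 100, 16))
```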
arXiv Detail & Related papers (2022-11-16T15:58:57Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results, and that the descriptors help the network generalize better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study of various pose representations, with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms state-of-the-art methods in short-term prediction and achieves substantially better long-term prediction.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- Development of Human Motion Prediction Strategy using Inception Residual Block [1.0705399532413613]
We propose an Inception Residual Block (IRB) to detect temporal features in human poses.
Our main contribution is a residual connection between the input and the output of the inception block, which provides continuity between the previously observed pose and the next predicted pose.
With this architecture, the model learns prior knowledge about human poses more effectively, and we achieve much higher prediction accuracy, as detailed in the paper.
arXiv Detail & Related papers (2021-08-09T12:49:48Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters and achieving high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- Wavelet-based temporal models of human activity for anomaly detection in smart robot-assisted environments [2.299866262521074]
This paper presents a new approach for temporal modelling of long-term human activities with smart-home sensors.
The model is based on wavelet transforms and used to forecast smart sensor data, providing a temporal prior to detect unexpected events in human environments.
arXiv Detail & Related papers (2020-02-26T14:08:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.