Disentangled Sequence Clustering for Human Intention Inference
- URL: http://arxiv.org/abs/2101.09500v1
- Date: Sat, 23 Jan 2021 13:39:34 GMT
- Title: Disentangled Sequence Clustering for Human Intention Inference
- Authors: Mark Zolotas, Yiannis Demiris
- Abstract summary: Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE)
- Score: 40.46123013107865
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Equipping robots with the ability to infer human intent is a vital
precondition for effective collaboration. Most computational approaches towards
this objective employ probabilistic reasoning to recover a distribution of
"intent" conditioned on the robot's perceived sensory state. However, these
approaches typically assume task-specific notions of human intent (e.g.
labelled goals) are known a priori. To overcome this constraint, we propose the
Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a
clustering framework that can be used to learn such a distribution of intent in
an unsupervised manner. The DiSCVAE leverages recent advances in unsupervised
learning to derive a disentangled latent representation of sequential data,
separating time-varying local features from time-invariant global aspects.
Unlike previous frameworks for disentanglement, however, the proposed variant
also infers a discrete variable to form a latent mixture model, enabling
clustering of global sequence concepts, e.g. intentions from observed human
behaviour. To evaluate the DiSCVAE, we first validate its capacity to discover
behaviour. To evaluate the DiSCVAE, we first validate its capacity to discover
classes from unlabelled sequences using video datasets of bouncing digits and
2D animations. We then report results from a real-world human-robot interaction
experiment conducted on a robotic wheelchair. Our findings offer insights into
how the inferred discrete variable coincides with human intent and can thus
improve assistance in collaborative settings, such as shared control.
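The abstract describes a generative structure in which a discrete cluster variable selects a mixture component for a time-invariant global latent, while separate time-varying local latents evolve per frame. The following is a minimal, hedged sketch of that latent structure only (not the authors' code): all dimensions, the linear "decoder" weights, and the local-latent dynamics are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D_G, D_L, T, D_X = 3, 4, 2, 5, 6   # clusters, global/local dims, steps, obs dim

# Cluster-specific prior means over the global latent (assumed Gaussian mixture).
mu_k = rng.normal(size=(K, D_G))

# Illustrative linear decoder mapping [z_g; z_t] -> one observation frame.
W = rng.normal(size=(D_G + D_L, D_X))

def sample_sequence():
    """Draw one sequence from the sketched generative model."""
    y = rng.integers(K)                    # discrete cluster ("intent")
    z_g = mu_k[y] + rng.normal(size=D_G)   # global, time-invariant latent
    z_t = rng.normal(size=D_L)             # initial local latent
    frames = []
    for _ in range(T):
        z_t = 0.9 * z_t + 0.1 * rng.normal(size=D_L)   # time-varying local state
        frames.append(np.concatenate([z_g, z_t]) @ W)  # decode frame
    return y, np.stack(frames)

y, x = sample_sequence()
print(y, x.shape)   # cluster index and a (T, D_X) observation sequence
```

In the actual DiSCVAE the decoder is a neural network and inference amortizes the posterior over (y, z_g, z_t); this sketch only illustrates how the discrete variable induces a mixture over global sequence concepts.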
Related papers
- Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models [18.327960366321655]
We develop a deep learning-based social cue integration model for saliency prediction to predict scanpaths in videos.
We evaluate our approach on gaze data from dynamic social scenes, observed under the free-viewing condition.
Results indicate that a single unified model, trained on all the observers' scanpaths, performs on par or better than individually trained models.
arXiv Detail & Related papers (2024-05-05T13:15:11Z) - Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Predicting Human Mobility via Self-supervised Disentanglement Learning [21.61423193132924]
We propose a novel disentangled solution called SSDL for tackling the next POI prediction problem.
We present two realistic trajectory augmentation approaches to enhance the understanding of both the human intrinsic periodicity and constantly-changing intents.
Extensive experiments conducted on four real-world datasets demonstrate that our proposed SSDL significantly outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-17T16:17:22Z) - H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z) - SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding [34.19666841489646]
We show how a robot can autonomously discover novel semantic classes and improve accuracy on known classes when exploring an unknown environment.
We develop a general framework for mapping and clustering that we then use to generate a self-supervised learning signal to update a semantic segmentation model.
In particular, we show how clustering parameters can be optimized during deployment and that fusion of multiple observation modalities improves novel object discovery compared to prior work.
arXiv Detail & Related papers (2022-06-21T18:41:51Z) - Active Uncertainty Learning for Human-Robot Interaction: An Implicit Dual Control Approach [5.05828899601167]
We present an algorithmic approach to enable uncertainty learning for human-in-the-loop motion planning based on the implicit dual control paradigm.
Our approach relies on a sampling-based approximation of the dynamic programming model predictive control problem.
The resulting policy is shown to preserve the dual control effect for generic human predictive models with both continuous and categorical uncertainty.
arXiv Detail & Related papers (2022-02-15T20:40:06Z) - Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem as a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z) - Beyond Tracking: Using Deep Learning to Discover Novel Interactions in Biological Swarms [3.441021278275805]
We propose training deep network models to predict system-level states directly from generic graphical features from the entire view.
Because the resulting predictive models are not based on human-interpretable predictors, we use explanatory modules.
This represents an example of augmented intelligence in behavioral ecology -- knowledge co-creation in a human-AI team.
arXiv Detail & Related papers (2021-08-20T22:50:41Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.