A Variational Graph Autoencoder for Manipulation Action Recognition and
Prediction
- URL: http://arxiv.org/abs/2110.13280v1
- Date: Mon, 25 Oct 2021 21:40:42 GMT
- Title: A Variational Graph Autoencoder for Manipulation Action Recognition and
Prediction
- Authors: Gamze Akyol, Sanem Sariel, Eren Erdal Aksoy
- Abstract summary: We introduce a deep graph autoencoder to jointly learn recognition and prediction of manipulation tasks from symbolic scene graphs.
Our network has a variational autoencoder structure with two branches: one for identifying the input graph type and one for predicting the future graphs.
We benchmark our new model against different state-of-the-art methods on two different datasets, MANIAC and MSRC-9, and show that our proposed model can achieve better performance.
- Score: 1.1816942730023883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite decades of research, understanding human manipulation activities is,
and has always been, one of the most attractive and challenging research topics
in computer vision and robotics. Recognition and prediction of observed human
manipulation actions have their roots in applications such as human-robot
interaction and robot learning from demonstration. The current research trend
relies heavily on advanced convolutional neural networks to process structured
Euclidean data, such as RGB camera images. These networks, however, incur
immense computational complexity in order to process high-dimensional raw data.
Unlike related works, we introduce a deep graph autoencoder to jointly learn
recognition and prediction of manipulation tasks from symbolic scene graphs
instead of relying on structured Euclidean data. Our network
has a variational autoencoder structure with two branches: one for identifying
the input graph type and one for predicting the future graphs. The input of the
proposed network is a set of semantic graphs which store the spatial relations
between subjects and objects in the scene. The network output is a label set
representing the detected and predicted class types. We benchmark our new model
against different state-of-the-art methods on two different datasets, MANIAC
and MSRC-9, and show that our proposed model can achieve better performance. We
also release our source code at https://github.com/gamzeakyol/GNet.
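To make the described architecture concrete, below is a minimal, illustrative sketch of a two-branch variational graph autoencoder operating on a symbolic scene graph. It is not the authors' released implementation (see the GitHub link above): the relation set, node categories, layer sizes, and class counts are assumed placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical spatial relations and class counts (placeholders, not from the paper).
RELATIONS = ["touching", "above", "inside"]
NUM_ACTIONS = 8          # assumed number of manipulation action classes
NUM_NODE_TYPES = 12      # assumed number of subject/object categories

class TwoBranchGraphVAE(nn.Module):
    """Variational graph autoencoder with a recognition head and a prediction head."""
    def __init__(self, node_dim=NUM_NODE_TYPES, hidden=64, latent=32):
        super().__init__()
        self.gc1 = nn.Linear(node_dim, hidden)            # simple graph-convolution-style layer
        self.gc_mu = nn.Linear(hidden, latent)
        self.gc_logvar = nn.Linear(hidden, latent)
        self.recognize = nn.Linear(latent, NUM_ACTIONS)   # branch 1: current action label
        self.predict = nn.Linear(latent, NUM_ACTIONS)     # branch 2: upcoming action label

    def encode(self, x, adj):
        # x: (N, node_dim) one-hot node features; adj: (N, N) adjacency with self-loops
        h = F.relu(adj @ self.gc1(x))     # propagate node features along scene-graph edges
        h = h.mean(dim=0)                 # pool nodes into a single graph embedding
        return self.gc_mu(h), self.gc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x, adj):
        mu, logvar = self.encode(x, adj)
        z = self.reparameterize(mu, logvar)
        return self.recognize(z), self.predict(z), mu, logvar

# Toy scene graph: 3 nodes (e.g. hand, knife, bread) with made-up "touching" relations.
x = F.one_hot(torch.tensor([0, 1, 2]), NUM_NODE_TYPES).float()
adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])
model = TwoBranchGraphVAE()
rec_logits, pred_logits, mu, logvar = model(x, adj)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```

Training such a sketch would typically combine cross-entropy losses on both heads with the KL term computed in the last line; the actual loss weighting and graph layers used in the paper may differ.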
Related papers
- Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network [2.223052975765005]
We propose a novel Pyramid Graph Convolutional Network (PGCN) to automatically recognize human-object interaction.
The system represents the 2D or 3D spatial relations between humans and objects, obtained from detection results in video data, as a graph.
We evaluate our model on two challenging datasets in the field of human-object interaction recognition.
arXiv Detail & Related papers (2024-10-10T13:39:17Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI).
The experimental results demonstrate that MPI achieves remarkable improvements of 10% to 64% over the previous state of the art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Automatic Relation-aware Graph Network Proliferation [182.30735195376792]
We propose Automatic Relation-aware Graph Network Proliferation (ARGNP) for efficiently searching GNNs.
The searched operations can extract hierarchical node and relational information and provide anisotropic guidance for message passing on a graph.
Experiments on six datasets for four graph learning tasks demonstrate that GNNs produced by our method are superior to the current state-of-the-art hand-crafted and search-based GNNs.
arXiv Detail & Related papers (2022-05-31T10:38:04Z)
- A Novel Hand Gesture Detection and Recognition system based on ensemble-based Convolutional Neural Network [3.5665681694253903]
Detection of the hand region has become a challenging task in the computer vision and pattern recognition communities.
Deep learning algorithms such as convolutional neural network (CNN) architectures have become a very popular choice for classification tasks.
In this paper, an ensemble of CNN-based approaches is presented to mitigate problems such as high variance during prediction, overfitting, and prediction errors.
arXiv Detail & Related papers (2022-02-25T06:46:58Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
We also provide a benchmark pipeline for the evaluation of temporal network embeddings.
We show the applicability and superior performance of our model on a real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
- Variational models for signal processing with Graph Neural Networks [3.5939555573102853]
This paper is devoted to signal processing on point-clouds by means of neural networks.
In this work, we investigate the use of variational models for such Graph Neural Networks to process signals on graphs for unsupervised learning.
arXiv Detail & Related papers (2021-03-30T13:31:11Z)
- TactileSGNet: A Spiking Graph Neural Network for Event-based Tactile Object Recognition [17.37142241982902]
New advances in flexible, event-driven electronic skins may soon endow robots with touch perception capabilities similar to those of humans.
The unique characteristics of such event-based tactile data may render current deep learning approaches, such as convolutional feature extractors, unsuitable for tactile learning.
We propose a novel spiking graph neural network for event-based tactile object recognition.
arXiv Detail & Related papers (2020-08-01T03:35:15Z)
- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127]
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding -- a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.
arXiv Detail & Related papers (2020-06-17T16:18:35Z)
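The GCC entry above describes contrastive self-supervised pre-training of graph encoders. The sketch below shows a generic InfoNCE-style objective over embeddings of two augmented views of the same graphs; it is a simplified stand-in for that idea, not GCC's exact subgraph instance discrimination setup, and all tensor shapes are placeholders.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.07):
    """Contrastive (InfoNCE) loss: matching rows of z1 and z2 are positive pairs,
    all other rows in the batch act as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (B, B) cosine-similarity matrix
    labels = torch.arange(z1.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: embeddings of two augmented views of the same batch of graphs.
z_view1, z_view2 = torch.randn(16, 32), torch.randn(16, 32)
loss = info_nce(z_view1, z_view2)
```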