Human-like Relational Models for Activity Recognition in Video
- URL: http://arxiv.org/abs/2107.05319v1
- Date: Mon, 12 Jul 2021 11:13:17 GMT
- Title: Human-like Relational Models for Activity Recognition in Video
- Authors: Joseph Chrol-Cannon, Andrew Gilbert, Ranko Lazic, Adithya
Madhusoodanan, Frank Guerin
- Abstract summary: Video activity recognition by deep neural networks is impressive for many classes.
Deep neural networks can struggle to learn critical relationships effectively.
We propose a more human-like approach to activity recognition, which interprets a video in sequential temporal phases.
We apply the method to a challenging subset of the something-something dataset and achieve more robust performance than neural network baselines on challenging activities.
- Score: 8.87742125296885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video activity recognition by deep neural networks is impressive for many
classes. However, it falls short of human performance, especially for
activities that are challenging to discriminate. Humans differentiate these complex
activities by recognising critical spatio-temporal relations among explicitly
recognised objects and parts, for example, an object entering the aperture of a
container. Deep neural networks can struggle to learn such critical
relationships effectively. Therefore we propose a more human-like approach to
activity recognition, which interprets a video in sequential temporal phases
and extracts specific relationships among objects and hands in those phases.
Random forest classifiers are learnt from these extracted relationships. We
apply the method to a challenging subset of the something-something dataset and
achieve more robust performance than neural network baselines on challenging
activities.
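
The abstract outlines the pipeline (segment a video into temporal phases, extract hand/object relations in each phase, train random forests on those relations) without implementation detail. The following is a minimal illustrative sketch of that idea only: the three-phase split, the relation vocabulary, the `encode_video` helper and the toy examples are assumptions made for this sketch, not the authors' actual features or data.

```python
# Minimal sketch of the described pipeline: each video is interpreted as a
# fixed number of temporal phases, each phase is encoded as a binary vector
# over a small vocabulary of hand/object relations, and a random forest is
# trained on the concatenated per-phase vectors. The phase names, relation
# vocabulary and toy videos below are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PHASES = ["approach", "manipulation", "outcome"]              # assumed phase split
RELATIONS = ["hand_touching_object", "object_inside_container",
             "object_above_container", "object_moving_down"]  # assumed vocabulary

def encode_video(per_phase_relations):
    """Map {phase: set of active relations} to one flat binary feature vector."""
    vec = []
    for phase in PHASES:
        active = per_phase_relations.get(phase, set())
        vec.extend(1.0 if rel in active else 0.0 for rel in RELATIONS)
    return np.array(vec)

# Two toy videos, e.g. "putting something into something" vs. "taking something out".
videos = [
    ({"approach": {"hand_touching_object", "object_above_container"},
      "manipulation": {"object_inside_container", "object_moving_down"},
      "outcome": {"object_inside_container"}}, "put_into"),
    ({"approach": {"hand_touching_object", "object_inside_container"},
      "manipulation": {"object_inside_container"},
      "outcome": {"object_above_container"}}, "take_out"),
]
X = np.stack([encode_video(rels) for rels, _ in videos])
y = [label for _, label in videos]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # sanity check on the toy training data
```

The representation here is just a concatenation of per-phase binary relation indicators, i.e. the kind of compact, interpretable feature vector that a random forest can split on directly.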
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
arXiv Detail & Related papers (2024-07-20T03:53:32Z)
- Compositional Learning in Transformer-Based Human-Object Interaction Detection [6.630793383852106]
The long-tailed distribution of labeled instances is a primary challenge in HOI detection.
Inspired by the nature of HOI triplets, some existing approaches adopt the idea of compositional learning.
We propose a transformer-based framework for compositional HOI learning.
arXiv Detail & Related papers (2023-08-11T06:41:20Z)
- Distillation of Human-Object Interaction Contexts for Action Recognition [0.0]
We learn human-object relationships by exploiting the interaction of their local and global contexts.
We propose the Global-Local Interaction Distillation Network (GLIDN), learning human and object interactions through space and time.
GLIDN encodes humans and objects into graph nodes and learns local and global relations via a graph attention network (a generic sketch of this graph-attention idea appears after this list).
arXiv Detail & Related papers (2021-12-17T11:39:44Z)
- Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions [81.88294320397826]
A system does not know what human-object interactions are present in a video, nor the actual locations of the human and object.
We introduce a dataset comprising over 6.5k videos with human-object interactions, curated from sentence captions.
We demonstrate improved performance over weakly supervised baselines adapted to our annotations on our video dataset.
arXiv Detail & Related papers (2021-10-07T15:30:18Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that the current fixed-size temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory (LSTM) units, as well as 3D-CNN models built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
- Towards Deep Clustering of Human Activities from Wearables [21.198881633580797]
We develop an unsupervised end-to-end learning strategy for the fundamental problem of human activity recognition from wearables.
We show the effectiveness of our approach to jointly learn unsupervised representations for sensory data and generate cluster assignments with strong semantic correspondence to distinct human activities.
arXiv Detail & Related papers (2020-08-02T13:55:24Z)
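
The GLIDN entry above describes encoding humans and objects as graph nodes whose relations are learned with graph attention. As a generic illustration of that idea only, here is a minimal single-head graph-attention update in NumPy; the node features, adjacency and parameters are toy assumptions, and this is standard graph attention, not the GLIDN architecture itself.

```python
# Generic single-head graph-attention update over human/object node features:
# project node features, score each allowed (target, source) pair, softmax the
# scores over neighbours, and aggregate. This is a toy sketch of standard
# graph attention, not a reimplementation of GLIDN.
import numpy as np

rng = np.random.default_rng(0)

def graph_attention(H, adj, W, a):
    """One attention-weighted message-passing step.

    H:   (N, F) node features (e.g. person and object embeddings)
    adj: (N, N) binary adjacency (1 where an interaction edge is allowed)
    W:   (F, F') shared linear projection
    a:   (2*F',) attention vector scoring each (target, source) pair
    """
    Z = H @ W                                            # project node features
    N = Z.shape[0]
    logits = np.array([[a @ np.concatenate([Z[i], Z[j]]) for j in range(N)]
                       for i in range(N)])               # pairwise attention logits
    logits = np.where(adj > 0, logits, -1e9)             # mask non-edges
    alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # softmax over neighbours
    return np.maximum(alpha @ Z, 0.0)                    # aggregate + ReLU

# Toy scene: one person node and two object nodes, fully connected with self-loops.
H = rng.normal(size=(3, 8))
adj = np.ones((3, 3))
W = rng.normal(size=(8, 8)) * 0.1
a = rng.normal(size=(16,)) * 0.1
print(graph_attention(H, adj, W, a).shape)  # -> (3, 8)
```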