View-invariant action recognition
- URL: http://arxiv.org/abs/2009.00638v1
- Date: Tue, 1 Sep 2020 18:08:46 GMT
- Title: View-invariant action recognition
- Authors: Yogesh S Rawat, Shruti Vyas
- Abstract summary: The varying pattern of spatio-temporal appearance generated by human action is key for identifying the performed action.
The research in view-invariant action recognition addresses this problem and focuses on recognizing human actions from unseen viewpoints.
- Score: 3.553493344868414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human action recognition is an important problem in computer vision. It has a
wide range of applications in surveillance, human-computer interaction,
augmented reality, video indexing, and retrieval. The varying pattern of
spatio-temporal appearance generated by human action is key for identifying the
performed action. We have seen a lot of research exploring this dynamics of
spatio-temporal appearance for learning a visual representation of human
actions. However, most of the research in action recognition is focused on some
common viewpoints, and these approaches do not perform well when there is a
change in viewpoint. Human actions are performed in a 3-dimensional environment
and are projected to a 2-dimensional space when captured as a video from a
given viewpoint. Therefore, an action will have a different spatio-temporal
appearance from different viewpoints. The research in view-invariant action
recognition addresses this problem and focuses on recognizing human actions
from unseen viewpoints.
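The abstract's core argument is geometric: the same 3D action, projected to 2D from different viewpoints, produces different spatio-temporal appearances. The toy sketch below illustrates this point; it is not from the paper. It assumes NumPy, a simple pinhole camera model, and a made-up three-point joint trajectory, and the helper names (rotation_y, project) are hypothetical.

```python
# Illustrative sketch (not from the paper): the same 3D joint trajectory
# projected through two camera viewpoints yields different 2D appearances,
# which is the core difficulty view-invariant methods must address.
import numpy as np

def rotation_y(angle_rad):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(points_3d, rotation, translation, focal=1.0):
    """Pinhole projection of Nx3 world points into 2D image coordinates."""
    cam = points_3d @ rotation.T + translation  # world -> camera frame
    return focal * cam[:, :2] / cam[:, 2:3]     # perspective divide

# A toy "action": one joint moving along a short 3D trajectory.
trajectory_3d = np.array([[0.0, 1.0, 0.0],
                          [0.1, 1.1, 0.1],
                          [0.2, 1.2, 0.2]])

t = np.array([0.0, 0.0, 3.0])  # place the scene in front of the camera
front_view = project(trajectory_3d, rotation_y(0.0), t)
side_view  = project(trajectory_3d, rotation_y(np.pi / 2), t)

# The two 2D trajectories differ even though the 3D action is identical.
print("frontal view:\n", front_view)
print("side view:\n", side_view)
```

Running the sketch prints two clearly different 2D trajectories for the same underlying motion, which is exactly the variation that view-invariant action recognition methods aim to be robust to.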
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z) - Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z) - Computer Vision for Primate Behavior Analysis in the Wild [61.08941894580172]
Video-based behavioral monitoring has great potential for transforming how we study animal cognition and behavior.
There is still a fairly large gap between the exciting prospects and what can actually be achieved in practice today.
arXiv Detail & Related papers (2024-01-29T18:59:56Z) - Human-centric Scene Understanding for 3D Large-scale Scenarios [52.12727427303162]
We present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife.
Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, and action recognition.
arXiv Detail & Related papers (2023-07-26T08:40:46Z) - The Psychophysics of Human Three-Dimensional Active Visuospatial
Problem-Solving [12.805267089186533]
Are two physical 3D objects visually the same?
Humans are remarkably good at this task without any training, with a mean accuracy of 93.82%.
No learning effect was observed on accuracy after many trials, but some effect was seen for response time, number of fixations and extent of head movement.
arXiv Detail & Related papers (2023-06-19T19:36:42Z) - Video-based Human Action Recognition using Deep Learning: A Review [4.976815699476327]
Human action recognition is an important application domain in computer vision.
Deep learning has been given particular attention by the computer vision community.
This paper presents an overview of the current state-of-the-art in action recognition using video analysis with deep learning techniques.
arXiv Detail & Related papers (2022-08-07T17:12:12Z) - Embodied vision for learning object representations [4.211128681972148]
We show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments.
We argue that this effect is caused by a reduction in features extracted from the background, a neural-network bias toward large features in the image, and a greater similarity between novel and familiar background regions.
arXiv Detail & Related papers (2022-05-12T16:36:27Z) - Egocentric Activity Recognition and Localization on a 3D Map [94.30708825896727]
We address the problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos.
Our model takes the inputs of a Hierarchical Volumetric Representation (HVR) of the environment and an egocentric video, infers the 3D action location as a latent variable, and recognizes the action based on the video and contextual cues surrounding its potential locations.
arXiv Detail & Related papers (2021-05-20T06:58:15Z) - What can human minimal videos tell us about dynamic recognition models? [14.201816626446888]
In human vision, objects and their parts can be visually recognized from purely spatial or purely temporal information.
We show that human visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues.
arXiv Detail & Related papers (2021-04-19T16:53:25Z) - A Grid-based Representation for Human Action Recognition [12.043574473965318]
Human action recognition (HAR) in videos is a fundamental research topic in computer vision.
We propose a novel method for action recognition that efficiently encodes the most discriminative appearance information of an action.
Our method is tested on several benchmark datasets demonstrating that our model can accurately recognize human actions.
arXiv Detail & Related papers (2020-10-17T18:25:00Z) - What Can You Learn from Your Muscles? Learning Visual Representation
from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)