Domain and View-point Agnostic Hand Action Recognition
- URL: http://arxiv.org/abs/2103.02303v1
- Date: Wed, 3 Mar 2021 10:32:36 GMT
- Title: Domain and View-point Agnostic Hand Action Recognition
- Authors: Alberto Sabater, Iñigo Alonso, Luis Montesano, Ana C. Murillo
- Abstract summary: We introduce a novel skeleton-based hand motion representation model that tackles this problem.
We demonstrate the performance of our proposed motion representation model both for a single specific domain (intra-domain action classification) and for different unseen domains (cross-domain action classification).
Our approach achieves comparable results to the state-of-the-art methods that are trained intra-domain.
- Score: 6.432798111887824
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Hand action recognition is a special case of human action recognition with
applications in human-robot interaction, virtual reality, or life-logging
systems. Building action classifiers that are useful for recognizing such a
heterogeneous set of activities is very challenging. There are very subtle
differences across actions within a given application, but also large
variations across domains (e.g., virtual reality vs. life-logging). This work
introduces a novel skeleton-based hand motion representation model that tackles
this problem. The framework we propose is agnostic to the application domain or
camera recording view-point. We demonstrate the performance of our proposed
motion representation model both working for a single specific domain
(intra-domain action classification) and working for different unseen domains
(cross-domain action classification). For the intra-domain case, our approach
achieves better or comparable performance to current state-of-the-art methods
on well-known hand action recognition benchmarks. When performing cross-domain
hand action recognition (i.e., training our motion representation model on
frontal-view recordings and testing it on both egocentric and third-person
views), our approach achieves results comparable to state-of-the-art methods
trained intra-domain.
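The key technical ingredient is a skeleton-based motion representation that stays stable when the camera view-point changes. A common way to obtain such invariance, illustrated in the minimal sketch below, is to re-express each frame's 3D hand joints in a hand-centric canonical basis; the joint layout and helper name are assumptions made for illustration, not the authors' actual pipeline.

```python
import numpy as np

def canonicalize_hand_sequence(joints):
    """Map a hand-skeleton sequence into a view-point-agnostic frame.

    joints: (T, 21, 3) array of 3D joint positions per frame, where index 0
    is the wrist, 5 the index-finger MCP and 9 the middle-finger MCP
    (a MediaPipe-style layout -- an assumption made for this sketch).
    Returns the same sequence expressed in a hand-centric basis, so the
    same gesture recorded from frontal, egocentric or third-person views
    yields approximately the same coordinates.
    """
    out = np.empty_like(joints, dtype=np.float64)
    for t, frame in enumerate(joints):
        centered = frame - frame[0]        # wrist at origin: translation invariance
        u = centered[9] / np.linalg.norm(centered[9])   # wrist -> middle MCP
        v = centered[5] - (centered[5] @ u) * u         # Gram-Schmidt step
        v /= np.linalg.norm(v)
        w = np.cross(u, v)                              # palm normal
        R = np.stack([u, v, w])                         # rows = canonical axes
        out[t] = centered @ R.T                         # rotation invariance
    scale = np.linalg.norm(out[:, 9], axis=-1).mean()   # mean palm length
    return out / max(scale, 1e-8)                       # scale invariance

seq = np.random.rand(100, 21, 3)           # stand-in: 100 frames, 21 joints
canon = canonicalize_hand_sequence(seq)
```

Once all sequences live in such a shared frame, a classifier trained on frontal-view recordings can in principle be applied to egocentric or third-person recordings, which is the cross-domain setting evaluated above.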
Related papers
- Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy [12.257725479880458]
Action recognition has become one of the most popular research topics in computer vision.
We propose a multi-view attention consistency method that computes the similarity between the attention maps obtained from two different views of the action videos.
Our approach applies the idea of Neural Radiance Fields to implicitly render the features from novel views when training on single-view datasets.
arXiv Detail & Related papers (2024-05-02T14:43:21Z)
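The consistency idea can be illustrated with a simple loss that pulls the attention maps computed from two views of the same clip toward each other. This is a minimal sketch assuming flattened spatial attention maps; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(attn_view_a, attn_view_b):
    """1 - cosine similarity between per-clip attention maps of two views.

    attn_view_a, attn_view_b: (B, H*W) spatial attention maps produced for
    two different camera views of the same clips (shapes are assumptions).
    """
    a = F.normalize(attn_view_a, dim=-1)
    b = F.normalize(attn_view_b, dim=-1)
    return (1.0 - (a * b).sum(dim=-1)).mean()

# Random stand-ins for two views' 7x7 attention maps.
loss = attention_consistency_loss(torch.rand(4, 49), torch.rand(4, 49))
```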
- Adversarial Domain Adaptation for Action Recognition Around the Clock [0.7614628596146599]
This paper presents a domain adaptation-based action recognition approach.
It uses adversarial learning to perform action recognition in cross-domain settings.
It achieves state-of-the-art performance on the InFAR and XD145 action datasets.
arXiv Detail & Related papers (2022-10-25T01:08:27Z)
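Adversarial cross-domain training of this kind is commonly implemented with a domain discriminator trained through a gradient reversal layer (DANN-style), so the backbone learns features the discriminator cannot tell apart. The sketch below shows that standard ingredient with placeholder sizes; it is not this paper's actual code.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in backward,
    so the features are trained to *fool* the domain discriminator."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

features = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # placeholder backbone
action_head = nn.Linear(256, 10)   # assumed number of action classes
domain_head = nn.Linear(256, 2)    # e.g. day vs. night domains

def classify(x, lam=1.0):
    f = features(x)
    return action_head(f), domain_head(GradReverse.apply(f, lam))

action_logits, domain_logits = classify(torch.randn(8, 512))
```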
- Audio-Adaptive Activity Recognition Across Video Domains [112.46638682143065]
We leverage activity sounds for domain adaptation as they have less variance across domains and can reliably indicate which activities are not happening.
We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation.
We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically.
arXiv Detail & Related papers (2022-03-27T08:15:20Z)
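One plausible reading of "discriminatively adjust the visual feature representation" is feature-wise modulation: an audio embedding predicts a per-channel gate and shift for the visual features (FiLM-style). The block below is an illustrative sketch under that assumption, with made-up dimensions, not the paper's actual encoder.

```python
import torch
import torch.nn as nn

class AudioAdaptiveBlock(nn.Module):
    """Modulate visual features with an audio embedding (FiLM-style sketch)."""
    def __init__(self, vis_dim=512, aud_dim=128):
        super().__init__()
        self.to_gate = nn.Linear(aud_dim, vis_dim)
        self.to_shift = nn.Linear(aud_dim, vis_dim)

    def forward(self, vis_feat, aud_feat):
        # The audio decides how much each visual channel matters, which
        # helps when appearance shifts across domains but sounds stay stable.
        gate = torch.sigmoid(self.to_gate(aud_feat))
        return gate * vis_feat + self.to_shift(aud_feat)

block = AudioAdaptiveBlock()
adapted = block(torch.randn(4, 512), torch.randn(4, 128))
```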
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance compared with state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Learning Cross-modal Contrastive Features for Video Domain Adaptation [138.75196499580804]
We propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations.
Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies.
arXiv Detail & Related papers (2021-08-26T18:14:18Z)
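Treating each modality as a view of the same clip naturally leads to an InfoNCE-style objective in which, say, the RGB and flow embeddings of one video are positives and the other clips in the batch are negatives. A minimal sketch of that standard loss follows; the modality pair, temperature, and dimensions are placeholders.

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(rgb_emb, flow_emb, temperature=0.07):
    """InfoNCE between two modalities: row i of each tensor comes from the
    same clip (positive pair); all other rows act as negatives."""
    rgb = F.normalize(rgb_emb, dim=-1)
    flow = F.normalize(flow_emb, dim=-1)
    logits = rgb @ flow.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(rgb.size(0))         # diagonal entries are positive
    # Symmetrize so both modalities are pulled toward each other.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = cross_modal_info_nce(torch.randn(16, 256), torch.randn(16, 256))
```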
- Exploring Temporal Context and Human Movement Dynamics for Online Action Detection in Videos [32.88517041655816]
Temporal context and human movement dynamics can be effectively employed for online action detection.
Our approach uses several state-of-the-art architectures and combines the extracted features to improve action detection.
arXiv Detail & Related papers (2021-06-26T08:34:19Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advances through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that it handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
- Hierarchical Modeling for Out-of-Scope Domain and Intent Classification [55.23920796595698]
This paper focuses on out-of-scope intent classification in dialog systems.
We propose a hierarchical multi-task learning approach based on a joint model to classify domain and intent simultaneously.
Experiments show that the model outperforms existing methods in terms of accuracy, out-of-scope recall and F1.
arXiv Detail & Related papers (2021-04-30T06:38:23Z)
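A joint domain-and-intent model is often realized as a shared utterance encoder with two classification heads, where the intent head can additionally see the domain evidence. The sketch below shows that pattern with assumed sizes; it is a generic illustration rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DomainIntentModel(nn.Module):
    """Shared encoder with joint domain and intent heads (sketch)."""
    def __init__(self, emb_dim=256, n_domains=5, n_intents=40):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(emb_dim, 256), nn.ReLU())
        self.domain_head = nn.Linear(256, n_domains)
        # The intent head also sees the domain logits, making the hierarchy
        # explicit: intent is predicted *given* the domain evidence.
        self.intent_head = nn.Linear(256 + n_domains, n_intents)

    def forward(self, utt_emb):
        h = self.encoder(utt_emb)
        d = self.domain_head(h)
        return d, self.intent_head(torch.cat([h, d], dim=-1))

model = DomainIntentModel()
domain_logits, intent_logits = model(torch.randn(8, 256))
# Joint training sums cross-entropy losses on both heads; an out-of-scope
# utterance can then be flagged when both predictions are low-confidence.
```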
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- Attention-Oriented Action Recognition for Real-Time Human-Robot Interaction [11.285529781751984]
We propose an attention-oriented multi-level network framework to meet the need for real-time interaction.
Specifically, a Pre-Attention network is employed to roughly focus on the interactor in the scene at low resolution.
A compact CNN then receives the extracted skeleton sequence as input for action recognition.
arXiv Detail & Related papers (2020-07-02T12:41:28Z)
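The compact CNN over skeleton sequences mentioned above is commonly realized as a small 1D temporal convolution network over flattened joint coordinates. The following sketch shows such a classifier with assumed joint count and layer sizes; it is a generic illustration, not the authors' network.

```python
import torch
import torch.nn as nn

class CompactSkeletonCNN(nn.Module):
    """Small temporal CNN over 2D skeleton sequences (illustrative sizes)."""
    def __init__(self, n_joints=18, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_joints * 2, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        )
        self.cls = nn.Linear(128, n_classes)

    def forward(self, skel):
        # skel: (B, T, n_joints, 2) joints extracted from the attended region.
        b, t, j, c = skel.shape
        x = skel.reshape(b, t, j * c).transpose(1, 2)   # (B, J*2, T)
        return self.cls(self.net(x).squeeze(-1))

model = CompactSkeletonCNN()
logits = model(torch.randn(2, 32, 18, 2))   # 2 clips of 32 frames
```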