Video-based Contrastive Learning on Decision Trees: from Action
Recognition to Autism Diagnosis
- URL: http://arxiv.org/abs/2304.10073v2
- Date: Fri, 21 Apr 2023 06:17:01 GMT
- Title: Video-based Contrastive Learning on Decision Trees: from Action
Recognition to Autism Diagnosis
- Authors: Mindi Ruan, Xiangxu Yu, Na Zhang, Chuanbo Hu, Shuo Wang, Xin Li
- Abstract summary: We present a new contrastive learning-based framework for decision tree-based classification of actions.
The key idea is to translate the original multi-class action recognition into a series of binary classification tasks on a pre-constructed decision tree.
We demonstrate promising performance for video-based autism spectrum disorder diagnosis on the CalTech interview video database.
- Score: 17.866016075963437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How can we teach a computer to recognize 10,000 different actions? Deep
learning has evolved from supervised and unsupervised to self-supervised
approaches. In this paper, we present a new contrastive learning-based
framework for decision tree-based classification of actions, including
human-human interactions (HHI) and human-object interactions (HOI). The key
idea is to translate the original multi-class action recognition into a series
of binary classification tasks on a pre-constructed decision tree. Under the
new framework of contrastive learning, we present the design of an interaction
adjacency matrix (IAM) with skeleton graphs as the backbone for modeling various
action-related attributes such as periodicity and symmetry. Through the
construction of various pretext tasks, we obtain a series of binary
classification nodes on the decision tree that can be combined to support
higher-level recognition tasks. We provide experimental justification for the
potential of our approach in real-world applications, ranging from interaction
recognition to symmetry detection. In particular, we demonstrate the promising
performance of video-based autism spectrum disorder (ASD) diagnosis on the
CalTech interview video database.
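To make the key idea concrete, below is a minimal sketch (our illustration, not the authors' released code) of recognizing an action by answering binary questions along a pre-constructed decision tree, with a toy interaction adjacency matrix (IAM) computed from two skeleton graphs. All names, feature choices, and thresholds (interaction_adjacency_matrix, TreeNode, the example labels) are illustrative assumptions; in the paper, each node's binary decision would come from a contrastive pretext task (e.g., periodicity or symmetry detection) rather than the hand-set rules used here.

```python
# Minimal sketch (assumed, not the authors' code): reduce K-way action
# recognition to binary decisions along a pre-constructed decision tree,
# using a toy IAM built from two skeleton graphs.
import numpy as np


def interaction_adjacency_matrix(skel_a, skel_b, sigma=1.0):
    """Toy stand-in for an IAM: Gaussian affinity between every joint
    of skeleton A and every joint of skeleton B."""
    diff = skel_a[:, None, :] - skel_b[None, :, :]  # (Ja, Jb, 3)
    dist = np.linalg.norm(diff, axis=-1)            # (Ja, Jb)
    return np.exp(-dist**2 / (2 * sigma**2))


class TreeNode:
    """Internal nodes hold a binary classifier; leaves hold a label."""
    def __init__(self, classify=None, left=None, right=None, label=None):
        self.classify = classify
        self.left = left
        self.right = right
        self.label = label


def predict(node, features):
    """Answer one binary question per internal node until a leaf."""
    while node.label is None:
        node = node.left if node.classify(features) else node.right
    return node.label


# Hypothetical usage on random 25-joint, 3-D skeletons.
rng = np.random.default_rng(0)
iam = interaction_adjacency_matrix(rng.normal(size=(25, 3)),
                                   rng.normal(size=(25, 3)))
feats = iam.mean(axis=1)  # crude per-joint interaction summary

# Tiny 3-class tree; the lambdas stand in for learned binary classifiers.
tree = TreeNode(
    classify=lambda f: f.mean() > 0.2,       # e.g., "interacting?"
    left=TreeNode(
        classify=lambda f: f.std() > 0.1,    # e.g., "symmetric motion?"
        left=TreeNode(label="handshake"),
        right=TreeNode(label="hug"),
    ),
    right=TreeNode(label="walking"),
)
print(predict(tree, feats))
```

The design point the sketch preserves is that a K-way recognition problem becomes at most depth-of-tree binary decisions, each of which can be trained and audited independently.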
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Joint Engagement Classification using Video Augmentation Techniques for Multi-person Human-robot Interaction [22.73774398716566]
We present a novel framework for identifying a parent-child dyad's joint engagement.
Using a dataset of parent-child dyads reading storybooks together with a social robot at home, we first train RGB frame- and skeleton-based joint engagement recognition models.
Second, we demonstrate experimental results on the use of trained models in the robot-parent-child interaction context.
arXiv Detail & Related papers (2022-12-28T23:52:55Z)
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
- Exoskeleton-Based Multimodal Action and Movement Recognition: Identifying and Developing the Optimal Boosted Learning Approach [0.0]
This paper makes two scientific contributions to the field of exoskeleton-based action and movement recognition.
It presents a novel machine learning and pattern recognition-based framework that can detect a wide range of actions and movements.
arXiv Detail & Related papers (2021-06-18T19:43:54Z)
- Transferable Interactiveness Knowledge for Human-Object Interaction Detection [46.89715038756862]
We explore interactiveness knowledge which indicates whether a human and an object interact with each other or not.
We found that interactiveness knowledge can be learned across HOI datasets and bridge the gap between diverse HOI category settings.
Our core idea is to exploit an interactiveness network to learn the general interactiveness knowledge from multiple HOI datasets.
arXiv Detail & Related papers (2021-01-25T18:21:07Z)
- MS$^2$L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition [36.74293548921099]
We integrate motion prediction, jigsaw puzzle recognition, and contrastive learning to learn skeleton features from different aspects.
Our experiments on the NW-UCLA, NTU RGB+D, and PKUMMD datasets show remarkable performance for action recognition.
arXiv Detail & Related papers (2020-10-12T11:09:44Z)
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
- Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search [63.07088785532908]
We propose a framework based on reinforcement learning and tree search for joint surgical gesture segmentation and classification.
Our framework consistently outperforms existing methods on the suturing task of the JIGSAWS dataset in terms of accuracy, edit score, and F1 score.
arXiv Detail & Related papers (2020-02-20T13:12:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.