Activity Recognition with Moving Cameras and Few Training Examples:
Applications for Detection of Autism-Related Headbanging
- URL: http://arxiv.org/abs/2101.03478v1
- Date: Sun, 10 Jan 2021 05:37:05 GMT
- Title: Activity Recognition with Moving Cameras and Few Training Examples:
Applications for Detection of Autism-Related Headbanging
- Authors: Peter Washington, Aaron Kline, Onur Cezmi Mutlu, Emilie Leblanc, Cathy
Hou, Nate Stockham, Kelley Paskov, Brianna Chrisman, Dennis P. Wall
- Abstract summary: Activity recognition computer vision algorithms can be used to detect the presence of autism-related behaviors.
We document the advantages and limitations of current feature representation techniques for activity recognition when applied to head banging detection.
We create a computer vision classifier for detecting head banging in home videos using a time-distributed convolutional neural network.
- Score: 1.603589863010401
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Activity recognition computer vision algorithms can be used to detect the
presence of autism-related behaviors, including what are termed "restricted and
repetitive behaviors", or stimming, by diagnostic instruments. The limited data
that exist in this domain are usually recorded with a handheld camera which can
be shaky or even moving, posing a challenge for traditional feature
representation approaches for activity detection which mistakenly capture the
camera's motion as a feature. To address these issues, we first document the
advantages and limitations of current feature representation techniques for
activity recognition when applied to head banging detection. We then propose a
feature representation consisting exclusively of head pose keypoints. We create
a computer vision classifier for detecting head banging in home videos using a
time-distributed convolutional neural network (CNN) in which a single CNN
extracts features from each frame in the input sequence, and these extracted
features are fed as input to a long short-term memory (LSTM) network. On the
binary task of predicting head banging and no head banging within videos from
the Self Stimulatory Behaviour Dataset (SSBD), we reach a mean F1-score of
90.77% using 3-fold cross validation (with individual fold F1-scores of 83.3%,
89.0%, and 100.0%) when ensuring that no child who appeared in the train set
was in the test set for all folds. This work documents a successful technique
for training a computer vision classifier which can detect human motion with
few training examples and even when the camera recording the source clips is
unstable. The general methods described here can be applied by designers and
developers of interactive systems towards other human motion and pose
classification problems used in mobile and ubiquitous interactive systems.
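
The pipeline in the abstract (a single CNN applied to every frame via time distribution, feeding an LSTM, evaluated with subject-wise folds) translates directly into a short Keras sketch. Everything below is illustrative: clip length, input resolution, and layer widths are assumed placeholders rather than the paper's settings, and scikit-learn's GroupKFold is one way to enforce that no child appears in both the train and test sets of a fold.

```python
# Minimal sketch of the described pipeline: one CNN extracts features from
# every frame (TimeDistributed), and an LSTM models the frame sequence.
# Shapes and layer sizes are illustrative assumptions, not the paper's values.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import GroupKFold

SEQ_LEN, H, W, C = 30, 64, 64, 1  # assumed: 30-frame clips of rendered head-pose keypoints

def build_model() -> tf.keras.Model:
    frame_cnn = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(H, W, C)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
    ])
    return tf.keras.Sequential([
        # The same CNN is applied independently to each of the SEQ_LEN frames.
        tf.keras.layers.TimeDistributed(frame_cnn, input_shape=(SEQ_LEN, H, W, C)),
        tf.keras.layers.LSTM(64),                        # temporal modeling
        tf.keras.layers.Dense(1, activation="sigmoid"),  # headbanging vs. not
    ])

# Subject-wise 3-fold cross validation: `groups` carries one child ID per
# clip, so no child ever appears in both the train and test set of a fold.
X = np.zeros((12, SEQ_LEN, H, W, C), dtype="float32")  # dummy clips
y = np.random.randint(0, 2, size=12)                   # dummy labels
groups = np.repeat(np.arange(6), 2)                    # 6 children, 2 clips each

for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups):
    model = build_model()
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X[train_idx], y[train_idx], epochs=1, verbose=0)
```

Using rendered head-pose keypoints as the per-frame input, rather than raw pixels, is what keeps the handheld camera's own motion out of the learned features.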
Related papers
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
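
The entry above names an Event-Contrastive Loss ($\mathcal{L}_{EC}$) without defining it; the snippet below is a generic InfoNCE-style contrastive loss, a common template such an objective could follow, not the paper's exact formulation.

```python
# Generic InfoNCE-style contrastive loss (illustrative only; the exact L_EC
# is not given in this summary). z1[i] and z2[i] are embeddings of two
# augmented views of the same event clip, forming the positive pair.
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # pairwise cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)
```
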
- Agile gesture recognition for capacitive sensing devices: adapting on-the-job [55.40855017016652]
We demonstrate a hand gesture recognition system that uses signals from capacitive sensors embedded into the etee hand controller.
The controller generates real-time signals from each of the wearer's five fingers.
We use a machine learning technique to analyse the time series signals and identify three features that can represent 5 fingers within 500 ms.
arXiv Detail & Related papers (2023-05-12T17:24:02Z)
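
As a rough illustration of the entry above, the sketch below extracts simple statistics from 500 ms windows of multichannel capacitive signals; the three features shown (mean, standard deviation, slope) are hypothetical stand-ins, since the summary does not say which features the authors use.

```python
# Hypothetical windowed feature extraction for capacitive time series;
# one channel per finger, three features per 500 ms window. Illustrative only.
import numpy as np

def window_features(signals: np.ndarray, fs: int = 100, win_ms: int = 500) -> np.ndarray:
    """signals: (n_samples, n_channels) array, one channel per finger."""
    win = int(fs * win_ms / 1000)
    feats = []
    for start in range(0, signals.shape[0] - win + 1, win):
        w = signals[start:start + win]               # one 500 ms window
        slope = (w[-1] - w[0]) / win                 # per-channel linear trend
        feats.append(np.concatenate([w.mean(0), w.std(0), slope]))
    return np.asarray(feats)                         # (n_windows, 3 * n_channels)
```
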
- Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that integrates reconstruction-based functionality at its architectural core.
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z)
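
In the spirit of the masked-convolution idea above, here is a minimal sketch of a convolution whose kernel center is zeroed so the block must reconstruct each location from its context, trained with a Huber loss; the kernel layout and sizes are assumptions, not the actual SSMCTB design.

```python
# Sketch of a masked convolution with a Huber reconstruction loss.
# Kernel layout and sizes are illustrative assumptions, not SSMCTB itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution with a zeroed kernel center: each output position must be
    reconstructed from the surrounding context alone."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = torch.ones_like(self.weight)
        mask[..., self.kernel_size[0] // 2, self.kernel_size[1] // 2] = 0.0
        return F.conv2d(x, self.weight * mask, self.bias, padding=self.padding)

x = torch.randn(2, 8, 32, 32)                  # dummy feature maps
conv = MaskedConv2d(8, 8, kernel_size=5, padding=2)
loss = F.huber_loss(conv(x), x)                # self-supervised reconstruction objective
```
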
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z)
- Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [0.0]
We show that long-term motion patterns alone play a pivotal role in the task of recognizing an event.
Only the temporal features are exploited using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
arXiv Detail & Related papers (2021-11-03T08:30:38Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach, a multi-modal relational graph network (MRG-Net), to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method [1.027974860479791]
We address challenges in the preprocessing phase through automated selection of representative frames from the input sequences.
We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network and skeletal modelling method.
We name our model the Feature Reduction & Deep Learning based action recognition method, or FR-DL for short.
arXiv Detail & Related papers (2020-07-06T15:12:50Z)
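
The background-subtraction-plus-HOG preprocessing named in the entry above has a standard OpenCV realization; the sketch below is a generic version of that step, not the authors' FR-DL code, and the parameter values are assumptions.

```python
# Generic OpenCV sketch of background subtraction followed by HOG features.
# Parameters are illustrative assumptions, not the FR-DL implementation.
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)
hog = cv2.HOGDescriptor()  # default 64x128 detection window

def frame_descriptor(frame: np.ndarray) -> np.ndarray:
    mask = subtractor.apply(frame)                   # foreground (motion) mask
    fg = cv2.bitwise_and(frame, frame, mask=mask)    # keep moving pixels only
    gray = cv2.cvtColor(fg, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (64, 128))               # HOG's expected window size
    return hog.compute(gray).ravel()                 # 3780-dim HOG descriptor
```
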
- Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery [1.1033115844630357]
We investigate automatic surgical activity recognition in robot-assisted operations.
We collect the first large-scale dataset including 400 full-length multi-perspective videos.
We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities.
arXiv Detail & Related papers (2020-06-29T16:30:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.