BabyNet: A Lightweight Network for Infant Reaching Action Recognition in
Unconstrained Environments to Support Future Pediatric Rehabilitation
Applications
- URL: http://arxiv.org/abs/2208.04950v1
- Date: Tue, 9 Aug 2022 07:38:36 GMT
- Authors: Amel Dechemi, Vikarn Bhakri, Ipsita Sahin, Arjun Modi, Julya Mestas,
Pamodya Peiris, Dannya Enriquez Barrundia, Elena Kokkoni, and Konstantinos
Karydis
- Abstract summary: Action recognition is an important component to improve autonomy of physical rehabilitation devices, such as wearable robotic exoskeletons.
In this paper, we introduce BabyNet, a light-weight (in terms of trainable parameters) network structure to recognize infant reaching action from off-body stationary cameras.
- Score: 5.4771139749266435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action recognition is an important component to improve autonomy of physical
rehabilitation devices, such as wearable robotic exoskeletons. Existing human
action recognition algorithms focus on adult applications rather than pediatric
ones. In this paper, we introduce BabyNet, a light-weight (in terms of
trainable parameters) network structure to recognize infant reaching action
from off-body stationary cameras. We develop an annotated dataset that includes
diverse reaches performed while in a sitting posture by different infants in
unconstrained environments (e.g., in home settings, etc.). Our approach uses
the spatial and temporal connection of annotated bounding boxes to interpret
onset and offset of reaching, and to detect a complete reaching action. We
evaluate the efficiency of our proposed approach and compare its performance
against other learning-based network structures in terms of capability of
capturing temporal inter-dependencies and accuracy of detection of reaching
onset and offset. Results indicate our BabyNet can attain solid performance in
terms of (average) testing accuracy that exceeds that of other larger networks,
and can hence serve as a light-weight data-driven framework for video-based
infant reaching action recognition.
Related papers
- Challenges in Video-Based Infant Action Recognition: A Critical
Examination of the State of the Art [9.327466428403916]
We introduce a groundbreaking dataset called "InfActPrimitive", encompassing five significant infant milestone action categories.
We conduct an extensive comparative analysis employing cutting-edge skeleton-based action recognition models.
Our findings reveal that, although the PoseC3D model achieves the highest accuracy at approximately 71%, the remaining models struggle to accurately capture the dynamics of infant actions.
arXiv Detail & Related papers (2023-11-21T02:36:47Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Vision-Based Activity Recognition in Children with Autism-Related
Behaviors [15.915410623440874]
We demonstrate the effect of a region-based computer vision system to help clinicians and parents analyze a child's behavior.
The data is pre-processed by detecting the target child in the video to reduce the impact of background noise.
Motivated by the effectiveness of temporal convolutional models, we propose both light-weight and conventional models capable of extracting action features from video frames.
arXiv Detail & Related papers (2022-08-08T15:12:27Z)
- Real-time landmark detection for precise endoscopic submucosal
dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and
Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Kinship Verification Based on Cross-Generation Feature Interaction
Learning [53.62256887837659]
Kinship verification from facial images has been recognized as an emerging yet challenging technique in computer vision applications.
We propose a novel cross-generation feature interaction learning (CFIL) framework for robust kinship verification.
arXiv Detail & Related papers (2021-09-07T01:50:50Z)
- Joint Learning of Neural Transfer and Architecture Adaptation for Image
Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we prove that dynamically adapting network architectures tailored for each domain task along with weight finetuning benefits in both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z)
- A Deep Learning Approach to Tongue Detection for Pediatric Population [1.5484595752241122]
Children with severe disabilities and complex communication needs face limitations in the usage of access technology (AT) devices.
Previous studies have shown the robustness of tongue detection algorithms on adult participants.
In this study, a network architecture for tongue-out gesture recognition was implemented and evaluated on videos recorded in a naturalistic setting.
arXiv Detail & Related papers (2020-09-04T21:04:57Z)
- Preterm infants' pose estimation with spatio-temporal features [7.054093620465401]
This paper introduces the use of spatio-temporal features for limb detection and tracking.
It is the first study to use depth videos acquired in the actual clinical practice for limb-pose estimation.
arXiv Detail & Related papers (2020-05-08T09:51:22Z)
- Minor Privacy Protection Through Real-time Video Processing at the Edge [4.4243708797335115]
In this paper, we investigate lightweight solutions that are affordable to edge surveillance systems.
A pipeline extracts faces from the input frames and classifies each one to be of an adult or a child.
We show the superiority of our proposed model with an accuracy of 92.1% in classification compared to some other face recognition based child detection approaches.
arXiv Detail & Related papers (2020-05-03T20:19:15Z)
- Dynamic Inference: A New Approach Toward Efficient Video Action
Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)