Skeleton-Based Intake Gesture Detection With Spatial-Temporal Graph Convolutional Networks
- URL: http://arxiv.org/abs/2504.10635v1
- Date: Mon, 14 Apr 2025 18:35:32 GMT
- Title: Skeleton-Based Intake Gesture Detection With Spatial-Temporal Graph Convolutional Networks
- Authors: Chunzhuo Wang, Zhewen Xue, T. Sunil Kumar, Guido Camps, Hans Hallez, Bart Vanrumste
- Abstract summary: This study introduces a skeleton-based approach using a model that combines a dilated spatial-temporal graph convolutional network (ST-GCN) with a bidirectional long short-term memory (BiLSTM) framework to detect intake gestures. The results confirm the feasibility of utilizing skeleton data for intake gesture detection and highlight the robustness of the proposed approach in cross-dataset validation.
- Score: 1.5228527154365612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overweight and obesity have emerged as widespread societal challenges, frequently linked to unhealthy eating patterns. A promising approach to enhance dietary monitoring in everyday life involves automated detection of food intake gestures. This study introduces a skeleton-based approach using a model that combines a dilated spatial-temporal graph convolutional network (ST-GCN) with a bidirectional long short-term memory (BiLSTM) framework, referred to as ST-GCN-BiLSTM, to detect intake gestures. The skeleton-based method provides key benefits, including environmental robustness, reduced data dependency, and enhanced privacy preservation. Two datasets were employed for model validation. The OREBA dataset, which consists of laboratory-recorded videos, achieved segmental F1-scores of 86.18% and 74.84% for identifying eating and drinking gestures. Additionally, a self-collected dataset using smartphone recordings in more adaptable experimental conditions was evaluated with the model trained on OREBA, yielding F1-scores of 85.40% and 67.80% for detecting eating and drinking gestures. The results not only confirm the feasibility of utilizing skeleton data for intake gesture detection but also highlight the robustness of the proposed approach in cross-dataset validation.
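To make the architecture concrete, here is a minimal PyTorch sketch of a dilated ST-GCN feeding a BiLSTM for frame-level gesture labeling. The joint count, channel widths, dilation schedule, and the identity adjacency are illustrative assumptions, not the authors' configuration; a real adjacency would encode the upper-body skeleton.

```python
# A minimal sketch of the dilated ST-GCN + BiLSTM pipeline described in the
# abstract. Layer sizes, joint count, and A_hat are illustrative assumptions.
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """Spatial graph conv over joints followed by a dilated temporal conv."""
    def __init__(self, c_in, c_out, A_hat, dilation=1):
        super().__init__()
        self.register_buffer("A_hat", A_hat)      # (V, V) normalized adjacency
        self.spatial = nn.Conv2d(c_in, c_out, 1)  # 1x1 conv mixes channels
        self.temporal = nn.Conv2d(
            c_out, c_out, kernel_size=(9, 1),
            padding=(4 * dilation, 0), dilation=(dilation, 1))
        self.relu = nn.ReLU()

    def forward(self, x):                                  # x: (N, C, T, V)
        x = torch.einsum("nctv,vw->nctw", x, self.A_hat)   # aggregate neighbors
        x = self.relu(self.spatial(x))
        return self.relu(self.temporal(x))

class STGCNBiLSTM(nn.Module):
    def __init__(self, A_hat, num_classes=3, c=64):
        super().__init__()
        self.blocks = nn.Sequential(
            STGCNBlock(3, c, A_hat, dilation=1),   # input: (x, y, conf) per joint
            STGCNBlock(c, c, A_hat, dilation=2),   # growing dilation widens the
            STGCNBlock(c, c, A_hat, dilation=4))   # temporal receptive field
        self.bilstm = nn.LSTM(c, c, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * c, num_classes)  # per-frame: eat/drink/none

    def forward(self, x):                          # x: (N, 3, T, V)
        h = self.blocks(x).mean(dim=3)             # pool joints -> (N, C, T)
        h, _ = self.bilstm(h.transpose(1, 2))      # (N, T, 2C)
        return self.head(h)                        # frame-level logits

V = 13                                    # assumed number of upper-body joints
A_hat = torch.eye(V)                      # placeholder adjacency for the sketch
model = STGCNBiLSTM(A_hat)
logits = model(torch.randn(2, 3, 128, V)) # (2, 128, 3)
```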
Related papers
- Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study [2.251010251400407]
This study introduces a Frequency-Modulated Continuous Wave radar-based framework for human activity recognition. Unlike conventional approaches that process feature maps as images, this study feeds multi-dimensional feature maps as data vectors. The ConvLSTM model outperformed conventional machine learning and deep learning models, achieving an accuracy of 90.51%.
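The entry's ConvLSTM is the standard convolutional LSTM; a minimal single-step cell sketch follows. Channel counts and the radar feature-map shape are assumptions, not the paper's setup.

```python
# A minimal ConvLSTM cell sketch: LSTM gating with convolutions in place of
# dense layers, applied one time step at a time. Shapes are assumptions.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, c_in, c_hid, k=3):
        super().__init__()
        # One conv produces all four gates (input, forget, output, candidate).
        self.conv = nn.Conv2d(c_in + c_hid, 4 * c_hid, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, o, g = self.conv(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = ConvLSTMCell(2, 16)           # e.g., two assumed radar feature maps
x = torch.randn(1, 2, 32, 32)        # one time step of feature maps
h = c = torch.zeros(1, 16, 32, 32)
h, c = cell(x, h, c)                 # iterate over the time dimension
```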
arXiv Detail & Related papers (2025-03-07T17:53:29Z)
- SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms [60.35639972035727]
The lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms.
The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI.
Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
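For reference, the Dice scores above are the standard overlap metric between predicted and annotated vessel masks; a minimal sketch follows, with mask shapes chosen purely for illustration.

```python
# Dice = 2|P ∩ T| / (|P| + |T|) for binary segmentation masks.
import numpy as np

def dice(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

pred = np.random.rand(64, 64, 64) > 0.5    # assumed 3D vessel masks
target = np.random.rand(64, 64, 64) > 0.5
print(dice(pred, target))                  # ~0.5 for random half-full masks
```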
arXiv Detail & Related papers (2024-11-14T17:06:00Z)
- Hand Gesture Classification on Praxis Dataset: Trading Accuracy for Expense [0.6390468088226495]
We focus on 'skeletal' data represented by body joint coordinates from the Praxis dataset.
The Praxis dataset contains recordings of patients with cortical pathologies such as Alzheimer's disease.
Using a combination of windowing techniques with a deep learning architecture such as a Recurrent Neural Network (RNN), we achieved an overall accuracy of 70.8%.
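A minimal sketch of the windowing-plus-RNN recipe: fixed-length windows are cut from the joint-coordinate stream and each window is classified by a recurrent network. Window length, stride, joint count, and class count are assumptions.

```python
# Sliding windows over a joint-coordinate stream, classified by a GRU.
import torch
import torch.nn as nn

def sliding_windows(seq, win=60, stride=30):
    """seq: (T, F) joint coordinates -> (num_windows, win, F)."""
    starts = range(0, seq.shape[0] - win + 1, stride)
    return torch.stack([seq[s:s + win] for s in starts])

class GestureRNN(nn.Module):
    def __init__(self, n_feats, n_classes, hidden=128):
        super().__init__()
        self.gru = nn.GRU(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):               # x: (N, win, F)
        _, h = self.gru(x)              # last hidden state: (1, N, hidden)
        return self.head(h.squeeze(0))  # one label per window

stream = torch.randn(600, 75)           # e.g., 25 joints x (x, y, z)
windows = sliding_windows(stream)       # (19, 60, 75)
logits = GestureRNN(75, 14)(windows)    # 14 assumed gesture classes
```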
arXiv Detail & Related papers (2023-11-01T18:18:09Z)
- A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography (CA) images.
Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy);
dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
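The snippet does not name the aggregation rule; below is a minimal FedAvg-style sketch, one common FL choice and not necessarily the study's, for averaging the two institutions' locally trained weights.

```python
# FedAvg-style aggregation: average corresponding weight tensors from each
# client's state_dict. A simplification of typical FL rounds.
import copy
import torch

def fed_avg(client_states):
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
    return avg

# Hypothetical round: each institution trains locally, then weights are merged.
m1, m2 = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
global_state = fed_avg([m1.state_dict(), m2.state_dict()])
```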
arXiv Detail & Related papers (2023-10-30T11:13:40Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked first place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- Beyond Individual Input for Deep Anomaly Detection on Tabular Data [0.0]
Anomaly detection is vital in many domains, such as finance, healthcare, and cybersecurity.
To the best of our knowledge, this is the first work to successfully combine feature-feature and sample-sample dependencies.
Our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively.
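The snippet does not describe the architecture; one plausible reading, sketched below, alternates attention across features (within a sample) and across samples (within a batch), an NPT-style pattern. All dimensions are assumptions, not the paper's design.

```python
# Two attention passes over a tabular batch: feature-feature, then
# sample-sample dependencies. A speculative sketch, not the paper's model.
import torch
import torch.nn as nn

N, F, d = 32, 10, 16                    # samples, features, embedding dim
embed = nn.Linear(1, d)
x = embed(torch.randn(N, F, 1))         # (N, F, d): one embedding per cell

feat_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
x, _ = feat_attn(x, x, x)               # mixes features within each sample

x = x.transpose(0, 1)                   # (F, N, d): now attend over samples
samp_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
x, _ = samp_attn(x, x, x)               # mixes samples per feature
x = x.transpose(0, 1)                   # back to (N, F, d)
```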
arXiv Detail & Related papers (2023-05-24T13:13:26Z)
- A Meta-GNN approach to personalized seizure detection and classification [53.906130332172324]
We propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples.
We train a Meta-GNN based classifier that learns a global model from a set of training patients.
We show that our method outperforms the baselines, reaching 82.7% accuracy and an 82.08% F1 score after only 20 iterations on new, unseen patients.
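The Meta-GNN itself operates on graph-structured EEG inputs; as a simplified stand-in, the sketch below shows only the adaptation step: fine-tuning a copy of a global model on a new patient's few labeled samples for 20 steps, mirroring the iteration count above.

```python
# Simplified patient-adaptation loop: fine-tune a copy of the global model on
# a small support set. A stand-in for the meta-learned adaptation, not Meta-GNN.
import copy
import torch

def adapt_to_patient(global_model, support_x, support_y, lr=1e-3, steps=20):
    model = copy.deepcopy(global_model)    # keep the global model untouched
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):                 # 20 steps, as reported above
        opt.zero_grad()
        loss_fn(model(support_x), support_y).backward()
        opt.step()
    return model
```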
arXiv Detail & Related papers (2022-11-01T14:12:58Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on a source dataset and unavailable on a target dataset during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
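A minimal sketch of the temporal-permutation pretext task: split a clip into segments, shuffle them, and train the encoder to predict which permutation was applied. The segment count and clip shape are assumptions.

```python
# Temporal-permutation pretext task: the label is the permutation index.
import itertools
import random
import torch

PERMS = list(itertools.permutations(range(3)))   # 3 segments -> 6 classes

def permute_clip(clip):
    """clip: (C, T, V) skeleton tensor -> (shuffled clip, permutation label)."""
    segs = clip.chunk(3, dim=1)                  # split along the time axis
    label = random.randrange(len(PERMS))
    shuffled = torch.cat([segs[i] for i in PERMS[label]], dim=1)
    return shuffled, label

clip = torch.randn(3, 120, 25)
x, y = permute_clip(clip)                        # train an encoder to predict y
```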
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
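For intuition, the bone stream consumed alongside the joint stream is typically derived as vectors between connected joints; the toy skeleton below is an assumption, not the paper's graph.

```python
# Deriving the bone stream from the joint stream.
import torch

joints = torch.randn(2, 3, 100, 5)       # (N, C, T, V): 5 toy joints
parent = torch.tensor([0, 0, 1, 2, 3])   # assumed parent of each joint
bones = joints - joints[:, :, :, parent] # bone vector = joint - its parent
# Joint 0 is its own parent here, so its "bone" is the zero vector.
```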
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches [8.257740966456172]
This paper presents a complete framework towards the automated i) modeling of in-meal eating behavior and ii) temporal localization of meals.
We present an end-to-end neural network that detects food intake events (i.e., bites).
We show how the distribution of the detected bites throughout the day can be used to estimate the start and end points of meals, using signal processing algorithms.
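A minimal sketch of the meal-localization idea: bin detected bite times per minute, smooth the rate, and take contiguous above-threshold runs as meals. Bin size, smoothing window, and threshold are assumptions, not the paper's signal-processing pipeline.

```python
# Turn bite timestamps into (start_minute, end_minute) meal intervals.
import numpy as np

def meals_from_bites(bite_times_s, day_s=24 * 3600, smooth=15, thresh=0.5):
    counts, _ = np.histogram(bite_times_s, bins=day_s // 60, range=(0, day_s))
    rate = np.convolve(counts, np.ones(smooth) / smooth, mode="same")
    active = (rate > thresh).astype(int)             # minutes inside a "meal"
    starts = np.flatnonzero(np.diff(np.r_[0, active]) == 1)
    ends = np.flatnonzero(np.diff(np.r_[active, 0]) == -1)
    return list(zip(starts, ends))

# Hypothetical day: clusters of bites around breakfast and lunch.
bites = np.r_[np.random.normal(8 * 3600, 300, 20),
              np.random.normal(13 * 3600, 400, 30)]
print(meals_from_bites(bites))
```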
arXiv Detail & Related papers (2020-10-12T12:35:56Z)
- Proximity-Based Active Learning on Streaming Data: A Personalized Eating Moment Recognition [17.961752949636306]
We propose Proximity-based Active Learning on Streaming data (PALS), a novel proximity-based model for recognizing eating gestures.
Our analysis on data collected in both controlled and uncontrolled settings indicates that the F-score of PALS ranges from 22% to 39% for a budget that varies from 10 to 60 queries.
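A minimal sketch in the spirit of a proximity-based query rule on a stream: ask for a label only when a sample is far from everything labeled so far and budget remains. The distance radius and budget are assumptions (the budget matches the 10-60 query range above).

```python
# Proximity-gated label queries on a data stream, within a fixed budget.
import numpy as np

def process_stream(stream, oracle, budget=60, radius=1.0):
    X, y = [], []                        # labeled pool built on the fly
    for x in stream:
        far = not X or min(np.linalg.norm(x - p) for p in X) > radius
        if budget > 0 and far:
            X.append(x)
            y.append(oracle(x))          # ask the wearer for a label
            budget -= 1
        # otherwise: classify x with its nearest labeled neighbor
    return X, y

# Hypothetical usage with a toy oracle.
rng = np.random.default_rng(0)
pool, labels = process_stream(rng.normal(size=(200, 3)), oracle=lambda x: 0)
```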
arXiv Detail & Related papers (2020-03-29T18:17:29Z)