COBRA: Multimodal Sensing Deep Learning Framework for Remote Chronic Obesity Management via Wrist-Worn Activity Monitoring
- URL: http://arxiv.org/abs/2509.04210v1
- Date: Thu, 04 Sep 2025 13:35:49 GMT
- Title: COBRA: Multimodal Sensing Deep Learning Framework for Remote Chronic Obesity Management via Wrist-Worn Activity Monitoring
- Authors: Zhengyang Shen, Bo Gao, Mayue Shi
- Abstract summary: This study presents COBRA, a novel deep learning framework for objective behavioral monitoring using wrist-worn multimodal sensors. COBRA integrates a hybrid D-Net architecture combining U-Net spatial modeling, multi-head self-attention mechanisms, and BiLSTM temporal processing to classify daily activities into four obesity-relevant categories. The framework shows robust generalizability with low demographic variance (<3%), enabling scalable deployment for personalized obesity interventions and continuous lifestyle monitoring.
- Score: 9.506310924716864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chronic obesity management requires continuous monitoring of energy balance behaviors, yet traditional self-reported methods suffer from significant underreporting, recall bias, and difficulty integrating with modern digital health systems. This study presents COBRA (Chronic Obesity Behavioral Recognition Architecture), a novel deep learning framework for objective behavioral monitoring using wrist-worn multimodal sensors. COBRA integrates a hybrid D-Net architecture combining U-Net spatial modeling, multi-head self-attention mechanisms, and BiLSTM temporal processing to classify daily activities into four obesity-relevant categories: Food Intake, Physical Activity, Sedentary Behavior, and Daily Living. Validated on the WISDM-Smart dataset with 51 subjects performing 18 activities, COBRA's optimal preprocessing strategy combines spectral-temporal feature extraction, achieving high performance across multiple architectures. D-Net demonstrates 96.86% overall accuracy with category-specific F1-scores of 98.55% (Physical Activity), 95.53% (Food Intake), 94.63% (Sedentary Behavior), and 98.68% (Daily Living), outperforming state-of-the-art baselines by 1.18% in accuracy. The framework shows robust generalizability with low demographic variance (<3%), enabling scalable deployment for personalized obesity interventions and continuous lifestyle monitoring.
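The abstract names spectral-temporal feature extraction as COBRA's optimal preprocessing strategy but does not specify the pipeline. A minimal sketch of what such a stage might look like for windowed wrist-sensor data is shown below; the window length, sampling rate, and chosen feature set are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def spectral_temporal_features(window, fs=20.0):
    """Extract simple temporal and spectral features from one sensor window.

    window: (n_samples, n_channels) array, e.g. a 3-axis accelerometer segment.
    fs: sampling rate in Hz (an assumed value; the paper does not state one).
    Returns a 1-D feature vector combining both feature families.
    """
    # Temporal statistics, computed per channel
    mean = window.mean(axis=0)
    std = window.std(axis=0)

    # Spectral features: magnitude spectrum via the real FFT, per channel
    spectrum = np.abs(np.fft.rfft(window, axis=0))
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / fs)
    dominant_freq = freqs[spectrum.argmax(axis=0)]  # peak frequency per channel
    energy = (spectrum ** 2).sum(axis=0)            # total spectral energy

    return np.concatenate([mean, std, dominant_freq, energy])

# Example: a 3-second window of 3-axis data at the assumed 20 Hz rate
window = np.random.default_rng(0).normal(size=(60, 3))
features = spectral_temporal_features(window)
print(features.shape)  # (12,) -- 4 feature types x 3 channels
```

Feature vectors of this kind would then be fed to the downstream classifier (here, the D-Net hybrid); the specific statistics above are a common starting point for wearable HAR, not the features reported in the paper.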
Related papers
- Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration [72.84714132070404]
We propose a framework that shifts from passive interaction to active exploration of visual environments. Active-Zero employs three co-evolving agents, including a Searcher that retrieves images from open-world repositories based on the model's capability frontier. On Qwen2.5-VL-7B-Instruct across 12 benchmarks, Active-Zero achieves 53.97 average accuracy on reasoning tasks (a 5.7% improvement) and 59.77 on general understanding (a 3.9% improvement).
arXiv Detail & Related papers (2026-02-11T17:29:17Z) - Effect of Activation Function and Model Optimizer on the Performance of Human Activity Recognition System Using Various Deep Learning Models [0.0]
We investigate the effect of three commonly used activation functions (ReLU, Sigmoid, and Tanh) combined with four optimization algorithms. Experiments are conducted on six medically relevant activity classes selected from the HMDB51 and UCF101 datasets. Results show that ConvLSTM consistently outperforms BiLSTM across both datasets.
arXiv Detail & Related papers (2025-12-23T07:01:45Z) - Human Activity Recognition Based on Electrocardiogram Data Only [5.367301239087641]
We show, for the first time, robust recognition of activity using only ECG across six distinct activities. We design and evaluate three new deep learning models, including a CNN classifier with Squeeze-and-Excitation blocks for channel-wise feature recalibration. Tested on data from 54 subjects performing six activities, all three models achieve over 94% accuracy for seen subjects, while the CNN-Transformer hybrid reaches the best accuracy of 72% for unseen subjects.
arXiv Detail & Related papers (2025-09-14T01:26:32Z) - Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors [1.7619303397097408]
Fitness movement recognition plays a vital role in health monitoring, rehabilitation, and personalized fitness training. We present a framework that integrates pre-trained 2D Convolutional Neural Networks (CNNs) with a Long Short-Term Memory (LSTM) network enhanced by spatial attention. We evaluate the framework on a curated subset of the UCF101 dataset, achieving a peak accuracy of 93.34% with the ResNet50-based configuration.
arXiv Detail & Related papers (2025-09-02T17:04:42Z) - OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z) - HANS-Net: Hyperbolic Convolution and Adaptive Temporal Attention for Accurate and Generalizable Liver and Tumor Segmentation in CT Imaging [1.3149714289117207]
Accurate liver and tumor segmentation on abdominal CT images is critical for reliable diagnosis and treatment planning. We introduce the Hyperbolic-convolutions Adaptive-temporal-attention with Neural-representation and Synaptic-plasticity Network (HANS-Net). HANS-Net combines hyperbolic convolutions for hierarchical geometric representation, a wavelet-inspired decomposition module for multi-scale texture learning, and an implicit neural representation branch.
arXiv Detail & Related papers (2025-07-15T13:56:37Z) - USAD: End-to-End Human Activity Recognition via Diffusion Model with Spatiotemporal Attention [8.061018589897277]
Human activity recognition is a task that finds broad applications in health monitoring, safety protection, and sports analysis. Despite proliferating research, human activity recognition still faces key challenges, including the scarcity of labeled samples for rare activities. This paper proposes a comprehensive optimization approach centered on multi-attention interaction mechanisms.
arXiv Detail & Related papers (2025-07-03T17:38:44Z) - A Comparative Study of Human Activity Recognition: Motion, Tactile, and multi-modal Approaches [43.97520291340696]
This study evaluates the ability of a vision-based tactile sensor to classify 15 activities. We propose a multi-modal framework combining tactile and motion data to leverage their complementary strengths.
arXiv Detail & Related papers (2025-05-13T15:20:21Z) - Skeleton-Based Intake Gesture Detection With Spatial-Temporal Graph Convolutional Networks [1.5228527154365612]
This study introduces a skeleton-based approach using a model that combines a dilated spatial-temporal graph convolutional network (ST-GCN) with a bidirectional long short-term memory (BiLSTM) framework to detect intake gestures. The results confirm the feasibility of utilizing skeleton data for intake gesture detection and highlight the robustness of the proposed approach in cross-dataset validation.
arXiv Detail & Related papers (2025-04-14T18:35:32Z) - Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, both across time and across sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z) - Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features [50.82725748981231]
Engagement measurement finds application in healthcare, education, and services.
Physiological and behavioral features are both viable, but traditional physiological measurement is impractical because it requires contact sensors.
We demonstrate the feasibility of unsupervised remote photoplethysmography (rPPG) as an alternative to contact sensors.
arXiv Detail & Related papers (2024-04-05T20:39:16Z) - Deep Reinforcement Learning Empowered Activity-Aware Dynamic Health Monitoring Systems [69.41229290253605]
Existing monitoring approaches were designed on the premise that medical devices track several health metrics concurrently.
This means they report all relevant health values within that scope, which can result in excess resource use and the collection of extraneous data.
We propose Dynamic Activity-Aware Health Monitoring strategy (DActAHM) for striking a balance between optimal monitoring performance and cost efficiency.
arXiv Detail & Related papers (2024-01-19T16:26:35Z) - Continuous Decoding of Daily-Life Hand Movements from Forearm Muscle Activity for Enhanced Myoelectric Control of Hand Prostheses [78.120734120667]
We introduce a novel method, based on a long short-term memory (LSTM) network, to continuously map forearm EMG activity onto hand kinematics.
Ours is the first reported work on the prediction of hand kinematics that uses this challenging dataset.
Our results suggest that the presented method is suitable for the generation of control signals for the independent and proportional actuation of the multiple DOFs of state-of-the-art hand prostheses.
arXiv Detail & Related papers (2021-04-29T00:11:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.