TxP: Reciprocal Generation of Ground Pressure Dynamics and Activity Descriptions for Improving Human Activity Recognition
- URL: http://arxiv.org/abs/2505.02052v1
- Date: Sun, 04 May 2025 10:07:38 GMT
- Title: TxP: Reciprocal Generation of Ground Pressure Dynamics and Activity Descriptions for Improving Human Activity Recognition
- Authors: Lala Shakti Swarup Ray, Lars Krupp, Vitor Fortes Rey, Bo Zhou, Sungho Suh, Paul Lukowicz
- Abstract summary: We present a Text$\times$Pressure model that uses generative foundation models to interpret pressure data as natural language. TxP is trained on our synthetic PressLang dataset, containing over 81,100 text-pressure pairs. This improved HAR performance by up to 12.4% in macro F1 score compared to the state-of-the-art.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sensor-based human activity recognition (HAR) has predominantly focused on Inertial Measurement Units and vision data, often overlooking the capabilities unique to pressure sensors, which capture subtle body dynamics and shifts in the center of mass. Despite their potential for postural and balance-based activities, pressure sensors remain underutilized in the HAR domain due to limited datasets. To bridge this gap, we propose to exploit generative foundation models with pressure-specific HAR techniques. Specifically, we present a bidirectional Text$\times$Pressure model that uses generative foundation models to interpret pressure data as natural language. TxP accomplishes two tasks: (1) Text2Pressure, converting activity text descriptions into pressure sequences, and (2) Pressure2Text, generating activity descriptions and classifications from dynamic pressure maps. Leveraging pre-trained models like CLIP and LLaMA 2 13B Chat, TxP is trained on our synthetic PressLang dataset, containing over 81,100 text-pressure pairs. Validated on real-world data for activities such as yoga and daily tasks, TxP provides novel approaches to data augmentation and classification grounded in atomic actions. This consequently improved HAR performance by up to 12.4\% in macro F1 score compared to the state-of-the-art, advancing pressure-based HAR with broader applications and deeper insights into human movement.
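The CLIP-style alignment TxP builds on can be illustrated with a minimal sketch: pressure-sequence embeddings and text embeddings are projected into a shared space and trained with a symmetric contrastive (InfoNCE) objective over a batch of matched pairs. All names and shapes below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale each embedding to unit length for cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_loss(pressure_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine-similarity logits.

    Matching pressure/text pairs sit on the diagonal of the logit matrix;
    the loss pulls them together and pushes mismatched pairs apart.
    """
    p = l2_normalize(pressure_emb)
    t = l2_normalize(text_emb)
    logits = p @ t.T / temperature           # (batch, batch) similarities
    labels = np.arange(len(logits))          # matched pairs lie on the diagonal

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the pressure-to-text and text-to-pressure directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy batch: well-aligned embeddings should score a lower loss than misaligned ones.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
aligned = clip_loss(emb, emb + 0.01 * rng.normal(size=emb.shape))
shuffled = clip_loss(emb, np.roll(emb, 1, axis=0))  # every pair deliberately mismatched
```

In an actual TxP-style setup the two embedding matrices would come from a pressure encoder and a text encoder; here random vectors stand in for both.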
Related papers
- PIM: Physics-Informed Multi-task Pre-training for Improving Inertial Sensor-Based Human Activity Recognition [4.503003860563811]
We propose a physics-informed multi-task pre-training (PIM) framework for IMU-based human activity recognition (HAR). PIM generates pre-text tasks based on the understanding of basic physical aspects of human motion. We have observed gains of almost 10% in macro F1 score and accuracy with only 2 to 8 labeled examples per class.
arXiv Detail & Related papers (2025-03-23T08:16:01Z)
- Predicting Stock Movement with BERTweet and Transformers [0.0]
In this paper, we demonstrate the efficacy of BERTweet, a variant of BERT pre-trained specifically on a Twitter corpus. We set a new baseline for Matthews Correlation Coefficient on the Stocknet dataset without auxiliary data sources.
arXiv Detail & Related papers (2025-03-13T23:46:24Z)
- Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data [83.48170683672427]
We propose a unified dual-modal learning framework that integrates SFER data as a complementary resource for DFER. S4D employs dual-modal self-supervised pre-training on facial images and videos using a shared Transformer (ViT) encoder-decoder architecture. Experiments demonstrate that S4D achieves a deeper understanding of DFER, setting new state-of-the-art performance.
arXiv Detail & Related papers (2024-09-10T01:57:57Z)
- EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision [69.1005706608681]
EgoPressure is a novel egocentric dataset that captures detailed touch contact and pressure interactions. Our dataset comprises 5 hours of recorded interactions from 21 participants captured simultaneously by one head-mounted and seven stationary Kinect cameras.
arXiv Detail & Related papers (2024-09-03T18:53:32Z)
- Text me the data: Generating Ground Pressure Sequence from Textual Descriptions for HAR [4.503003860563811]
Text-to-Pressure (T2P) is a framework designed to generate ground pressure sequences from textual descriptions.
We show that the combination of vector quantization of sensor data along with a simple text-conditioned autoregressive strategy allows us to obtain high-quality generated pressure sequences.
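The vector-quantization step T2P relies on can be sketched roughly as follows: each pressure frame is mapped to the nearest entry of a learned codebook, turning a continuous pressure sequence into discrete tokens that a text-conditioned autoregressive model can then emit. The codebook size, frame dimensionality, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each (T, D) pressure frame to its nearest codebook vector.

    Returns the discrete token ids and the quantized reconstruction;
    the ids are what an autoregressive decoder would be trained to predict.
    """
    # squared distances between every frame and every codebook entry
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d.argmin(axis=1)
    return ids, codebook[ids]

rng = np.random.default_rng(1)
codebook = rng.normal(size=(16, 4))   # 16 codes for 4-dim toy "frames" (illustrative sizes)
# build frames from known codes plus small noise, so quantization should recover them
frames = codebook[[3, 3, 7, 1]] + 0.01 * rng.normal(size=(4, 4))
ids, recon = quantize(frames, codebook)
```

In a real system the codebook would be learned (e.g. with a VQ-VAE-style objective) rather than sampled at random.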
arXiv Detail & Related papers (2024-02-22T10:14:59Z)
- Text2Data: Low-Resource Data Generation with Textual Control [100.5970757736845]
Text2Data is a novel approach that utilizes unlabeled data to understand the underlying data distribution. It undergoes finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Approximating Human-Like Few-shot Learning with GPT-based Compression [55.699707962017975]
We seek to equip generative pre-trained models with human-like learning capabilities that enable data compression during inference.
We present a novel approach that utilizes the Generative Pre-trained Transformer (GPT) to approximate Kolmogorov complexity.
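The compression-as-learning idea summarized above can be illustrated with a classical stand-in: an off-the-shelf compressor (gzip here, in place of the GPT-based compressor the paper uses) approximates Kolmogorov complexity, and a test string is classified by normalized compression distance (NCD) to labeled examples. This is a generic sketch of the technique, not the paper's method.

```python
import gzip

def clen(s: str) -> int:
    """Compressed length as a crude proxy for Kolmogorov complexity."""
    return len(gzip.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when a and b share structure."""
    ca, cb, cab = clen(a), clen(b), clen(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(test, labeled_examples):
    """Assign the label of the nearest (lowest-NCD) labeled example."""
    return min(labeled_examples, key=lambda ex: ncd(test, ex[0]))[1]

train = [("the cat sat on the mat and purred softly", "animal"),
         ("stock prices rallied as markets opened higher today", "finance")]
pred = classify("the cat sat on the mat and meowed softly", train)
```

A stronger compressor (such as a GPT likelihood model) gives a tighter complexity estimate, which is the substitution the paper explores.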
arXiv Detail & Related papers (2023-08-14T05:22:33Z)
- PressureTransferNet: Human Attribute Guided Dynamic Ground Pressure Profile Transfer using 3D simulated Pressure Maps [7.421780713537146]
PressureTransferNet is an encoder-decoder model taking a source pressure map and a target human attribute vector as inputs.
We use a sensor simulation to create a diverse dataset with various human attributes and pressure profiles.
We visually confirm the fidelity of the synthesized pressure shapes using a physics-based deep learning model and achieve a binary R-square value of 0.79 on areas with ground contact.
arXiv Detail & Related papers (2023-08-01T13:31:25Z)
- PresSim: An End-to-end Framework for Dynamic Ground Pressure Profile Generation from Monocular Videos Using Physics-based 3D Simulation [8.107762252448195]
Ground pressure exerted by the human body is a valuable source of information for human activity recognition (HAR) in pervasive sensing.
We present a novel end-to-end framework, PresSim, to synthesize sensor data from videos of human activities to reduce such effort significantly.
arXiv Detail & Related papers (2023-02-01T12:02:04Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.