Related papers: SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition

URL: http://arxiv.org/abs/2410.10624v2
Date: Mon, 17 Mar 2025 09:28:43 GMT
Title: SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
Authors: Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim
Abstract summary: We introduce SensorLLM, a framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from sensor data. SensorLLM addresses limitations through a Sensor-Language Alignment stage, where we introduce special tokens for each sensor channel. In the subsequent Task-Aware Tuning stage, we refine the model for HAR classification, achieving performance that matches or surpasses state-of-the-art methods.
Score: 9.072495000412943
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce SensorLLM, a two-stage framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from sensor data. Despite their strong reasoning and generalization capabilities, LLMs remain underutilized for motion sensor data due to the lack of semantic context in time-series, computational constraints, and challenges in processing numerical inputs. SensorLLM addresses these limitations through a Sensor-Language Alignment stage, where we introduce special tokens for each sensor channel and automatically generate textual trend descriptions. This alignment enables LLMs to capture numerical variations, channel-specific features, and data of varying duration--without requiring human annotations. In the subsequent Task-Aware Tuning stage, we refine the model for HAR classification, achieving performance that matches or surpasses state-of-the-art methods. Our results demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through Sensor-Language Alignment, generalizing across diverse HAR datasets. We believe this work establishes a foundation for future research on time-series and text alignment, paving the way for foundation models in sensor data analysis.

Related papers

Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning [61.17099595835263]
Gensors is a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. In a user study, participants reported significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors.
arXiv Detail & Related papers (2025-01-27T01:47:57Z)
MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models. We propose a sensor consistency training framework that enables denoising models to learn the sensor-invariant features.
arXiv Detail & Related papers (2024-11-18T13:32:59Z)
Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM. Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, both across time and sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z)
SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing [6.8009140511761546]
Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems. We construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective. The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks.
arXiv Detail & Related papers (2024-10-14T17:21:39Z)
Language-centered Human Activity Recognition [8.925867647929088]
Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is critical for applications in healthcare, safety, and industrial production. Variation in activity patterns, device types, and sensor placements creates distribution gaps across datasets. We propose LanHAR, a novel system that generates semantic interpretations of sensor readings and activity labels for cross-dataset HAR.
arXiv Detail & Related papers (2024-09-12T22:57:29Z)
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting [24.39281384670957]
We propose a visual prompting approach for sensor data using multimodal large language models (MLLMs). We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions. We evaluate our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts.
arXiv Detail & Related papers (2024-07-15T01:33:54Z)
Layout Agnostic Human Activity Recognition in Smart Homes through Textual Descriptions Of Sensor Triggers (TDOST) [0.22354214294493352]
We develop a layout-agnostic modeling approach for human activity recognition (HAR) systems in smart homes. We generate Textual Descriptions Of Sensor Triggers (TDOST) that encapsulate the surrounding trigger conditions. We demonstrate the effectiveness of TDOST-based models in unseen smart homes through experiments on benchmarked CASAS datasets.
arXiv Detail & Related papers (2024-05-20T20:37:44Z)
LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces [1.1137304094345333]
We design an effective prompting framework for Large Language Models (LLMs) on high-level reasoning tasks. We also design two strategies to enhance performance with long sensor traces, including summarization before reasoning and selective inclusion of historical traces. Our framework can be implemented in an edge-cloud setup, running small LLMs on the edge for data summarization and performing high-level reasoning on the cloud for privacy preservation.
arXiv Detail & Related papers (2024-03-28T22:06:04Z)
Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data [5.092345761847645]
We study whether the state-of-the-art (SOTA) LLMs can be used as virtual annotators for labeling time-series physical sensing data.
arXiv Detail & Related papers (2024-03-02T08:29:08Z)
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model. We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data. We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z)
MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm. Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition [63.26015736148707]
This paper introduces a novel methodology to resolve the issue of optimal sensor placement for Human Activity Recognition. The derived skeleton data provides a unique strategy for identifying the optimal sensor location. Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach.
arXiv Detail & Related papers (2023-07-06T10:38:14Z)
Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition [3.2319909486685354]
A key problem holding up progress in wearable sensor-based human activity recognition is the unavailability of diverse and labeled training data. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data.
arXiv Detail & Related papers (2023-05-30T15:12:59Z)
Bayesian Imitation Learning for End-to-End Mobile Manipulation [80.47771322489422]
Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities. We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains. We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities.
arXiv Detail & Related papers (2022-02-15T17:38:30Z)
Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos). The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment. We incorporate geometric constraints in an end-to-end manner into a typical segmentation based model and bridge the intermediate dense classification task with the targeted pose estimation one. Our model is experimentally shown to achieve results similar to those of marker-based methods and to outperform the markerless ones, while also being robust to the pose variations of the calibration structure.
arXiv Detail & Related papers (2020-03-23T10:51:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.