SensorLLM: Human-Intuitive Alignment of Multivariate Sensor Data with LLMs for Activity Recognition
- URL: http://arxiv.org/abs/2410.10624v3
- Date: Tue, 20 May 2025 17:16:44 GMT
- Title: SensorLLM: Human-Intuitive Alignment of Multivariate Sensor Data with LLMs for Activity Recognition
- Authors: Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim
- Abstract summary: We introduce SensorLLM, a framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from wearable sensor data. We construct SensorQA, a question-answering dataset of human-intuitive sensor-text pairs spanning diverse HAR scenarios. Our results show that, guided by human-intuitive alignment, SensorLLM becomes an effective sensor learner, reasoner, and classifier--generalizing across varied HAR settings.
- Score: 9.072495000412943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SensorLLM, a two-stage framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from wearable sensor data. While LLMs excel at reasoning and generalization, they struggle with time-series inputs due to limited semantic context, numerical complexity, and sequence variability. To address these challenges, we construct SensorQA, a question-answering dataset of human-intuitive sensor-text pairs spanning diverse HAR scenarios. It supervises the Sensor-Language Alignment stage, where the model aligns sensor inputs with trend descriptions. Special tokens are introduced to mark channel boundaries. This alignment enables LLMs to interpret numerical patterns, channel-specific signals, and variable-length inputs--without requiring human annotation. In the subsequent Task-Aware Tuning stage, we adapt the model for multivariate HAR classification, achieving performance that matches or exceeds state-of-the-art methods. Our results show that, guided by human-intuitive alignment, SensorLLM becomes an effective sensor learner, reasoner, and classifier--generalizing across varied HAR settings and paving the way for foundation model research in time-series analysis.
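As a rough illustration of the input format the abstract describes (variable-length multivariate windows serialized with special channel-boundary tokens, paired with trend descriptions in the Sensor-Language Alignment stage and activity labels in the Task-Aware Tuning stage), here is a minimal sketch. The token names, prompt wording, and helper functions are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of the two-stage input formatting implied by the abstract.
# <ch_start>/<ch_end> are hypothetical channel-boundary tokens, not SensorLLM's actual vocabulary.
import numpy as np

CH_START, CH_END = "<ch_start>", "<ch_end>"

def format_sensor_window(window: np.ndarray, channel_names: list[str]) -> str:
    """Serialize a (channels, timesteps) window into text with channel boundaries."""
    parts = []
    for name, series in zip(channel_names, window):
        values = " ".join(f"{v:.3f}" for v in series)
        parts.append(f"{CH_START}{name}: {values}{CH_END}")
    return "\n".join(parts)

def make_stage1_example(window, channel_names, trend_description):
    """Sensor-Language Alignment pair: sensor text -> human-intuitive trend description."""
    prompt = ("Describe the trends in the following sensor readings.\n"
              + format_sensor_window(window, channel_names))
    return {"input": prompt, "target": trend_description}

def make_stage2_example(window, channel_names, activity_label):
    """Task-Aware Tuning pair: sensor text -> activity class."""
    prompt = ("Which activity do these readings correspond to?\n"
              + format_sensor_window(window, channel_names))
    return {"input": prompt, "target": activity_label}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.normal(size=(3, 8))  # e.g. a 3-axis accelerometer, 8 samples
    names = ["acc_x", "acc_y", "acc_z"]
    print(make_stage1_example(window, names, "acc_x rises steadily while acc_z stays flat.")["input"])
```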
Related papers
- SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images [17.304106221330414]
Recent deep learning models are often specific to single sensors or to fixed sensor combinations. We introduce SMARTIES, a generic and versatile foundation model that lifts sensor-specific/dependent efforts. On both single- and multi-modal tasks across diverse sensors, SMARTIES outperforms previous models that rely on sensor-specific pretraining.
arXiv Detail & Related papers (2025-06-24T12:51:39Z) - SensorLM: Learning the Language of Wearable Sensors [50.95988682423808]
We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data. This approach enabled the curation of the largest sensor-language dataset to date, comprising over 59.7 million hours of data from more than 103,000 people.
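A toy sketch of what a hierarchical caption of this kind might compute for a single sensor channel is shown below; the specific features and phrasing are assumptions, not the SensorLM pipeline itself.

```python
# Toy hierarchical caption: statistical, structural, and semantic layers for one channel.
import numpy as np

def hierarchical_caption(series: np.ndarray, semantic_label: str = "") -> str:
    stats = (f"mean={series.mean():.2f}, std={series.std():.2f}, "
             f"range=[{series.min():.2f}, {series.max():.2f}]")
    slope = np.polyfit(np.arange(len(series)), series, 1)[0]
    structure = "rising" if slope > 0.01 else "falling" if slope < -0.01 else "flat"
    caption = f"Statistical: {stats}. Structural: overall {structure} trend."
    if semantic_label:
        caption += f" Semantic: consistent with {semantic_label}."
    return caption

print(hierarchical_caption(np.linspace(60, 90, 50), semantic_label="moderate exercise"))
```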
arXiv Detail & Related papers (2025-06-10T17:13:09Z) - Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning [61.17099595835263]
Gensors is a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs.
In a user study, participants reported a significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors.
arXiv Detail & Related papers (2025-01-27T01:47:57Z) - MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models.
We propose a sensor consistency training framework that enables denoising models to learn the sensor-invariant features.
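A minimal sketch of one way such a consistency objective could look, assuming a denoiser that exposes intermediate features; the loss combination below is an illustrative guess, not the MSSIDD training code.

```python
# Illustrative sensor-consistency objective: reconstruction on both sensors plus a
# feature-matching term that pulls cross-sensor representations together.
import torch
import torch.nn.functional as F

def consistency_loss(denoiser, noisy_a, clean_a, noisy_b, clean_b, alpha=0.1):
    """noisy_*/clean_*: raw images of the same scene captured by sensors A and B."""
    feat_a, out_a = denoiser(noisy_a)        # assumes the model returns (features, output)
    feat_b, out_b = denoiser(noisy_b)
    recon = F.l1_loss(out_a, clean_a) + F.l1_loss(out_b, clean_b)
    invariance = F.mse_loss(feat_a, feat_b)  # encourage sensor-invariant features
    return recon + alpha * invariance
```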
arXiv Detail & Related papers (2024-11-18T13:32:59Z) - Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, both across time and across sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z) - SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing [6.8009140511761546]
Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems.
We construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective.
The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks.
arXiv Detail & Related papers (2024-10-14T17:21:39Z) - Language-centered Human Activity Recognition [8.925867647929088]
Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is critical for applications in healthcare, safety, and industrial production.
However, variation in activity patterns, device types, and sensor placements creates distribution gaps across datasets.
We propose LanHAR, a novel system that generates semantic interpretations of sensor readings and activity labels for cross-dataset HAR.
arXiv Detail & Related papers (2024-09-12T22:57:29Z) - By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting [24.39281384670957]
We propose a visual prompting approach for sensor data using multimodal large language models (MLLMs).
We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions.
We evaluate our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts.
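A minimal sketch of the visual-prompting idea, assuming generic helper names: the sensor window is rendered as a plot and packaged with the task description. The actual MLLM API call is left abstract because it depends on the model being used.

```python
# Render a sensor window as an image and bundle it with a task description for an MLLM.
import base64
import io

import matplotlib.pyplot as plt
import numpy as np

def sensor_window_to_png(window: np.ndarray, channel_names: list[str]) -> bytes:
    fig, ax = plt.subplots(figsize=(6, 2.5))
    for name, series in zip(channel_names, window):
        ax.plot(series, label=name)
    ax.legend(loc="upper right")
    ax.set_xlabel("sample")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()

def build_visual_prompt(window, channel_names, task_description: str) -> dict:
    image_b64 = base64.b64encode(sensor_window_to_png(window, channel_names)).decode()
    return {
        "text": f"{task_description}\nAnswer with a single activity label.",
        "image_base64": image_b64,  # how the image is attached depends on the MLLM API
    }

prompt = build_visual_prompt(
    np.random.default_rng(1).normal(size=(3, 100)),
    ["acc_x", "acc_y", "acc_z"],
    "The plot shows 3-axis accelerometer data from a wrist-worn device.",
)
```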
arXiv Detail & Related papers (2024-07-15T01:33:54Z) - Layout Agnostic Human Activity Recognition in Smart Homes through Textual Descriptions Of Sensor Triggers (TDOST) [0.22354214294493352]
We develop a layout-agnostic modeling approach for human activity recognition (HAR) systems in smart homes.
We generate Textual Descriptions Of Sensor Triggers (TDOST) that encapsulate the surrounding trigger conditions.
We demonstrate the effectiveness of TDOST-based models in unseen smart homes through experiments on benchmarked CASAS datasets.
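An illustrative sketch of turning a raw smart-home sensor trigger into a layout-agnostic textual description in this spirit; the field names and sentence template are assumptions, not the paper's exact scheme.

```python
# Encode a trigger as natural language instead of a house-specific sensor ID, so a
# downstream HAR model can transfer across home layouts.
from dataclasses import dataclass

@dataclass
class SensorTrigger:
    sensor_type: str   # e.g. "motion", "door"
    location: str      # e.g. "kitchen"
    state: str         # e.g. "ON", "OPEN"
    time_of_day: str   # e.g. "early morning"

def describe_trigger(t: SensorTrigger) -> str:
    return (f"A {t.sensor_type} sensor in the {t.location} reported state "
            f"{t.state} during the {t.time_of_day}.")

events = [
    SensorTrigger("motion", "kitchen", "ON", "early morning"),
    SensorTrigger("door", "refrigerator", "OPEN", "early morning"),
]
print(" ".join(describe_trigger(e) for e in events))
```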
arXiv Detail & Related papers (2024-05-20T20:37:44Z) - LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces [1.1137304094345333]
We design an effective prompting framework for Large Language Models (LLMs) on high-level reasoning tasks.
We also design two strategies to enhance performance with long sensor traces, including summarization before reasoning and selective inclusion of historical traces.
Our framework can be implemented in an edge-cloud setup, running small LLMs on the edge for data summarization and performing high-level reasoning in the cloud, which preserves privacy since only summaries leave the device.
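A minimal sketch of the summarize-before-reasoning strategy with the edge-cloud split, assuming injected callables for the two models since the concrete LLMs are not specified here.

```python
# Summarize long sensor traces on the edge, then reason over recent summaries in the cloud.
from typing import Callable, Iterable

def chunk(trace: list[str], size: int) -> Iterable[list[str]]:
    for i in range(0, len(trace), size):
        yield trace[i:i + size]

def reason_over_trace(
    trace: list[str],
    edge_summarize: Callable[[str], str],   # small on-device LLM
    cloud_reason: Callable[[str], str],     # larger cloud LLM
    chunk_size: int = 50,
    keep_last_n_summaries: int = 5,
) -> str:
    # 1) Edge: compress each chunk of the raw trace into a short summary.
    summaries = [edge_summarize("\n".join(c)) for c in chunk(trace, chunk_size)]
    # 2) Selective inclusion: only the most recent summaries are sent onward.
    context = "\n".join(summaries[-keep_last_n_summaries:])
    # 3) Cloud: high-level reasoning over the condensed context.
    return cloud_reason(f"Given these summarized sensor observations:\n{context}\n"
                        "Infer the person's current high-level activity.")

# Example with stand-in models:
answer = reason_over_trace(
    trace=[f"t={i}: step_count+1" for i in range(200)],
    edge_summarize=lambda text: f"{text.count('step_count')} steps in this window",
    cloud_reason=lambda prompt: "The person is likely walking.",
)
print(answer)
```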
arXiv Detail & Related papers (2024-03-28T22:06:04Z) - Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data [5.092345761847645]
We study whether the state-of-the-art (SOTA) LLMs can be used as virtual annotators for labeling time-series physical sensing data.
arXiv Detail & Related papers (2024-03-02T08:29:08Z) - MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model.
We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data instances.
We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z) - How are Prompts Different in Terms of Sensitivity? [50.67313477651395]
We present a comprehensive prompt analysis based on the sensitivity of a function.
We use gradient-based saliency scores to empirically demonstrate how different prompts affect the relevance of input tokens to the output.
We introduce sensitivity-aware decoding which incorporates sensitivity estimation as a penalty term in the standard greedy decoding.
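A toy sketch of greedy decoding with a sensitivity penalty folded into the token score; the scoring functions are stand-ins, and the paper's actual sensitivity estimator differs.

```python
# Greedy decoding where each candidate's score is its log-probability minus a
# weighted sensitivity estimate.
import numpy as np

def sensitivity_aware_greedy(logprob_fn, sensitivity_fn, vocab, steps=5, lam=0.5):
    """logprob_fn(prefix) -> per-token log-probs; sensitivity_fn(prefix) -> per-token penalties."""
    prefix: list[str] = []
    for _ in range(steps):
        scores = logprob_fn(prefix) - lam * sensitivity_fn(prefix)
        prefix.append(vocab[int(np.argmax(scores))])
    return prefix

vocab = ["yes", "no", "maybe"]
rng = np.random.default_rng(0)
out = sensitivity_aware_greedy(
    logprob_fn=lambda p: rng.normal(size=len(vocab)),
    sensitivity_fn=lambda p: rng.uniform(size=len(vocab)),
    vocab=vocab,
)
print(out)
```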
arXiv Detail & Related papers (2023-11-13T10:52:01Z) - MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z) - A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition [63.26015736148707]
This paper introduces a novel methodology to resolve the issue of optimal sensor placement for Human Activity Recognition.
The derived skeleton data provides a unique strategy for identifying the optimal sensor location.
Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach.
arXiv Detail & Related papers (2023-07-06T10:38:14Z) - Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition [3.2319909486685354]
A key problem holding up progress in wearable sensor-based human activity recognition is the unavailability of diverse and labeled training data.
We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition.
By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data.
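A sketch of the kind of statistical conditioning vector described (mean, standard deviation, skewness, and a z-score summary per channel); the exact feature layout is an assumption for illustration.

```python
# Build a per-channel statistical feature vector that a diffusion model could be conditioned on.
import numpy as np
from scipy.stats import skew

def statistical_condition(window: np.ndarray) -> np.ndarray:
    """window: (channels, timesteps) real sensor data -> conditioning vector."""
    mean = window.mean(axis=1)
    std = window.std(axis=1)
    skewness = skew(window, axis=1)
    # One compact z-score summary per channel: the maximum absolute deviation.
    z_max = np.abs((window - mean[:, None]) / (std[:, None] + 1e-8)).max(axis=1)
    return np.concatenate([mean, std, skewness, z_max])

cond = statistical_condition(np.random.default_rng(0).normal(size=(3, 128)))
print(cond.shape)  # (12,) for a 3-channel window; passed to the generator as conditioning
```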
arXiv Detail & Related papers (2023-05-30T15:12:59Z) - Bayesian Imitation Learning for End-to-End Mobile Manipulation [80.47771322489422]
Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities.
We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains.
We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities.
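A minimal sketch of the Variational Information Bottleneck regularizer mentioned here: the encoder predicts a Gaussian latent and a KL term to a standard normal is added to the task loss; shapes and weighting are illustrative.

```python
# KL( N(mu, sigma^2) || N(0, I) ) regularizer and the reparameterization trick.
import torch

def vib_regularizer(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL divergence to a standard normal, averaged over the batch."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=1))

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

# total_loss = task_loss(policy(z), target) + beta * vib_regularizer(mu, logvar)
```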
arXiv Detail & Related papers (2022-02-15T17:38:30Z) - Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z) - Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment.
We incorporate geometric constraints in an end-to-end manner into a typical segmentation based model and bridge the intermediate dense classification task with the targeted pose estimation one.
Our model is experimentally shown to achieve similar results with marker-based methods and outperform the markerless ones, while also being robust to the pose variations of the calibration structure.
arXiv Detail & Related papers (2020-03-23T10:51:32Z)