SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
- URL: http://arxiv.org/abs/2410.10624v1
- Date: Mon, 14 Oct 2024 15:30:41 GMT
- Title: SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
- Authors: Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim
- Abstract summary: We bridge the gap between wearable sensor technology and personalized AI assistants by enabling Large Language Models (LLMs) to understand time-series tasks like human activity recognition (HAR).
We introduce SensorLLM, a two-stage framework to unlock LLMs' potential for sensor data tasks.
We show that SensorLLM evolves into an effective sensor learner, reasoner, and classifier, enabling it to generalize across diverse datasets for HAR tasks.
- Score: 9.072495000412943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we bridge the gap between wearable sensor technology and personalized AI assistants by enabling Large Language Models (LLMs) to understand time-series tasks like human activity recognition (HAR). Despite the strong reasoning and generalization capabilities of LLMs, leveraging them for sensor data tasks remains largely unexplored. This gap stems from challenges like the lack of semantic context in time-series data, computational limitations, and LLMs' difficulty processing numerical inputs. To address these issues, we introduce SensorLLM, a two-stage framework to unlock LLMs' potential for sensor data tasks. In the Sensor-Language Alignment Stage, we introduce special tokens for each sensor channel and automatically generate trend-descriptive text to align sensor data with textual inputs, enabling SensorLLM to capture numerical changes, channel-specific information, and sensor data of varying lengths, capabilities that existing LLMs typically struggle with, all without the need for human annotations. Next, in the Task-Aware Tuning Stage, we refine the model for HAR classification using the frozen LLM and alignment module, achieving performance on par with or surpassing state-of-the-art models. We further demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through Sensor-Language Alignment, enabling it to generalize across diverse datasets for HAR tasks. We strongly believe our work lays the groundwork for future time-series and text alignment research, offering a path toward foundation models for sensor data.
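The abstract describes the two-stage framework only at a high level. The snippet below is a minimal PyTorch-style sketch of how per-channel special tokens and an alignment module could produce embeddings for a frozen LLM; all module names, dimensions, and the pooling choice are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of the SensorLLM recipe described in the abstract
# (illustrative only; names and sizes are assumptions, not the paper's code).
import torch
import torch.nn as nn


class ChannelAlignmentModule(nn.Module):
    """Stage 1 idea: map each sensor channel's window into the LLM embedding space,
    pairing it with a learned per-channel special token."""

    def __init__(self, n_channels: int, llm_dim: int, hidden: int = 128):
        super().__init__()
        # Lightweight 1-D conv encoder shared across channels; handles windows
        # of varying length via adaptive pooling.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=5, padding=2),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(hidden, llm_dim)
        # One learned "special token" embedding per sensor channel
        # (e.g. accelerometer x/y/z, gyroscope x/y/z); token names are hypothetical.
        self.channel_tokens = nn.Embedding(n_channels, llm_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, time), arbitrary window length.
        b, c, t = x.shape
        feats = self.encoder(x.reshape(b * c, 1, t)).squeeze(-1)   # (b*c, hidden)
        feats = self.proj(feats).reshape(b, c, -1)                 # (b, c, llm_dim)
        tokens = self.channel_tokens.weight.unsqueeze(0).expand(b, -1, -1)
        # Interleave [channel token, channel summary] pairs: (b, 2*c, llm_dim).
        return torch.stack([tokens, feats], dim=2).reshape(b, 2 * c, -1)


class HARHead(nn.Module):
    """Stage 2 idea: with the LLM and alignment module frozen, train a small
    classifier on pooled LLM hidden states."""

    def __init__(self, llm_dim: int, n_classes: int):
        super().__init__()
        self.cls = nn.Linear(llm_dim, n_classes)

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # Mean-pool the LLM's last hidden states over the sequence dimension.
        return self.cls(llm_hidden.mean(dim=1))


if __name__ == "__main__":
    align = ChannelAlignmentModule(n_channels=6, llm_dim=4096)
    fake_window = torch.randn(2, 6, 300)     # 2 samples, 6 channels, 300 steps
    sensor_embeds = align(fake_window)       # ready to prepend to text embeddings
    print(sensor_embeds.shape)               # torch.Size([2, 12, 4096])
```

In a full Stage 1 run, `sensor_embeds` would be prepended to the embedded trend-description text and only the alignment module updated against the frozen LLM; Stage 2 would then freeze both and train only `HARHead` on activity labels, mirroring the frozen-LLM tuning the abstract describes.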
Related papers
- MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models.
We propose a sensor consistency training framework that enables denoising models to learn sensor-invariant features.
arXiv Detail & Related papers (2024-11-18T13:32:59Z)
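The MSSIDD summary states the sensor-consistency idea only at a high level. One common way to encourage sensor-invariant features is to penalize feature differences between the same scene rendered under two different sensor profiles; the sketch below is a generic regularizer of that kind, not the benchmark's actual training code.

```python
# Generic sensor-consistency regularizer (illustrative; not the MSSIDD code).
import torch
import torch.nn.functional as F


def consistency_loss(encoder, noisy_sensor_a, noisy_sensor_b):
    """Encourage sensor-invariant features: the same underlying scene captured
    with two different sensor characteristics should map to similar features."""
    return F.mse_loss(encoder(noisy_sensor_a), encoder(noisy_sensor_b))


def training_step(denoiser, encoder, pair, clean, alpha=0.1):
    # pair: the same raw scene under two simulated sensor profiles.
    a, b = pair
    recon_loss = F.l1_loss(denoiser(a), clean) + F.l1_loss(denoiser(b), clean)
    return recon_loss + alpha * consistency_loss(encoder, a, b)
```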
- Scaling Wearable Foundation Models [54.93979158708164]
We investigate the scaling properties of sensor foundation models across compute, data, and model size.
Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM.
Our results establish the scaling laws of LSM for tasks such as imputation and extrapolation, across both time and sensor modalities.
arXiv Detail & Related papers (2024-10-17T15:08:21Z)
- SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing [6.8009140511761546]
Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems.
We construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective.
The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks.
arXiv Detail & Related papers (2024-10-14T17:21:39Z)
- Language-centered Human Activity Recognition [8.925867647929088]
Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is critical for applications in healthcare, safety, and industrial production.
However, variations in activity patterns, device types, and sensor placements create distribution gaps across datasets.
We propose LanHAR, a novel system that generates semantic interpretations of sensor readings and activity labels for cross-dataset HAR.
arXiv Detail & Related papers (2024-09-12T22:57:29Z)
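LanHAR's idea, as summarized above, is to route both sensor readings and activity labels through natural-language interpretations so heterogeneous datasets meet in a shared text space. A minimal sketch of that pattern follows; the sentence encoder, the `llm` callable, and all prompts are placeholders, not the system's actual components.

```python
# Sketch of cross-dataset HAR via text as the shared space (illustrative only).
from sentence_transformers import SentenceTransformer, util

text_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice


def interpret_reading(llm, window_stats: str) -> str:
    # A hypothetical LLM callable turns summary statistics of an IMU window into
    # a semantic description, e.g. "periodic vertical acceleration at ~2 Hz,
    # moderate amplitude, consistent with walking".
    return llm(f"Describe the motion implied by these IMU statistics: {window_stats}")


def classify(interpretation: str, label_descriptions: dict[str, str]) -> str:
    # Match the reading's interpretation against text descriptions of each label.
    query = text_encoder.encode(interpretation, convert_to_tensor=True)
    keys = {k: text_encoder.encode(v, convert_to_tensor=True)
            for k, v in label_descriptions.items()}
    return max(keys, key=lambda k: util.cos_sim(query, keys[k]).item())
```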
- By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting [24.39281384670957]
We propose a visual prompting approach for sensor data using multimodal large language models (MLLMs).
We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions.
We evaluate our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts.
arXiv Detail & Related papers (2024-07-15T01:33:54Z)
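The visual-prompting idea above is simple to prototype: plot the sensor window as an image and hand the plot, plus a task description, to a multimodal LLM. The sketch below uses matplotlib and a hypothetical `query_mllm` client; the actual prompt design and model choices are the paper's contribution and are not reproduced here.

```python
# Sketch of visual prompting for sensor data (illustrative; the prompt and the
# MLLM call are placeholders, not the paper's exact setup).
import io

import matplotlib.pyplot as plt
import numpy as np


def render_sensor_plot(window: np.ndarray, channel_names: list[str]) -> bytes:
    """Render a multi-channel sensor window as a PNG for a multimodal LLM."""
    fig, ax = plt.subplots(figsize=(6, 3))
    for ch, name in zip(window, channel_names):
        ax.plot(ch, label=name)
    ax.set_xlabel("time step")
    ax.legend(loc="upper right")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()


prompt = ("The image shows a 3-axis accelerometer window. "
          "Which activity is most likely: walking, sitting, or cycling? "
          "Answer with one word.")
# answer = query_mllm(image=render_sensor_plot(window, ["acc_x", "acc_y", "acc_z"]),
#                     text=prompt)   # query_mllm is a hypothetical client function
```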
- LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces [1.1137304094345333]
We design an effective prompting framework for Large Language Models (LLMs) on high-level reasoning tasks.
We also design two strategies to enhance performance with long sensor traces, including summarization before reasoning and selective inclusion of historical traces.
Our framework can be implemented in an edge-cloud setup, running small LLMs on the edge for data summarization and performing high-level reasoning on the cloud for privacy preservation.
arXiv Detail & Related papers (2024-03-28T22:06:04Z)
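The two strategies named above (summarize before reasoning, then selectively include history) amount to a staged prompting pipeline. A rough sketch follows, with the edge and cloud models and all prompts as placeholders rather than LLMSense's actual implementation.

```python
# Staged prompting over long sensor traces (illustrative; models and prompts are
# placeholders, not LLMSense's implementation).

def summarize_on_edge(small_llm, trace_chunk: str) -> str:
    # A small on-device LLM compresses a raw trace chunk into a short summary.
    return small_llm(f"Summarize this sensor trace in two sentences:\n{trace_chunk}")


def reason_on_cloud(large_llm, summaries: list[str], k_history: int = 5) -> str:
    # Selectively include only the most recent k summaries as historical context.
    context = "\n".join(summaries[-k_history:])
    return large_llm(
        "Given these per-interval activity summaries, infer the person's current "
        f"high-level state and explain briefly:\n{context}"
    )
```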
- Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data [5.092345761847645]
We study whether state-of-the-art (SOTA) LLMs can be used as virtual annotators for labeling time-series physical sensing data.
arXiv Detail & Related papers (2024-03-02T08:29:08Z)
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model.
We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data samples.
We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z)
- MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
- Bayesian Imitation Learning for End-to-End Mobile Manipulation [80.47771322489422]
Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities.
We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains.
We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities.
arXiv Detail & Related papers (2022-02-15T17:38:30Z)
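The Variational Information Bottleneck mentioned above regularizes a latent representation by penalizing its KL divergence from a fixed prior while the task loss keeps the latent informative. Below is a compact, generic PyTorch sketch of that objective, not the paper's manipulation-policy code.

```python
# Variational Information Bottleneck regularizer (generic sketch, not the
# paper's policy network).
import torch
import torch.nn as nn


class VIBHead(nn.Module):
    def __init__(self, feat_dim: int, z_dim: int, n_outputs: int):
        super().__init__()
        self.mu = nn.Linear(feat_dim, z_dim)
        self.logvar = nn.Linear(feat_dim, z_dim)
        self.decoder = nn.Linear(z_dim, n_outputs)

    def forward(self, feats: torch.Tensor):
        mu, logvar = self.mu(feats), self.logvar(feats)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterize
        # KL(q(z|x) || N(0, I)), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
        return self.decoder(z), kl


# total_loss = task_loss(pred, target) + beta * kl
# beta trades compression (robustness to nuisance sensor detail) for task accuracy.
```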
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
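SAKDN's teacher/student split (wearable sensors teach, RGB video learns) is a cross-modal knowledge-distillation setup. Below is a generic sketch of such a distillation loss, with temperature and weighting as assumed hyperparameters; the adaptive, semantics-aware weighting that gives SAKDN its name is not reproduced here.

```python
# Generic cross-modal distillation loss (illustrative; SAKDN's adaptive
# semantics-aware weighting is omitted).
import torch.nn.functional as F


def distillation_loss(video_logits, sensor_teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    """Soft targets come from the wearable-sensor teacher; hard targets from labels."""
    soft = F.kl_div(
        F.log_softmax(video_logits / temperature, dim=1),
        F.softmax(sensor_teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(video_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```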