Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding
- URL: http://arxiv.org/abs/2504.02878v1
- Date: Wed, 02 Apr 2025 03:42:58 GMT
- Title: Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding
- Authors: Lilin Xu, Kaiyuan Hou, Xiaofan Jiang
- Abstract summary: Human activity recognition (HAR) using inertial measurement units (IMUs) increasingly leverages large language models (LLMs). Our preliminary study indicates that pretrained LLMs fail catastrophically on fine-grained HAR tasks such as air-written letter recognition, achieving only near-random guessing accuracy. To extend this to 3D, we designed an encoder-based pipeline that maps 3D data into 2D equivalents, preserving the spatiotemporal information for robust letter prediction. Our end-to-end pipeline achieves 78% accuracy on word recognition with up to 5 letters in mid-air writing scenarios, establishing LLMs as viable tools for fine-grained HAR.
- Score: 1.1228672751176365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human activity recognition (HAR) using inertial measurement units (IMUs) increasingly leverages large language models (LLMs), yet existing approaches focus on coarse activities like walking or running. Our preliminary study indicates that pretrained LLMs fail catastrophically on fine-grained HAR tasks such as air-written letter recognition, achieving only near-random guessing accuracy. In this work, we first bridge this gap for flat-surface writing scenarios: by fine-tuning LLMs with a self-collected dataset and few-shot learning, we achieved up to a 129x improvement on 2D data. To extend this to 3D scenarios, we designed an encoder-based pipeline that maps 3D data into 2D equivalents, preserving the spatiotemporal information for robust letter prediction. Our end-to-end pipeline achieves 78% accuracy on word recognition with up to 5 letters in mid-air writing scenarios, establishing LLMs as viable tools for fine-grained HAR.
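The abstract's 3D-to-2D mapping step can be pictured with a minimal sketch. The projection below uses plain PCA onto the trajectory's best-fit plane and a made-up textual serialization; the paper's actual encoder is learned, so every function, name, and prompt format here is an illustrative assumption, not the authors' method.

```python
import numpy as np

def project_to_plane(points_3d):
    """Project a 3D air-written trajectory onto its dominant 2D plane via
    PCA -- a hypothetical stand-in for the paper's learned 3D-to-2D encoder."""
    centered = points_3d - points_3d.mean(axis=0)
    # The top two right-singular vectors span the best-fit writing plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # shape (N, 2); temporal order is preserved

def to_prompt(points_2d):
    """Serialize the 2D trace as text for a fine-tuned LLM (illustrative)."""
    coords = "; ".join(f"({x:.2f},{y:.2f})" for x, y in points_2d)
    return f"Air-written 2D trace: {coords}. Which letter is this?"

# Toy 'L': a downstroke then a rightward stroke, sheared out of the x-y plane.
down = np.stack([np.zeros(5), -np.linspace(0, 1, 5), np.zeros(5)], axis=1)
across = np.stack([np.linspace(0.25, 1, 4), -np.ones(4), np.zeros(4)], axis=1)
letter = np.vstack([down, across])
tilted = letter + 0.3 * letter[:, [1]] * np.array([0.0, 0.0, 1.0])

flat = project_to_plane(tilted)   # 2D equivalent of the 3D trajectory
prompt = to_prompt(flat)
```

Because the sheared 'L' still lies in a single plane, the PCA projection is lossless here; real IMU traces are noisier, which is presumably why the paper uses a trained encoder instead.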
Related papers
- Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond [55.984684518346924]
We recast Knowledge Tracing as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable. Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text. Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories.
arXiv Detail & Related papers (2025-06-20T13:21:14Z) - PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models [20.256394783857676]
PiSA-Engine is a framework for generating instruction point-language datasets enriched with 3D spatial semantics.
We introduce PiSA-Bench, a comprehensive 3D benchmark covering six key aspects with detailed and diverse labels.
Experimental results demonstrate PointLLM-PiSA's state-of-the-art performance in zero-shot 3D object captioning and generative classification.
arXiv Detail & Related papers (2025-03-13T16:37:26Z) - Large Language Models for Single-Step and Multi-Step Flight Trajectory Prediction [5.666505394825739]
This study pioneers the use of large language models (LLMs) for flight trajectory prediction by reframing it as a language modeling problem. Specifically, we extract the aircraft's status features from ADS-B flight data to construct a prompt-based dataset. The dataset is then employed to finetune LLMs, enabling them to learn complex spatiotemporal patterns for accurate predictions.
arXiv Detail & Related papers (2025-01-29T07:35:56Z) - Language Driven Occupancy Prediction [11.208411421996052]
We introduce LOcc, an effective and generalizable framework for open-vocabulary occupancy prediction.
Our pipeline presents a feasible way to dig into the valuable semantic information of images.
LOcc effectively uses the generated language ground truth to guide the learning of 3D language volume.
arXiv Detail & Related papers (2024-11-25T03:47:10Z) - LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models [62.85566496673856]
This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model.
A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly.
Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format.
arXiv Detail & Related papers (2024-11-14T17:08:23Z) - Chain of Stance: Stance Detection with Large Language Models [3.528201746844624]
Stance detection is an active task in natural language processing (NLP).
We propose a new prompting method, called Chain of Stance (CoS).
arXiv Detail & Related papers (2024-08-03T16:30:51Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification [56.211321810408194]
Large language models (LLMs) have shown great potential in multi-modal understanding and generation tasks.
We present Volume Patch LLM (VP-LLM), which leverages LLMs to perform conditional 3D completion in a single-forward pass.
Our results demonstrate a strong ability of LLMs to interpret complex text instructions and understand 3D objects, surpassing state-of-the-art diffusion-based 3D completion models in generation quality.
arXiv Detail & Related papers (2024-06-08T18:17:09Z) - Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users. We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set. We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
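The "simple linear classifier" on activation deltas can be illustrated on synthetic data. The deltas, drift direction, and training setup below are fabricated stand-ins, not the paper's actual activations or probe:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
# Synthetic stand-ins for activation deltas: clean prompts cluster near the
# origin, drifted prompts are shifted along a fixed (unknown) direction.
shift = rng.normal(size=d)
clean = rng.normal(scale=0.5, size=(200, d))
drifted = rng.normal(scale=0.5, size=(200, d)) + shift
X = np.vstack([clean, drifted])
y = np.array([0] * 200 + [1] * 200)

# Logistic regression fit with plain gradient descent -- the kind of simple
# linear probe the summary describes, trained here on made-up data.
w, b = np.zeros(d), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted drift probability
    grad = p - y
    w -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = (((X @ w + b) > 0) == y).mean()  # training accuracy of the probe
```

Because the synthetic classes are well separated, the probe reaches near-perfect accuracy; the paper's point is that real activation deltas are similarly separable.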
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions [20.986063755422173]
We aim to locate a target moment from a 3D human motion that semantically corresponds to a text query.
To refine this, we devise two novel label-prior-knowledge training schemes.
We show that injecting label-prior knowledge into the model is crucial for improving performance at high IoU.
arXiv Detail & Related papers (2024-04-21T13:25:46Z) - Search-based Optimisation of LLM Learning Shots for Story Point Estimation [3.5365325264937897]
We use Search-Based methods to optimise the number and combination of examples that can improve an LLM's estimation performance.
Our preliminary results show that our SBSE technique improves the estimation performance of the LLM by 59.34% on average.
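Search-based selection of few-shot examples can be sketched with a toy hill climb. The fitness function below is a fabricated proxy (in the paper, fitness is the LLM's actual estimation performance), so `good` and `error` are pure illustration:

```python
import random

random.seed(0)
pool = list(range(20))      # indices of candidate few-shot examples
good = {2, 5, 11, 17}       # fabricated "helpful" examples for the toy fitness

def error(shots):
    """Toy proxy for the LLM's estimation error: fewer helpful shots and
    longer prompts both cost more. A stand-in for querying the real model."""
    return 1.0 - len(set(shots) & good) / len(shots) + 0.01 * len(shots)

def hill_climb(k=4, iters=200):
    """Local search over shot combinations, in the spirit of SBSE."""
    shots = random.sample(pool, k)
    best = error(shots)
    for _ in range(iters):
        cand = list(shots)
        cand[random.randrange(k)] = random.choice(pool)  # mutate one slot
        if error(cand) < best:
            shots, best = cand, error(cand)
    return shots, best

shots, best = hill_climb()
```

A real SBSE setup would use a stronger search (e.g. a genetic algorithm) and evaluate each candidate shot set by running the LLM on a validation split.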
arXiv Detail & Related papers (2024-03-13T11:29:37Z) - MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256]
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning.
This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter.
We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
arXiv Detail & Related papers (2024-01-16T08:08:01Z) - PointLLM: Empowering Large Language Models to Understand Point Clouds [63.39876878899682]
PointLLM understands colored object point clouds with human instructions.
It generates contextually appropriate responses, illustrating its grasp of point clouds and common sense.
arXiv Detail & Related papers (2023-08-31T17:59:46Z) - Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z) - PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.