Related papers: SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing

URL: http://arxiv.org/abs/2410.10741v2
Date: Fri, 18 Oct 2024 23:29:49 GMT
Title: SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing
Authors: Pengrui Quan, Xiaomin Ouyang, Jeya Vikranth Jeyakumar, Ziqi Wang, Yang Xing, Mani Srivastava
Abstract summary: Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems. We construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective. The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effective processing, interpretation, and management of sensor data have emerged as a critical component of cyber-physical systems. Traditionally, processing sensor data requires profound theoretical knowledge and proficiency in signal-processing tools. However, recent works show that Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems. To explore this potential, we construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective. The benchmark incorporates diverse real-world sensor datasets for various tasks. The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks with parameter selections compared to engineering experts. Additionally, we investigate four prompting strategies for sensor processing and show that self-verification can outperform all other baselines in 48% of tasks. Our study provides a comprehensive benchmark and prompting analysis for future developments, paving the way toward an LLM-based sensor processing copilot.

Related papers

Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Gensors is a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. In a user study, participants reported a significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors.
arXiv (2025-01-27T01:47:57Z)
Are Vision-Language Models Truly Understanding Multi-vision Sensor?
Large-scale Vision-Language Models (VLMs) have advanced by aligning vision inputs with text. For real-world applications, an understanding of diverse multi-vision sensor data, such as thermal, depth, and X-ray information, is essential.
arXiv (2024-12-30T06:44:25Z)
MSSIDD: A Benchmark for Multi-Sensor Denoising
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models. We propose a sensor consistency training framework that enables denoising models to learn sensor-invariant features.
arXiv (2024-11-18T13:32:59Z)
SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
We bridge the gap between wearable sensor technology and personalized AI assistants by enabling Large Language Models (LLMs) to understand time-series tasks like human activity recognition (HAR). We introduce SensorLLM, a two-stage framework to unlock LLMs' potential for sensor data tasks. We show that SensorLLM evolves into an effective sensor learner and reasoner, enabling it to generalize across diverse datasets for HAR tasks.
arXiv (2024-10-14T15:30:41Z)
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks. This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions. Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv (2024-09-12T02:08:00Z)
LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces
We design an effective prompting framework for Large Language Models (LLMs) on high-level reasoning tasks. We also design two strategies to enhance performance with long sensor traces, including summarization before reasoning and selective inclusion of historical traces. Our framework can be implemented in an edge-cloud setup, running small LLMs on the edge for data summarization and performing high-level reasoning on the cloud for privacy preservation.
arXiv (2024-03-28T22:06:04Z)
A Plug-in Tiny AI Module for Intelligent and Selective Sensor Data Transmission
We propose a novel sensing module to equip sensing frameworks with intelligent data transmission capabilities. We integrate a highly efficient machine learning model placed near the sensor. This model provides prompt feedback for the sensing system to transmit only valuable data while discarding irrelevant information.
arXiv (2024-02-03T05:41:39Z)
Design Space Exploration on Efficient and Accurate Human Pose Estimation from Sparse IMU-Sensing
Human Pose Estimation (HPE) to assess human motion in sports, rehabilitation or work safety requires accurate sensing without compromising personal data. The central trade-off between accuracy and efficient use of hardware resources is rarely discussed in research. We generate IMU data from a publicly available body model dataset for different sensor configurations and train a deep learning model with this data.
arXiv (2023-07-21T13:36:49Z)
Datasheets for Machine Learning Sensors: Towards Transparency, Auditability, and Responsibility for Intelligent Sensing
Machine learning (ML) sensors are enabling intelligence at the edge by empowering end-users with greater control over their data. We introduce a standard datasheet template for these ML sensors and discuss and evaluate the design and motivation for each section of the datasheet. To provide a case study of the application of our template, we also designed and developed two examples of ML sensors performing computer vision-based person detection.
arXiv (2023-06-15T04:24:13Z)
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities. This paper investigates the effect of sensor errors for the dense 3D vision tasks of depth estimation and reconstruction.
arXiv (2023-03-26T22:32:44Z)
Bayesian Imitation Learning for End-to-End Mobile Manipulation
Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities. We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains. We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities.
arXiv (2022-02-15T17:38:30Z)
DeepTimeAnomalyViz: A Tool for Visualizing and Post-processing Deep Learning Anomaly Detection Results for Industrial Time-Series
We introduce the DeTAVIZ interface, a web-browser-based visualization tool for quickly exploring and assessing the feasibility of DL-based anomaly detection in a given problem. DeTAVIZ allows the user to easily and quickly iterate through multiple post-processing options, compare different models, and manually optimise towards a chosen metric.
arXiv (2021-09-21T10:38:26Z)
Benchmarking high-fidelity pedestrian tracking systems for research, real-time monitoring and crowd control
High-fidelity pedestrian tracking in real-life conditions has been an important tool in fundamental crowd dynamics research. As this technology advances, it is also becoming increasingly useful in society. To successfully employ pedestrian tracking techniques in research and technology, it is crucial to validate and benchmark them for accuracy. We present and discuss a benchmark suite, towards an open standard in the community, for privacy-respectful pedestrian tracking techniques.
arXiv (2021-08-26T11:45:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.