Datasheets for Machine Learning Sensors
- URL: http://arxiv.org/abs/2306.08848v4
- Date: Tue, 28 Oct 2025 17:53:16 GMT
- Title: Datasheets for Machine Learning Sensors
- Authors: Matthew Stewart, Yuke Zhang, Pete Warden, Yasmine Omri, Shvetank Prakash, Jacob Huckelberry, Joao Henrique Santos, Shawn Hymel, Benjamin Yeager Brown, Jim MacArthur, Nat Jeffries, Emanuel Moss, Mona Sloane, Brian Plancher, Vijay Janapa Reddi
- Abstract summary: Machine learning (ML) is becoming prevalent in embedded AI sensing systems. These "ML sensors" enable context-sensitive, real-time data collection and decision-making. There is a need to provide transparency in the operation of such ML-enabled sensing systems.
- Score: 11.73392532310473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) is becoming prevalent in embedded AI sensing systems. These "ML sensors" enable context-sensitive, real-time data collection and decision-making across diverse applications ranging from anomaly detection in industrial settings to wildlife tracking for conservation efforts. As such, there is a need to provide transparency in the operation of such ML-enabled sensing systems through comprehensive documentation. This is needed to enable their reproducibility, to address new compliance and auditing regimes mandated in regulation and industry-specific policy, and to verify and validate the responsible nature of their operation. To address this gap, we introduce the datasheet for ML sensors framework. We provide a comprehensive template, collaboratively developed in academia-industry partnerships, that captures the distinct attributes of ML sensors, including hardware specifications, ML model and dataset characteristics, end-to-end performance metrics, and environmental impacts. Our framework addresses the continuous streaming nature of sensor data, real-time processing requirements, and embeds benchmarking methodologies that reflect real-world deployment conditions, ensuring practical viability. Aligned with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), our approach enhances the transparency and reusability of ML sensor documentation across academic, industrial, and regulatory domains. To show the application of our approach, we present two datasheets: the first for an open-source ML sensor designed in-house and the second for a commercial ML sensor developed by industry collaborators, both performing computer vision-based person detection.
Related papers
- DomusFM: A Foundation Model for Smart-Home Sensor Data [11.28458211143065]
We introduce DomusFM, the first foundation model specifically designed and pretrained for smart-home sensor data.
DomusFM employs a self-supervised dual contrastive learning paradigm to capture both token-level semantic attributes and sequence-level temporal dependencies.
Our approach addresses data scarcity while maintaining practical deployability for real-world smart-home systems.
arXiv Detail & Related papers (2026-02-02T10:16:34Z)
- Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining to goal-conditioned agents interfacing with simulation and experimental platforms.
To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z)
- Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs [66.63911043019294]
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them.
This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks.
We introduce a task-centric taxonomy that organizes the field into three major tasks: data cleaning (standardization, error processing, imputation), data integration, and data enrichment.
arXiv Detail & Related papers (2026-01-22T12:02:45Z)
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code.
We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs.
We analyze the code capability of the general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder).
arXiv Detail & Related papers (2025-11-23T17:09:34Z)
- Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs [56.76586846269894]
Multimodal Large Language Models (MLLMs) have achieved success across various domains.
Despite its importance, the study of knowledge sharing among domain-specific MLLMs remains largely underexplored.
We propose a unified parameter integration framework that enables modular composition of expert capabilities.
arXiv Detail & Related papers (2025-06-30T15:07:41Z)
- SensorLM: Learning the Language of Wearable Sensors [50.95988682423808]
We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language.
We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data.
This approach enabled the curation of the largest sensor-language dataset to date, comprising over 59.7 million hours of data from more than 103,000 people.
arXiv Detail & Related papers (2025-06-10T17:13:09Z)
- Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning [61.17099595835263]
Gensors is a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs.
In a user study, participants reported significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors.
arXiv Detail & Related papers (2025-01-27T01:47:57Z)
- Towards Human-Guided, Data-Centric LLM Co-Pilots [53.35493881390917]
CliMB-DC is a human-guided, data-centric framework for machine learning co-pilots.
It combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing.
We show how CliMB-DC can transform uncurated datasets into ML-ready formats.
arXiv Detail & Related papers (2025-01-17T17:51:22Z)
- A Soft Sensor Method with Uncertainty-Awareness and Self-Explanation Based on Large Language Models Enhanced by Domain Knowledge Retrieval [17.605817344542345]
We propose a framework called Few-shot Uncertainty-aware and self-Explaining Soft Sensor (LLM-FUESS).
LLM-FUESS includes the Zero-shot Auxiliary Variable Selector (LLM-ZAVS) and the Uncertainty-aware Few-shot Soft Sensor (LLM-UFSS).
Our method achieved state-of-the-art predictive performance, strong robustness, and flexibility, and effectively mitigates the training instability found in traditional methods.
arXiv Detail & Related papers (2025-01-06T11:43:29Z)
- SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing [6.8009140511761546]
Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems.
We construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective.
The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks.
arXiv Detail & Related papers (2024-10-14T17:21:39Z)
- SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition [9.072495000412943]
We bridge the gap between wearable sensor technology and personalized AI assistants by enabling Large Language Models (LLMs) to understand time-series tasks like human activity recognition (HAR).
We introduce SensorLLM, a two-stage framework to unlock LLMs' potential for sensor data tasks.
We show that SensorLLM evolves into an effective sensor learner, reasoner, and classifier, enabling it to generalize across diverse datasets for HAR tasks.
arXiv Detail & Related papers (2024-10-14T15:30:41Z)
- LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models [55.903148392998965]
We introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities.
The benchmark includes coarse-grained judgment and multiple-choice questions, as well as fine-grained anomaly selection and explanation tasks.
We evaluate 22 open-source LMMs and 6 closed-source models on LOKI, highlighting their potential as synthetic data detectors and also revealing some limitations in the development of LMM capabilities.
arXiv Detail & Related papers (2024-10-13T05:26:36Z)
- Large Language Model-Guided Semantic Alignment for Human Activity Recognition [14.934473748133422]
Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is critical for applications in healthcare, safety, and industrial production.
Variations in activity patterns, device types, and sensor placements create distribution gaps across datasets.
We propose LanHAR, a novel system that generates semantic interpretations of sensor readings and activity labels for cross-dataset HAR.
arXiv Detail & Related papers (2024-09-12T22:57:29Z)
- LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z)
- A Plug-in Tiny AI Module for Intelligent and Selective Sensor Data Transmission [10.174575604689391]
We propose a novel sensing module to equip sensing frameworks with intelligent data transmission capabilities.
We integrate a highly efficient machine learning model placed near the sensor.
This model provides prompt feedback for the sensing system to transmit only valuable data while discarding irrelevant information.
arXiv Detail & Related papers (2024-02-03T05:41:39Z)
- Federated Learning on Edge Sensing Devices: A Review [0.0]
Federated Learning (FL) is emerging as a solution to privacy, hardware, and connectivity limitations.
We focus on the key FL principles, software frameworks, and testbeds.
We also explore the current sensor technologies, properties of the sensing devices and sensing applications where FL is utilized.
arXiv Detail & Related papers (2023-11-02T12:55:26Z)
- Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review [51.31851488650698]
Machine learning (ML) sees an increasing prevalence of being used in the internet-of-things (IoT)-based smart grid.
Adversarial distortion injected into the power signal will greatly affect the system's normal control and operation.
It is imperative to conduct vulnerability assessment for MLsgAPPs applied in the context of safety-critical power systems.
arXiv Detail & Related papers (2023-08-30T03:29:26Z)
- Low-cost Efficient Wireless Intelligent Sensor (LEWIS) for Engineering, Research, and Education [72.2614468437919]
The vision of smart cities equipped with sensors informing decisions has not been realized to date.
Civil engineers lack knowledge of sensor technology.
The electrical components and computer knowledge associated with sensors are still a challenge for civil engineers.
arXiv Detail & Related papers (2023-03-23T21:49:26Z)
- Semantic Information Marketing in The Metaverse: A Learning-Based Contract Theory Framework [68.8725783112254]
We address the problem of designing incentive mechanisms by a virtual service provider (VSP) to hire sensing IoT devices to sell their sensing data.
Due to the limited bandwidth, we propose to use semantic extraction algorithms to reduce the delivered data by the sensing IoT devices.
We propose a novel iterative contract design and use a new variant of multi-agent reinforcement learning (MARL) to solve the modelled multi-dimensional contract problem.
arXiv Detail & Related papers (2023-02-22T15:52:37Z)
- SECOE: Alleviating Sensors Failure in Machine Learning-Coupled IoT Systems [0.0]
This paper proposes SECOE, a proactive approach for alleviating potentially simultaneous sensor failures.
SECOE includes a novel technique to minimize the number of models in the ensemble by harnessing the correlations among sensors.
Experiments reveal that SECOE effectively preserves prediction accuracy in the presence of sensor failures.
arXiv Detail & Related papers (2022-10-05T10:58:39Z)
- Machine Learning Sensors [4.263101392970408]
Machine learning sensors represent a paradigm shift for the future of embedded machine learning applications.
Current instantiations of embedded machine learning (ML) suffer from complex integration, lack of modularity, and privacy and security concerns.
This article proposes a more data-centric paradigm for embedding sensor intelligence on edge devices to combat these challenges.
arXiv Detail & Related papers (2022-06-07T13:22:13Z)
- Bayesian Imitation Learning for End-to-End Mobile Manipulation [80.47771322489422]
Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities.
We show that using the Variational Information Bottleneck to regularize convolutional neural networks improves generalization to held-out domains.
We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities.
arXiv Detail & Related papers (2022-02-15T17:38:30Z)
- SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors.
SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions, and document-centric manifestation for system-wide orchestration.
We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort to deploy, upgrade, reconfigure and serve embedded models on edge devices.
arXiv Detail & Related papers (2021-09-08T22:06:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.