Related papers: Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network

Related papers

TS-Memory: Plug-and-Play Memory for Time Series Foundation Models [63.21390142212087]
Time Series Foundation Models (TSFMs) achieve strong zero-shot forecasting through large-scale pre-training.<n>Existing solutions face a trade-off: Parametric Adaptation can cause catastrophic forgetting, while Non-Parametric Retrieval improves forecasts but incurs high latency due to datastore search.<n>We propose Parametric Memory Distillation and implement it as TS-Memory, a lightweight memory adapter that augments frozen TSFMs.
arXiv Detail & Related papers (2026-02-12T04:16:19Z)
LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G [85.58816960936069]
Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Time (Near-RT) latency and computational constraints.<n>This paper investigates a post-Transformer paradigm for efficient radio telemetry forecasting.<n>We propose a quantum-inspired state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels.
arXiv Detail & Related papers (2026-01-18T12:08:38Z)
STAR: A Privacy-Preserving, Energy-Efficient Edge AI Framework for Human Activity Recognition via Wi-Fi CSI in Mobile and Pervasive Computing Environments [0.0]
Human Activity Recognition via Wi-Fi Channel State Information (CSI) presents a privacy-preserving, contactless sensing approach suitable for smart homes, healthcare monitoring, and mobile IoT systems.<n>This paper proposes STAR (Sensing Technology for Activity Recognition), an edge-AI-optimized framework that integrates a lightweight neural architecture, adaptive signal processing, and hardware-aware co-optimization.<n>With sub-second response latency and low power consumption, the system ensures real-time, privacy-preserving HAR, offering a practical, scalable solution for mobile and pervasive computing environments.
arXiv Detail & Related papers (2025-10-30T05:08:25Z)
Evaluating the Energy Efficiency of NPU-Accelerated Machine Learning Inference on Embedded Microcontrollers [0.0]
This paper evaluates the impact of Neural Processing Units (NPUs) on machine learning (ML) execution on microcontrollers (MCUs)<n>It shows substantial efficiency gains when inference is offloaded to the NPU.<n>For moderate to large networks, latency improvements ranged from 7x to over 125x, with per-inference net energy reductions up to 143x.
arXiv Detail & Related papers (2025-09-22T08:52:54Z)
Automated Energy-Aware Time-Series Model Deployment on Embedded FPGAs for Resilient Combined Sewer Overflow Management [17.903318666906728]
Extreme weather events, intensified by climate change, increasingly challenge aging combined sewer systems.<n>Forecasting of sewer overflow basin filling levels can provide actionable insights for early intervention.<n>We propose an end-to-end forecasting framework that enables energy-efficient inference directly on edge devices.
arXiv Detail & Related papers (2025-08-19T15:06:04Z)
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems [62.24576366776727]
We propose a latency-aware scheduling framework to minimize total inference latency.<n>We show that the proposed method significantly reduces cold-start latency compared to baseline strategies.
arXiv Detail & Related papers (2025-08-15T07:49:22Z)
Benchmarking Energy and Latency in TinyML: A Novel Method for Resource-Constrained AI [0.0]
This work introduces an alternative benchmarking methodology that integrates energy and latency measurements.<n>To evaluate our setup, we tested the STM32N6 MCU, which includes a NPU for executing neural networks.<n>Our findings demonstrate that reducing the core voltage and clock frequency improve the efficiency of pre- and post-processing.
arXiv Detail & Related papers (2025-05-21T15:12:14Z)
The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large computation model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications.<n>In this paper, we investigate the LAIM-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z)
Reservoir Network with Structural Plasticity for Human Activity Recognition [2.355460994057843]
Echo state network (ESN) is a class of recurrent neural networks that can be used to identify unique patterns in time-series data and predict future events. In this work, a custom-design neuromorphic chip based on ESN targeting edge devices is proposed. The proposed system supports various learning mechanisms, including structural plasticity and synaptic plasticity, locally on-chip.
arXiv Detail & Related papers (2025-03-01T07:57:22Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
DSORT-MCU: Detecting Small Objects in Real-Time on Microcontroller Units [1.4447019135112429]
This paper proposes an adaptive tiling method for lightweight and energy-efficient object detection networks, including YOLO-based models and the popular FOMO network. The proposed tiling enables object detection on low-power MCUs with no compromise on accuracy compared to large-scale detection models.
arXiv Detail & Related papers (2024-10-22T07:37:47Z)
Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical [4.211747495359569]
Hierarchical Inference (HI) system offloads selected samples to an edge server or cloud for remote ML inference. This paper systematically compares the performance of HI with on-device inference based on measurements of accuracy, latency, and energy.
arXiv Detail & Related papers (2024-07-10T16:05:43Z)
Introducing a Deep Neural Network-based Model Predictive Control Framework for Rapid Controller Implementation [41.38091115195305]
This work presents the experimental implementation of a deep neural network (DNN) based nonlinear MPC for Homogeneous Charge Compression Ignition (HCCI) combustion control. Using the acados software package to enable the real-time implementation of the MPC on an ARM Cortex A72, the optimization calculations are completed within 1.4 ms. The IMEP trajectory following of the developed controller was excellent, with a root-mean-square error of 0.133 bar, in addition to observing process constraints.
arXiv Detail & Related papers (2023-10-12T15:03:50Z)
DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization [66.27399823422665]
Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. We propose an efficient Device-cloUd collaborative parametErs generaTion framework DUET.
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
Evaluating Short-Term Forecasting of Multiple Time Series in IoT Environments [67.24598072875744]
Internet of Things (IoT) environments are monitored via a large number of IoT enabled sensing devices. To alleviate this issue, sensors are often configured to operate at relatively low sampling frequencies. This can hamper dramatically subsequent decision-making, such as forecasting.
arXiv Detail & Related papers (2022-06-15T19:46:59Z)
Energy-Efficient Wake-Up Signalling for Machine-Type Devices Based on Traffic-Aware Long-Short Term Memory Prediction [10.51090547010728]
Wake-up Signal (WuS) technology aims to minimize the energy consumed by the radio interface of machine-type devices (MTDs) We design a simple but efficient neural network to predict MTC traffic patterns and configure WuS accordingly. In terms of energy consumption reduction, FWuS can outperform the best benchmark mechanism in up to 32%.
arXiv Detail & Related papers (2022-06-13T11:42:22Z)
MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware. Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters. We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
Energy-efficient and Privacy-aware Social Distance Monitoring with Low-resolution Infrared Sensors and Adaptive Inference [4.158182639870093]
Low-resolution infrared (IR) sensors can be leveraged to implement privacy-preserving social distance monitoring solutions in indoor spaces. We propose an energy-efficient adaptive inference solution consisting of a cascade of a simple wake-up trigger and a 8-bit quantized Convolutional Neural Network (CNN) We show that, when processing the output of a 8x8 low-resolution IR sensor, we are able to reduce the energy consumption by 37-57% with respect to a static CNN-based approach.
arXiv Detail & Related papers (2022-04-22T07:07:38Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.