Automated Energy-Aware Time-Series Model Deployment on Embedded FPGAs for Resilient Combined Sewer Overflow Management
- URL: http://arxiv.org/abs/2508.13905v1
- Date: Tue, 19 Aug 2025 15:06:04 GMT
- Title: Automated Energy-Aware Time-Series Model Deployment on Embedded FPGAs for Resilient Combined Sewer Overflow Management
- Authors: Tianheng Ling, Vipin Singh, Chao Qian, Felix Biessmann, Gregor Schiele
- Abstract summary: Extreme weather events, intensified by climate change, increasingly challenge aging combined sewer systems. Forecasting of sewer overflow basin filling levels can provide actionable insights for early intervention. We propose an end-to-end forecasting framework that enables energy-efficient inference directly on edge devices.
- Score: 17.903318666906728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extreme weather events, intensified by climate change, increasingly challenge aging combined sewer systems, raising the risk of untreated wastewater overflow. Accurate forecasting of sewer overflow basin filling levels can provide actionable insights for early intervention, helping to mitigate uncontrolled discharge. In recent years, AI-based forecasting methods have offered scalable alternatives to traditional physics-based models, but their reliance on cloud computing limits their reliability during communication outages. To address this, we propose an end-to-end forecasting framework that enables energy-efficient inference directly on edge devices. Our solution integrates lightweight Transformer and Long Short-Term Memory (LSTM) models, compressed via integer-only quantization for efficient on-device execution. Moreover, an automated hardware-aware deployment pipeline is used to search for optimal model configurations by jointly minimizing prediction error and energy consumption on an AMD Spartan-7 XC7S15 FPGA. Evaluated on real-world sewer data, the selected 8-bit Transformer model, trained on 24 hours of historical measurements, achieves high accuracy (MSE 0.0376) at an energy cost of 0.370 mJ per inference. In contrast, the optimal 8-bit LSTM model requires significantly less energy (0.009 mJ, over 40x lower) but yields 14.89% worse accuracy (MSE 0.0432) and much longer training time. This trade-off highlights the need to align model selection with deployment priorities, favoring LSTM for ultra-low energy consumption or Transformer for higher predictive accuracy. In general, our work enables local, energy-efficient forecasting, contributing to more resilient combined sewer systems. All code can be found in the GitHub Repository (https://github.com/tianheng-ling/EdgeOverflowForecast).
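The "over 40x" energy gap and the 14.89% accuracy gap follow directly from the figures quoted in the abstract; a minimal sanity-check sketch (not from the paper's code), using only those quoted numbers:

```python
# Figures quoted in the abstract (MSE is unitless, energy in mJ per inference).
transformer_mse, transformer_energy_mj = 0.0376, 0.370
lstm_mse, lstm_energy_mj = 0.0432, 0.009

# Energy ratio: how much more energy the Transformer needs per inference.
energy_ratio = transformer_energy_mj / lstm_energy_mj  # ~41x, i.e. "over 40x"

# Relative accuracy gap of the LSTM against the Transformer baseline.
mse_gap_pct = (lstm_mse - transformer_mse) / transformer_mse * 100  # ~14.89%
```

Both reported deltas are consistent with the raw measurements, which is a useful check when comparing configurations produced by the deployment pipeline.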
Related papers
- LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G [85.58816960936069]
Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Real-Time (Near-RT) latency and computational constraints. This paper investigates a post-Transformer paradigm for efficient radio telemetry forecasting. We propose a quantum-inspired state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels.
arXiv Detail & Related papers (2026-01-18T12:08:38Z)
- Smart IoT-Based Leak Forecasting and Detection for Energy-Efficient Liquid Cooling in AI Data Centers [0.0]
We present a proof-of-concept smart IoT monitoring system combining LSTM neural networks for probabilistic leak forecasting. For a typical 47-rack facility, this approach could prevent roughly 1,500 annual energy waste.
arXiv Detail & Related papers (2025-12-25T22:51:16Z)
- EdgeFlex-Transformer: Transformer Inference for Edge Devices [2.1130318406254074]
We propose a lightweight yet effective multi-stage optimization pipeline designed to compress and accelerate Vision Transformers (ViTs). Our methodology combines activation profiling, memory-aware pruning, selective mixed-precision execution, and activation-aware quantization (AWQ) to reduce the model's memory footprint without requiring costly retraining or task-specific fine-tuning. Experiments on CIFAR-10 demonstrate that the fully optimized model achieves a 76% reduction in peak memory usage and over 6x lower latency, while retaining or even improving accuracy compared to the original FP32 baseline.
arXiv Detail & Related papers (2025-12-17T21:45:12Z)
- A Lightweight DL Model for Smart Grid Power Forecasting with Feature and Resolution Mismatch [0.4999814847776097]
This paper challenges teams to predict next-day power demand using real-world high-frequency data. We propose a robust yet lightweight Deep Learning pipeline combining hourly downsampling, dual-mode imputation, and comprehensive normalization. A sequence-to-one model achieves an average RMSE of 601.9W, MAE of 468.9W, and 84.36% accuracy.
arXiv Detail & Related papers (2025-10-19T16:12:53Z)
- Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services [9.687789919349523]
We propose Fremer, an efficient and effective deep forecasting model. Fremer fulfills three critical requirements: it demonstrates superior efficiency, outperforming most Transformer-based forecasting models. It achieves exceptional accuracy, surpassing all state-of-the-art (SOTA) models in workload forecasting.
arXiv Detail & Related papers (2025-07-17T08:51:28Z)
- FlowTS: Time Series Generation via Rectified Flow [67.41208519939626]
FlowTS is an ODE-based model that leverages rectified flow with straight-line transport in probability space. In the unconditional setting, FlowTS achieves state-of-the-art performance, with context FID scores of 0.019 and 0.011 on the Stock and ETTh datasets. In the conditional setting, it achieves superior performance in solar forecasting.
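The straight-line transport that rectified flow relies on is simple enough to sketch in a few lines; the values below are illustrative toy data, not FlowTS's actual training code:

```python
# Rectified flow pairs a noise sample x0 with a data sample x1 and trains a
# velocity model on the straight-line path x_t = (1 - t) * x0 + t * x1,
# whose ground-truth velocity x1 - x0 is constant along the whole path.
x0 = [0.5, -1.0, 2.0]   # noise endpoint (toy values)
x1 = [1.0, 2.0, 3.0]    # data endpoint (toy values)
t = 0.25

x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]  # point on the path
target_velocity = [b - a for a, b in zip(x0, x1)]    # regression target

# Integrating the constant velocity from t=0 to t=1 recovers x1 exactly,
# which is why straight-line transport allows very few ODE solver steps.
reconstructed = [a + v for a, v in zip(x0, target_velocity)]
```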
arXiv Detail & Related papers (2024-11-12T03:03:23Z)
- Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network [0.0]
This paper introduces the Edge Sensor Network for Predictive Maintenance (ESN-PdM), a hierarchical inference framework spanning edge devices, gateways, and cloud services for real-time condition monitoring.
The system dynamically adjusts the inference location (on-device, on-gateway, or on-cloud) based on trade-offs among accuracy, latency, and battery life.
arXiv Detail & Related papers (2024-11-11T17:48:04Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
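A common back-of-envelope bound illustrates why accumulator overflow can be ruled out ahead of time; this is a generic worst-case calculation under standard two's-complement assumptions, not the paper's training algorithm:

```python
import math

def min_accumulator_bits(input_bits: int, weight_bits: int, dot_length: int) -> int:
    """Worst-case signed accumulator width for a dot product of `dot_length`
    terms between signed `input_bits` inputs and signed `weight_bits` weights.
    """
    # A signed a-bit x w-bit product always fits in a + w bits: the corner
    # case (-2^(a-1)) * (-2^(w-1)) = +2^(a+w-2) needs the extra bit.
    product_bits = input_bits + weight_bits
    # Summing dot_length such products grows the magnitude by at most a
    # factor of dot_length, i.e. ceil(log2(dot_length)) additional bits.
    return product_bits + math.ceil(math.log2(dot_length))

# e.g. 8-bit activations x 8-bit weights over a 512-element dot product
width = min_accumulator_bits(8, 8, 512)  # 25 bits suffice in the worst case
```

Quantization-aware training like the paper's can then constrain weights and activations so a *smaller* accumulator still provably never overflows.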
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- GNN4REL: Graph Neural Networks for Predicting Circuit Reliability Degradation [7.650966670809372]
We employ graph neural networks (GNNs) to accurately estimate the impact of process variations and device aging on the delay of any path within a circuit.
GNN4REL is trained on a FinFET technology model that is calibrated against industrial 14nm measurement data.
We successfully estimate delay degradations of all paths -- notably within seconds -- with a mean absolute error down to 0.01 percentage points.
arXiv Detail & Related papers (2022-08-04T20:09:12Z)
- Cascaded Deep Hybrid Models for Multistep Household Energy Consumption Forecasting [5.478764356647437]
This study introduces two hybrid cascaded models for forecasting multistep household power consumption in different resolutions.
The proposed hybrid models achieve superior prediction performance compared to the existing multistep power consumption prediction methods.
arXiv Detail & Related papers (2022-07-06T11:02:23Z)
- Energy-Efficient Model Compression and Splitting for Collaborative Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and CO2 emissions compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves comparable performance with only about 0.2% of the parameters (100k) of large models on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.