SERVIMON: AI-Driven Predictive Maintenance and Real-Time Monitoring for Astronomical Observatories
- URL: http://arxiv.org/abs/2510.27146v1
- Date: Fri, 31 Oct 2025 03:47:13 GMT
- Title: SERVIMON: AI-Driven Predictive Maintenance and Real-Time Monitoring for Astronomical Observatories
- Authors: Emilio Mastriani, Alessandro Costa, Federico Incardona, Kevin Munari, Sebastiano Spinello,
- Abstract summary: ServiMon is designed to offer a scalable and intelligent pipeline for data collection and auditing.<n>System enhances quality control, predictive maintenance, and real-time anomaly detection for telescope operations.
- Score: 36.94429692322632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objective: ServiMon is designed to offer a scalable and intelligent pipeline for data collection and auditing to monitor distributed astronomical systems such as the ASTRI Mini-Array. The system enhances quality control, predictive maintenance, and real-time anomaly detection for telescope operations. Methods: ServiMon integrates cloud-native technologies-including Prometheus, Grafana, Cassandra, Kafka, and InfluxDB-for telemetry collection and processing. It employs machine learning algorithms, notably Isolation Forest, to detect anomalies in Cassandra performance metrics. Key indicators such as read/write latency, throughput, and memory usage are continuously monitored, stored as time-series data, and preprocessed for feature engineering. Anomalies detected by the model are logged in InfluxDB v2 and accessed via Flux for real-time monitoring and visualization. Results: AI-based anomaly detection increases system resilience by identifying performance degradation at an early stage, minimizing downtime, and optimizing telescope operations. Additionally, ServiMon supports astrostatistical analysis by correlating telemetry with observational data, thus enhancing scientific data quality. AI-generated alerts also improve real-time monitoring, enabling proactive system management. Conclusion: ServiMon's scalable framework proves effective for predictive maintenance and real-time monitoring of astronomical infrastructures. By leveraging cloud and edge computing, it is adaptable to future large-scale experiments, optimizing both performance and cost. The combination of machine learning and big data analytics makes ServiMon a robust and flexible solution for modern and next-generation observational astronomy.
Related papers
- Scalable Cloud-Native Architectures for Intelligent PMU Data Processing [0.13543803103181612]
Phasor Measurement Units (PMUs) generate high-frequency, time-synchronized data essential for real-time power grid monitoring.<n>This paper presents a cloud-native architecture for intelligent PMU data processing that integrates artificial intelligence with edge and cloud computing.
arXiv Detail & Related papers (2025-12-23T06:45:38Z) - GAL-MAD: Towards Explainable Anomaly Detection in Microservice Applications Using Graph Attention Networks [1.0136215038345013]
Anomalies stemming from network and performance issues must be swiftly identified and addressed.<n>Existing anomaly detection techniques often rely on statistical models or machine learning methods.<n>We propose a novel anomaly detection model called Graph Attention and LSTM-based Microservice Anomaly Detection (GAL-MAD)
arXiv Detail & Related papers (2025-03-31T10:11:31Z) - LLM Assisted Anomaly Detection Service for Site Reliability Engineers: Enhancing Cloud Infrastructure Resilience [5.644170923282226]
This paper introduces a scalable Anomaly Detection Service with a generalizable API tailored for industrial time-series data.<n>We provide insights into the usage patterns of the service, with over 500 users and 200,000 API calls in a year.<n>We plan to extend the system to include time series foundation models, enabling zero-shot anomaly detection capabilities.
arXiv Detail & Related papers (2025-01-28T06:41:37Z) - Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset [1.293050392312921]
We introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console.<n>This dataset comprises 39,365 rows and 117,448 columns of telemetry data.<n>We demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process.
arXiv Detail & Related papers (2024-11-13T22:04:19Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Closing the loop: Autonomous experiments enabled by
machine-learning-based online data analysis in synchrotron beamline
environments [80.49514665620008]
Machine learning can be used to enhance research involving large or rapidly generated datasets.
In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR)
We present solutions that provide an elementary data analysis in real time during the experiment without introducing the additional software dependencies in the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z) - LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak
Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z) - TELESTO: A Graph Neural Network Model for Anomaly Classification in
Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied on IT system operation and maintenance.
One direction aims at the recognition of re-occurring anomaly types to enable remediation automation.
We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z) - Federated Learning in the Sky: Aerial-Ground Air Quality Sensing
Framework with UAV Swarms [53.38353133198842]
Air quality significantly affects human health, it is increasingly important to accurately and timely predict the Air Quality Index (AQI)
This paper proposes a new federated learning-based aerial-ground air quality sensing framework for fine-grained 3D air quality monitoring and forecasting.
For ground sensing systems, we propose a Graph Convolutional neural network-based Long Short-Term Memory (GC-LSTM) model to achieve accurate, real-time and future AQI inference.
arXiv Detail & Related papers (2020-07-23T13:32:47Z) - Real-Time Anomaly Detection in Data Centers for Log-based Predictive
Maintenance using an Evolving Fuzzy-Rule-Based Approach [0.0]
We focus on the Tier-1 data center of the Italian Institute for Nuclear Physics (INFN), which supports the high-energy physics experiments at the Large Hadron Collider (LHC) in Geneva.
We propose a real-time approach to monitor and classify log records based on sliding time windows, and a time-varying evolving fuzzy-rule-based classification model.
arXiv Detail & Related papers (2020-04-25T21:19:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.