HydroGEM: A Self Supervised Zero Shot Hybrid TCN Transformer Foundation Model for Continental Scale Streamflow Quality Control
- URL: http://arxiv.org/abs/2512.14106v1
- Date: Tue, 16 Dec 2025 05:39:26 GMT
- Title: HydroGEM: A Self Supervised Zero Shot Hybrid TCN Transformer Foundation Model for Continental Scale Streamflow Quality Control
- Authors: Ijaz Ul Haq, Byung Suk Lee, Julia N. Perdrial, David Baude,
- Abstract summary: We introduce HydroGEM, a foundation model for continental-scale streamflow quality control. HydroGEM uses two-stage training: self-supervised pretraining on 6.03 million sequences from 3,724 stations, followed by fine-tuning with synthetic anomalies. On held-out synthetic tests, HydroGEM achieves F1 = 0.792 for detection and a 68.7% reconstruction-error reduction, a 36.3% improvement over existing methods.
- Score: 2.153503585184628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time streamflow monitoring networks generate millions of observations annually, yet maintaining data quality across thousands of remote sensors remains labor-intensive. We introduce HydroGEM (Hydrological Generalizable Encoder for Monitoring), a foundation model for continental-scale streamflow quality control. HydroGEM uses two-stage training: self-supervised pretraining on 6.03 million sequences from 3,724 USGS stations learns hydrological representations, followed by fine-tuning with synthetic anomalies for detection and reconstruction. A hybrid TCN-Transformer architecture (14.2M parameters) captures local temporal patterns and long-range dependencies, while hierarchical normalization handles six orders of magnitude in discharge. On held-out synthetic tests comprising 799 stations with 18 expert-validated anomaly types, HydroGEM achieves F1 = 0.792 for detection and a 68.7% reconstruction-error reduction, a 36.3% improvement over existing methods. Zero-shot transfer to 100 Environment and Climate Change Canada stations yields F1 = 0.586, exceeding all baselines and demonstrating cross-national generalization. The model maintains consistent detection across correction magnitudes and aligns with operational seasonal patterns. HydroGEM is designed for human-in-the-loop workflows: its outputs are quality-control suggestions requiring expert review, not autonomous corrections.
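The abstract says hierarchical normalization lets a single model handle discharge spanning six orders of magnitude, but it does not spell out the scheme. Below is a minimal sketch of one plausible two-level version, assuming a log transform followed by per-station standardization. The function names `hierarchical_normalize` and `denormalize` and the `eps` offset are hypothetical illustrations, not HydroGEM's code.

```python
import numpy as np


def hierarchical_normalize(discharge, eps=1e-3):
    """Hypothetical two-level normalization (an assumption, not HydroGEM's published scheme).

    Level 1: log-transform to compress discharge that may span ~6 orders of magnitude.
    Level 2: z-score the log values with this station's own statistics, so a small
    creek and a large river end up on a comparable scale.
    Returns the normalized series plus the statistics needed to invert it.
    """
    q = np.asarray(discharge, dtype=float)
    log_q = np.log(q + eps)  # eps keeps zero-flow readings finite
    mu = float(np.nanmean(log_q))
    sigma = float(np.nanstd(log_q))
    if not np.isfinite(sigma) or sigma < 1e-12:
        sigma = 1.0  # (near-)constant series: avoid dividing by zero
    return (log_q - mu) / sigma, (mu, sigma, eps)


def denormalize(z, stats):
    """Invert hierarchical_normalize, returning discharge in the original units."""
    mu, sigma, eps = stats
    return np.exp(np.asarray(z, dtype=float) * sigma + mu) - eps


# Example: two stations whose flows differ by roughly six orders of magnitude.
creek = np.array([0.01, 0.02, 0.015, 0.05, 0.012])
river = np.array([9000.0, 12000.0, 10500.0, 30000.0, 9500.0])
for name, q in [("creek", creek), ("river", river)]:
    z, stats = hierarchical_normalize(q)
    print(name, round(float(z.mean()), 6), round(float(z.std()), 6))
```

Because each station is standardized with its own statistics, those statistics have to be kept so that reconstructions can be mapped back to physical units. A fixed `eps` also distorts very small flows, so a real scheme may need to choose it per station or per unit system.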
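Fine-tuning and evaluation, as described in the abstract, rest on three ideas: injected synthetic anomalies, a detection F1 score, and a reconstruction-error reduction. The sketch below illustrates them on a toy series under stated assumptions. It injects only multiplicative spikes (the paper's 18 expert-validated anomaly types are not reproduced). It computes F1 point-wise. It assumes "reconstruction-error reduction" means the fraction of mean absolute error removed relative to the corrupted input. The detector is a simple rolling-median rule standing in for a model; it is not HydroGEM. All function names are hypothetical.

```python
import numpy as np


def inject_spikes(series, n_spikes, magnitude, rng):
    """Hypothetical injector: multiply n_spikes random points by `magnitude`.

    Returns the corrupted copy and a boolean mask of the injected positions.
    """
    corrupted = np.array(series, dtype=float)
    idx = rng.choice(len(corrupted), size=n_spikes, replace=False)
    corrupted[idx] *= magnitude
    mask = np.zeros(len(corrupted), dtype=bool)
    mask[idx] = True
    return corrupted, mask


def f1_score(true_mask, pred_mask):
    """Point-wise F1 between ground-truth and predicted anomaly masks."""
    tp = np.sum(true_mask & pred_mask)
    fp = np.sum(~true_mask & pred_mask)
    fn = np.sum(true_mask & ~pred_mask)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def reconstruction_error_reduction(clean, corrupted, reconstructed):
    """Assumed definition: 1 - MAE(reconstructed) / MAE(corrupted), both vs. clean."""
    err_before = np.mean(np.abs(corrupted - clean))
    if err_before == 0:
        raise ValueError("corrupted series is identical to the clean series")
    err_after = np.mean(np.abs(reconstructed - clean))
    return 1.0 - err_after / err_before


def rolling_median(x, window=5):
    """Centered rolling median (odd window) with edge padding; needs NumPy >= 1.20."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)


# Toy run: smooth seasonal hydrograph, spike anomalies, rule-based baseline.
rng = np.random.default_rng(0)
days = np.arange(365)
clean = 50 + 30 * np.sin(2 * np.pi * days / 365)
corrupted, true_mask = inject_spikes(clean, n_spikes=10, magnitude=5.0, rng=rng)
baseline = rolling_median(corrupted, window=5)
pred_mask = corrupted > 2.0 * baseline  # flag points far above the local median
reconstructed = np.where(pred_mask, baseline, corrupted)  # replace flagged points
print("F1:", f1_score(true_mask, pred_mask))
print("error reduction:", reconstruction_error_reduction(clean, corrupted, reconstructed))
```

A spike-only toy is far easier than the paper's setting, so its numbers should not be compared with the reported F1 = 0.792 or the 68.7% reduction.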
Related papers
- TopoFlow: Physics-guided Neural Networks for high-resolution air quality prediction [8.841726805480361]
TopoFlow is a physics-guided neural network for efficient, high-resolution air quality prediction. We identify two critical factors governing pollutant dynamics: topography and wind direction. TopoFlow achieves a PM2.5 RMSE of 9.71 µg/m³, representing a 71-80% improvement over operational forecasting systems.
arXiv Detail & Related papers (2026-02-18T19:38:11Z)
- Benchmarking IoT Time-Series AD with Event-Level Augmentations [34.864214444544565]
We introduce an evaluation protocol with unified event-level augmentations that simulate real-world issues. We evaluate 14 representative models on five public anomaly datasets.
arXiv Detail & Related papers (2026-02-17T09:45:44Z)
- Structure-Informed Estimation for Pilot-Limited MIMO Channels via Tensor Decomposition [51.56484100374058]
This paper formulates pilot-limited channel estimation as low-rank tensor completion from sparse observations. Experiments on synthetic channels demonstrate a 10-20 dB normalized mean-square error (NMSE) improvement over least-squares (LS) estimation, and evaluations on DeepMIMO ray-tracing channels show a further 24-44% NMSE reduction over pure tensor-based methods.
arXiv Detail & Related papers (2026-02-03T23:38:05Z)
- Automated river gauge plate reading using a hybrid object detection and generative AI framework in the Limpopo River Basin [0.0]
This study presents a hybrid framework integrating vision-based waterline detection, YOLOv8 pose scale extraction, and large multimodal language models for automated river gauge plate reading. Experiments demonstrate that waterline detection achieved a high precision of 94.24 percent and an F1 score of 83.64 percent, while scale gap detection provided accurate geometric calibration for subsequent reading extraction. Results highlight the sensitivity of LLMs to image quality, with degraded images producing higher errors, and underscore the importance of combining geometric metadata with multimodal artificial intelligence for robust water level estimation.
arXiv Detail & Related papers (2025-12-29T13:26:59Z)
- HydroFusion-LMF: Semi-Supervised Multi-Network Fusion with Large-Model Adaptation for Long-Term Daily Runoff Forecasting [3.3915788299794767]
HydroFusion-LMF performs trend-residual decomposition to reduce non-stationarity. It fuses expert outputs via a hydrologic context-aware gate conditioned on day-of-year phase. It attains MSE 1.0128 / MAE 0.5818 on a 10-year daily dataset.
arXiv Detail & Related papers (2025-10-04T09:09:06Z)
- Automated Energy-Aware Time-Series Model Deployment on Embedded FPGAs for Resilient Combined Sewer Overflow Management [17.903318666906728]
Extreme weather events, intensified by climate change, increasingly challenge aging combined sewer systems. Forecasting of sewer overflow basin filling levels can provide actionable insights for early intervention. We propose an end-to-end forecasting framework that enables energy-efficient inference directly on edge devices.
arXiv Detail & Related papers (2025-08-19T15:06:04Z)
- Machine Learning for Proactive Groundwater Management: Early Warning and Resource Allocation [1.372066170415575]
We develop a machine learning pipeline that predicts groundwater level categories using climate data, hydro-meteorological records, and physiographic attributes. Our approach integrates geospatial preprocessing, domain-driven feature engineering, and automated model selection to overcome monitoring limitations.
arXiv Detail & Related papers (2025-06-18T00:41:04Z)
- ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion [48.540756751934836]
ReconMOST is a data-driven guided diffusion model framework for multi-layer sea temperature reconstruction. Our method extends ML-based SST reconstruction to a global, multi-layer setting, handling over 92.5% missing data.
arXiv Detail & Related papers (2025-06-12T06:27:22Z)
- DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks [41.94295877935867]
This dataset comprises 36,000 unique scenarios simulated over either short-term (24 hours) or long-term (1 year) periods. DiTEC-WDN can support a variety of machine-learning tasks, including graph-level, node-level, and link-level regression, as well as time-series forecasting. This contribution, released under a public license, encourages open scientific research in the critical water sector, eliminates the risk of exposing sensitive data, and fulfills the need for a large-scale water distribution network benchmark for study comparisons and scenario analysis.
arXiv Detail & Related papers (2025-03-21T14:14:03Z)
- AI for Water Sustainability: Global Water Quality Assessment and Prediction with Explainable AI with LLM Chatbot for Insights [0.0]
This paper introduces various hybrid deep learning models to make predictions on the CCME dataset, which covers multiple water quality parameters from Canada, China, the UK, the USA, and Ireland. CatBoost, XGBoost, and Extra Trees Regressor predicted Water Quality Index (WQI) values with an average RMSE of 1.2 and an R² of 0.99.
arXiv Detail & Related papers (2024-09-17T05:26:59Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
The state of the art in physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser-resolution global inputs.
Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative.
The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
arXiv Detail & Related papers (2023-09-24T19:57:22Z)
- Echofilter: A Deep Learning Segmentation Model Improves the Automation, Standardization, and Timeliness for Post-Processing Echosounder Data in Tidal Energy Streams [3.7067444579637074]
Tidal currents that make sites favorable for tidal energy development are often highly turbulent and entrain air into the water.
The portion of the water column contaminated by returns from entrained air must be excluded from data used for biological analyses.
We develop deep learning models that substantially and consistently improve automated detection of how far entrained air has penetrated the water column.
arXiv Detail & Related papers (2022-02-19T17:26:46Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with improved beam search, comes within 3.8% absolute WER of the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)