SmartMeterFM: Unifying Smart Meter Data Generative Tasks Using Flow Matching Models
- URL: http://arxiv.org/abs/2601.21706v2
- Date: Fri, 30 Jan 2026 11:22:42 GMT
- Title: SmartMeterFM: Unifying Smart Meter Data Generative Tasks Using Flow Matching Models
- Authors: Nan Lin, Yanbo Wang, Jacco Heres, Peter Palensky, Pedro P. Vergara
- Abstract summary: We propose a new approach to unify diverse smart meter data generative tasks with a single model trained for conditional generation. By viewing different generative tasks as distinct forms of partial data observations, we unify tasks such as imputation and super-resolution with a single model.
- Score: 5.581165920380358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smart meter data is the foundation for planning and operating the distribution network. Unfortunately, such data are not always available due to privacy regulations. Meanwhile, the collected data may be corrupted due to sensor or transmission failure, or may lack sufficient resolution for downstream tasks. A wide range of generative tasks has been formulated to address these issues, including synthetic data generation, missing data imputation, and super-resolution. Despite the success of machine learning models on these tasks, dedicated models need to be designed and trained for each task, leading to redundancy and inefficiency. In this paper, recognizing the powerful modeling capability of flow matching models, we propose a new approach to unify diverse smart meter data generative tasks with a single model trained for conditional generation. The proposed flow matching models are trained to generate challenging, high-dimensional time series data, specifically monthly smart meter data at 15-minute resolution. By viewing different generative tasks as distinct forms of partial data observations and injecting them into the generation process, we unify tasks such as imputation and super-resolution with a single model, eliminating the need for re-training. The data generated by our model are not only consistent with the given observations but also realistic, outperforming interpolation and other machine-learning baselines dedicated to these tasks.
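The abstract's core idea, treating each generative task as a different pattern of partial observations injected into one conditional sampler, can be sketched in plain NumPy. The linear interpolation path and Euler sampler below are standard flow matching ingredients; the mask-projection step that injects observed values is an illustrative assumption, not the paper's exact conditioning mechanism, and `v` stands in for a trained velocity network.

```python
import numpy as np

def fm_training_pair(x1, rng):
    """One flow matching training example for a data point x1.

    Draws noise x0 ~ N(0, I) and t ~ U(0, 1), forms the linear
    interpolation x_t = (1 - t) * x0 + t * x1, and returns the
    regression target x1 - x0 for the velocity network v(x_t, t).
    """
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    xt = (1.0 - t) * x0 + t * x1
    return xt, t, x1 - x0

def sample_with_observations(v, x_obs, mask, n_steps=100, rng=None):
    """Euler-integrate dx/dt = v(x, t) from t=0 to t=1.

    After every step, entries where mask is True are projected back
    onto their interpolation path toward x_obs, so the final sample
    matches the partial observations while the unobserved entries
    are generated freely.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = rng.standard_normal(x_obs.shape)
    x0 = x.copy()  # noise used for the observed coordinates' path
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v(x, k * dt)
        t_next = (k + 1) * dt
        path = (1.0 - t_next) * x0 + t_next * x_obs
        x = np.where(mask, path, x)
    return x
```

Swapping the mask pattern switches the task without re-training: scattered missing entries give imputation, a regular low-resolution grid gives super-resolution, and an all-False mask gives unconditional synthesis.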
Related papers
- SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation [3.2150327776278576]
This paper introduces the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean Average Precision (mAP) scores of YOLOv11, a leading object detection model. It provides actionable insights for improving dataset quality, minimizing the need for costly iterative training.
arXiv Detail & Related papers (2025-10-08T03:01:26Z) - Data Fusion of Deep Learned Molecular Embeddings for Property Prediction [41.99844472131922]
Data-driven approaches such as deep learning can result in predictive models for material properties with exceptional accuracy and efficiency. To improve predictions, techniques such as transfer learning and multitask learning have been used. Standard multitask models tend to underperform when trained on sparse data sets with weakly correlated properties. We demonstrate this technique on a widely used benchmark data set of quantum chemistry data for small molecules and a newly compiled sparse data set of experimental data collected from literature and our own quantum chemistry and thermochemical calculations.
arXiv Detail & Related papers (2025-04-09T21:40:15Z) - Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Certain and Approximately Certain Models for Statistical Learning [4.318959672085627]
We show that it is possible to learn accurate models directly from data with missing values for certain training data and target models.
We build efficient algorithms with theoretical guarantees to check this necessity and return accurate models in cases where imputation is unnecessary.
arXiv Detail & Related papers (2024-02-27T22:49:33Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - Scanflow: A multi-graph framework for Machine Learning workflow management, supervision, and debugging [0.0]
We propose a novel containerized directed graph framework to support end-to-end Machine Learning workflow management.
The framework allows defining and deploying ML in containers, tracking their metadata, checking their behavior in production, and improving the models by using both learned and human-provided knowledge.
arXiv Detail & Related papers (2021-11-04T17:01:12Z) - Deep convolutional generative adversarial networks for traffic data imputation encoding time series as images [7.053891669775769]
We have developed a generative adversarial network (GAN) based traffic sensor data imputation framework (TGAN).
In this study, we have developed a novel time-dependent encoding method called the Gramian Angular Summation Field (GASF)
This study shows that the proposed model can significantly improve the traffic data imputation accuracy in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) compared to state-of-the-art models on the benchmark dataset.
arXiv Detail & Related papers (2020-05-05T19:14:02Z)
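The Gramian Angular Summation Field used in the last entry is a standard way to encode a time series as an image for convolutional models. A minimal sketch of the construction (an illustration, not the paper's exact preprocessing):

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field of a 1-D time series.

    Rescales the series to [-1, 1], maps each value to an angle
    phi = arccos(x), and returns the matrix cos(phi_i + phi_j),
    an N x N image encoding of the N-point series.
    """
    s = np.asarray(series, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:
        x = np.zeros_like(s)  # constant series maps to x = 0
    else:
        x = 2.0 * (s - lo) / (hi - lo) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])
```

The resulting matrix is symmetric and bounded in [-1, 1], which makes it a convenient single-channel input image for a GAN-based imputation model.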
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.