On Designing Data Models for Energy Feature Stores
- URL: http://arxiv.org/abs/2205.04267v1
- Date: Mon, 9 May 2022 13:35:53 GMT
- Title: On Designing Data Models for Energy Feature Stores
- Authors: Gregor Cerar, Blaž Bertalanič, Anže Pirnat, Andrej Čampa, Carolina Fortuna
- Abstract summary: We study data models, energy feature engineering and feature management solutions for developing ML-based energy applications.
We first propose a taxonomy for designing data models suitable for energy applications, then analyze feature engineering techniques that transform the data model into features suitable for ML model training, and finally review available designs for feature stores.
- Score: 0.5809784853115825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The digitization of the energy infrastructure enables new, data-driven
applications, often supported by machine learning models. However, domain-specific
data transformations, pre-processing and management in modern data-driven
pipelines are yet to be addressed. In this paper we perform a first-of-its-kind
study of data models, energy feature engineering and feature management
solutions for developing ML-based energy applications. We first propose a
taxonomy for designing data models suitable for energy applications, then
analyze feature engineering techniques that transform the data model into
features suitable for ML model training, and finally review available designs
for feature stores. Using a short-term forecasting dataset, we show how richer
data models and engineered features improve the performance of the resulting
models. Finally, we benchmark three complementary feature management
solutions, including an open-source feature store.
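
To make the feature-engineering step described in the abstract concrete, the sketch below shows how a raw consumption series might be turned into lag, rolling-statistics and calendar features for short-term load forecasting. This is a minimal pandas illustration under stated assumptions: the column names, 15-minute resolution and synthetic data are invented for the example and are not the paper's actual data model.

```python
# Hypothetical feature engineering for short-term load forecasting.
import numpy as np
import pandas as pd

def engineer_features(load: pd.Series) -> pd.DataFrame:
    """Transform a raw consumption series (15-min resolution assumed)
    into an ML-ready feature table."""
    df = pd.DataFrame({"load": load})
    # Autoregressive lags: previous step, one day back, one week back.
    df["lag_1"] = load.shift(1)
    df["lag_day"] = load.shift(96)        # 96 x 15 min = 1 day
    df["lag_week"] = load.shift(96 * 7)
    # Rolling statistics capture recent trend and volatility
    # (shifted by one step to avoid leaking the target).
    df["roll_mean_4h"] = load.shift(1).rolling(16).mean()
    df["roll_std_4h"] = load.shift(1).rolling(16).std()
    # Calendar features encode daily and weekly seasonality.
    df["hour"] = df.index.hour
    df["dayofweek"] = df.index.dayofweek
    df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)
    return df.dropna()

# Usage with synthetic data standing in for a metering dataset:
idx = pd.date_range("2022-01-01", periods=96 * 30, freq="15min")
load = pd.Series(np.random.default_rng(0).gamma(2.0, 1.5, len(idx)), index=idx)
features = engineer_features(load)
```

A feature store would then version and serve tables like `features` consistently to both training and inference pipelines, which is the role of the feature management solutions benchmarked in the paper.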
Related papers
- A Survey of Model Architectures in Information Retrieval [64.75808744228067]
We focus on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation.
We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs).
We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal, multilingual data, and adaptation to novel application domains beyond traditional search paradigms.
arXiv Detail & Related papers (2025-02-20T18:42:58Z)
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
- On Foundation Models for Dynamical Systems from Purely Synthetic Data [5.004576576202551]
Foundation models have demonstrated remarkable generalization, data efficiency, and robustness properties across various domains.
These models are available in fields like natural language processing and computer vision, but do not exist for dynamical systems.
We address this challenge by pretraining a transformer-based foundation model exclusively on synthetic data.
Our results demonstrate the feasibility of foundation models for dynamical systems that outperform specialist models in terms of generalization, data efficiency, and robustness.
arXiv Detail & Related papers (2024-11-30T08:34:10Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- T-METASET: Task-Aware Generation of Metamaterial Datasets by Diversity-Based Active Learning [14.668178146934588]
We propose t-METASET: an intelligent data acquisition framework for task-aware dataset generation.
We validate the proposed framework in three hypothetical deployment scenarios, which encompass general use, task-aware use, and tailorable use.
arXiv Detail & Related papers (2022-02-21T22:46:49Z)
- Concept for a Technical Infrastructure for Management of Predictive Models in Industrial Applications [0.0]
We describe our technological concept for a model management system.
This concept includes versioned storage of data, support for different machine learning algorithms, fine tuning of models, subsequent deployment of models and monitoring of model performance after deployment.
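
As a rough illustration of the concept's ingredients (versioned data, algorithm-agnostic model records, deployment state and post-deployment monitoring), here is a minimal, hypothetical registry sketch in Python. The class and field names are invented for illustration and are not the paper's actual design.

```python
# Hypothetical core record a model management system might keep.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelVersion:
    name: str              # logical model identity, e.g. "load_forecast"
    version: int           # monotonically increasing per name
    algorithm: str         # supports different ML algorithms
    dataset_hash: str      # pins the exact (versioned) training data
    hyperparams: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)   # filled by monitoring
    deployed: bool = False
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class ModelRegistry:
    """In-memory stand-in for versioned model storage."""
    def __init__(self):
        self._store: dict[str, list[ModelVersion]] = {}

    def register(self, mv: ModelVersion) -> None:
        self._store.setdefault(mv.name, []).append(mv)

    def latest(self, name: str) -> ModelVersion:
        return max(self._store[name], key=lambda mv: mv.version)

    def record_live_metric(self, name: str, metric: str, value: float):
        # Post-deployment monitoring hook, per the described concept.
        self.latest(name).metrics[metric] = value
```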
arXiv Detail & Related papers (2021-07-29T08:38:46Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libfuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
- Gradient-Based Training and Pruning of Radial Basis Function Networks with an Application in Materials Physics [0.24792948967354234]
We propose a gradient-based technique for training radial basis function networks with an efficient and scalable open-source implementation.
We derive novel closed-form optimization criteria for pruning the models for continuous as well as binary data.
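
For readers unfamiliar with the model class, the following is a minimal sketch (not the paper's implementation) of a Gaussian radial basis function network whose centers, widths and output weights are all trained jointly by gradient descent, here using PyTorch.

```python
# Minimal Gaussian RBF network trained end-to-end by gradient descent.
import torch

class RBFNet(torch.nn.Module):
    def __init__(self, in_dim: int, n_centers: int, out_dim: int):
        super().__init__()
        self.centers = torch.nn.Parameter(torch.randn(n_centers, in_dim))
        self.log_width = torch.nn.Parameter(torch.zeros(n_centers))
        self.linear = torch.nn.Linear(n_centers, out_dim)

    def forward(self, x):
        # Squared distances to each center: shape (batch, n_centers).
        d2 = torch.cdist(x, self.centers) ** 2
        phi = torch.exp(-d2 * torch.exp(self.log_width))
        return self.linear(phi)

# Fit y = sin(x) on random samples.
torch.manual_seed(0)
x = torch.rand(256, 1) * 6.0
y = torch.sin(x)
net = RBFNet(1, 20, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
```

The paper's closed-form pruning criteria are not reproduced here; in this sketch, pruning could be crudely approximated by dropping centers whose outgoing weights in `net.linear` have small magnitude.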
arXiv Detail & Related papers (2020-04-06T11:32:37Z)
- From Data to Actions in Intelligent Transportation Systems: a Prescription of Functional Requirements for Model Actionability [10.27718355111707]
This work aims to describe how data, coming from diverse ITS sources, can be used to learn and adapt data-driven models for efficiently operating ITS assets, systems and processes.
Grounded in this data modeling pipeline for ITS, we define the characteristics, engineering requisites and intrinsic challenges of its three compounding stages, namely data fusion, adaptive learning and model evaluation.
arXiv Detail & Related papers (2020-02-06T12:02:30Z)