Related papers: Scaling Data-Driven Building Energy Modelling using Large Language Models

URL: http://arxiv.org/abs/2407.03469v1
Date: Wed, 3 Jul 2024 19:34:24 GMT
Title: Scaling Data-Driven Building Energy Modelling using Large Language Models
Authors: Sunil Khadka, Liang Zhang
Abstract summary: We propose a methodology to tackle the scalability challenges associated with the development of data-driven models for Building Management Systems (BMS). We use Large Language Models (LLMs) to generate code that processes structured data from BMS and builds data-driven models for BMS's specific requirements. Our case study indicates that bi-sequential prompting under the prompt template can achieve a high success rate of code generation and code accuracy, and significantly reduce human labor costs.
Score: 3.0309252269809264
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Building Management System (BMS) through a data-driven method always faces data and model scalability issues. We propose a methodology to tackle the scalability challenges associated with the development of data-driven models for BMS by using Large Language Models (LLMs). LLMs' code generation adaptability can enable broader adoption of BMS by "automating the automation," particularly the data handling and data-driven modeling processes. In this paper, we use LLMs to generate code that processes structured data from BMS and build data-driven models for BMS's specific requirements. This eliminates the need for manual data and model development, reducing the time, effort, and cost associated with this process. Our hypothesis is that LLMs can incorporate domain knowledge about data science and BMS into data processing and modeling, ensuring that the data-driven modeling is automated for specific requirements of different building types and control objectives, which also improves accuracy and scalability. We generate a prompt template following the framework of Machine Learning Operations so that the prompts are designed to systematically generate Python code for data-driven modeling. Our case study indicates that bi-sequential prompting under the prompt template can achieve a high success rate of code generation and code accuracy, and significantly reduce human labor costs.

Related papers

Automatic MILP Model Construction for Multi-Robot Task Allocation and Scheduling Based on Large Language Models [13.960259962694126]
Existing methods face challenges in adapting to dynamic production constraints. Enterprises have high privacy requirements for production scheduling data. This study proposes a knowledge-augmented mixed integer linear programming (MILP) automated framework.
arXiv Detail & Related papers (2025-03-18T01:45:19Z)
The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data [0.3749861135832072]
This paper investigates and compares the performance of the mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data. The results can be beneficial for data analysts and practitioners who would like to leverage generative AIs to produce good prediction models with acceptable goodness.
arXiv Detail & Related papers (2024-11-27T20:18:36Z)
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists [41.94295877935867]
We present a benchmark for large language models designed to tackle one of the most knowledge-intensive tasks in data science. We demonstrate that the FeatEng of our proposal can cheaply and efficiently assess the broad capabilities of LLMs.
arXiv Detail & Related papers (2024-10-30T17:59:01Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback [62.235925602004535]
DataEnvGym is a testbed of teacher environments for data generation agents. It frames data generation as a sequential decision-making task, involving an agent and a data generation engine. Students are iteratively trained and evaluated on generated data, and their feedback is reported to the agent after each iteration.
arXiv Detail & Related papers (2024-10-08T17:20:37Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Towards Synthetic Trace Generation of Modeling Operations using In-Context Learning Approach [1.8874331450711404]
We propose a conceptual framework that combines modeling event logs, intelligent modeling assistants, and the generation of modeling operations. In particular, the architecture comprises modeling components that help the designer specify the system, record its operation within a graphical modeling environment, and automatically recommend relevant operations.
arXiv Detail & Related papers (2024-08-26T13:26:44Z)
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling [15.673219028826173]
We introduce a semi-automated data synthesis framework designed for optimization modeling issues, named OR-Instruct. We train various open-source LLMs with a capacity of 7 billion parameters (dubbed ORLMs). The resulting model demonstrates significantly enhanced optimization modeling capabilities, achieving state-of-the-art performance across the NL4OPT, MAMO, and IndustryOR benchmarks.
arXiv Detail & Related papers (2024-05-28T01:55:35Z)
UniDM: A Unified Framework for Data Manipulation with Large Language Models [66.61466011795798]
Large Language Models (LLMs) resolve multiple data manipulation tasks. LLMs exhibit bright benefits in terms of performance but still require customized designs to fit each specific task. We propose UniDM, a unified framework which establishes a new paradigm to process data manipulation tasks.
arXiv Detail & Related papers (2024-05-10T14:44:04Z)
Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains. In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
Prompt2Model: Generating Deployable Models from Natural Language Instructions [74.19816829003729]
Large language models (LLMs) enable system builders to create competent NLP systems through prompting. In other ways, LLMs are a step backward from traditional special-purpose NLP models. We propose Prompt2Model, a general-purpose method that takes a natural language task description like the prompts provided to LLMs.
arXiv Detail & Related papers (2023-08-23T17:28:21Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Optimizing the AI Development Process by Providing the Best Support Environment [0.756282840161499]
The main stages of machine learning are problem understanding, data management, model building, model deployment, and maintenance. The framework was built using the Python language to perform data augmentation using deep learning advancements.
arXiv Detail & Related papers (2023-04-29T00:44:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.