Utilizing Domain Knowledge: Robust Machine Learning for Building Energy
Prediction with Small, Inconsistent Datasets
- URL: http://arxiv.org/abs/2302.10784v1
- Date: Mon, 23 Jan 2023 08:56:11 GMT
- Title: Utilizing Domain Knowledge: Robust Machine Learning for Building Energy
Prediction with Small, Inconsistent Datasets
- Authors: Xia Chen, Manav Mahan Singh, Philipp Geyer
- Abstract summary: The demand for large amounts of data is currently a bottleneck for machine learning (ML) applications.
We propose a method to combine prior knowledge with data-driven methods to significantly reduce their data dependency.
Component-based machine learning (CBML), a knowledge-encoded data-driven method, is examined in the context of energy-efficient building engineering.
- Score: 1.1081836812143175
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The demand for large amounts of data is currently a bottleneck for
machine learning (ML) applications in an empirically dominated field. We
propose a method that combines prior knowledge with data-driven approaches to
significantly reduce their data dependency. In this study, component-based
machine learning (CBML) is examined as the knowledge-encoded data-driven
method in the context of energy-efficient building engineering. It encodes an
abstraction of building structural knowledge as semantic information in the
model organization. We design a case experiment to assess the efficacy of
knowledge-encoded ML under sparse data input (sampling rates from 1% down to
0.0125%). The results reveal three advantages over pure ML methods: 1.
significantly improved robustness to extremely small and inconsistent
datasets; 2. efficient utilization of data from different entities' record
collections; 3. the ability to accept incomplete data, with high
interpretability and reduced training time. Together, these features offer a
promising path to alleviating the deployment bottleneck of data-intensive
methods and contribute to efficient real-world data usage. Moreover, the study
summarizes four prerequisites that ensure the target scenario benefits from
combining prior knowledge with ML generalization.
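To make the idea concrete, here is a minimal sketch (not the authors' implementation) of a component-based organization: one sub-model per building component, each trained independently on whatever records exist for that component, with predictions composed into a building-level estimate. The component names, toy features, and additive composition are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# One sub-model per building component; each is trained on its own records,
# so data from different buildings can contribute wherever a component appears.
components = ["wall", "window", "roof"]
models = {}
for name in components:
    X = rng.uniform(0, 1, size=(50, 3))    # toy features: area, U-value, delta-T
    y = 10 * X[:, 0] * X[:, 1] * X[:, 2]   # toy heat-loss-like target
    models[name] = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                                random_state=0).fit(X, y)

def predict_building(component_inputs):
    """Compose component predictions into a building-level estimate.
    The knowledge encoding lies in the decomposition itself: every
    sub-model maps onto a physical component. Composition here is a
    plain sum, an assumption for illustration."""
    return sum(models[name].predict(x.reshape(1, -1))[0]
               for name, x in component_inputs.items())

building = {name: rng.uniform(0, 1, 3) for name in components}
print(f"estimated energy demand: {predict_building(building):.2f}")
```

Because each sub-model corresponds to a physical component, records from different buildings can be pooled per component, which is one plausible reading of the abstract's "efficient utilization of data from different entities' record collections".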
Related papers
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models [1.204452887718077]
We show how data management tools can significantly improve the quality of data that is used for machine learning (ML) applications.
We propose an architecture and implementation of such tools and demonstrate through two use cases how they can be used to improve ML-based eScience investigations.
arXiv Detail & Related papers (2024-06-27T04:42:29Z)
- Informed Meta-Learning [55.2480439325792]
Meta-learning and informed ML stand out as two approaches for incorporating prior knowledge into ML pipelines.
We formalise a hybrid paradigm, informed meta-learning, facilitating the incorporation of priors from unstructured knowledge representations.
We demonstrate the potential benefits of informed meta-learning in improving data efficiency, robustness to observational noise and task distribution shifts.
arXiv Detail & Related papers (2024-02-25T15:08:37Z)
- Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning [0.0]
This paper addresses a critical issue in Machine Learning (ML) where unintended information contaminates the training data, impacting model performance evaluation.
The discrepancy between evaluated and actual performance on new data is a significant concern.
It explores the connection between data leakage and the specific task being addressed, investigates its occurrence in Transfer Learning, and compares standard inductive ML with transductive ML frameworks.
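The summary names the failure mode only in general terms; a textbook instance (an illustration, not taken from the paper) is fitting preprocessing on the full dataset before splitting, which lets test-set statistics leak into training:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)

# Leaky: the scaler sees the test rows, so later evaluation is subtly optimistic.
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)

# Correct: split first, then fit preprocessing on the training split only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
```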
arXiv Detail & Related papers (2024-01-24T20:30:52Z)
- Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data [100.33096338195723]
We focus on Few-shot Learning with Auxiliary Data (FLAD)
FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization.
We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
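EXP3 and UCB1 are classic multi-armed bandit algorithms; FLAD casts each auxiliary dataset as an arm. Below is a minimal UCB1-flavored sketch of that selection loop. The dataset names and the reward stub are assumptions; the paper derives its own reward signal from training dynamics.

```python
import math
import random

arms = ["aux_dataset_A", "aux_dataset_B", "aux_dataset_C"]  # hypothetical names
counts = {a: 0 for a in arms}
totals = {a: 0.0 for a in arms}

def reward_after_batch_from(arm):
    # Placeholder reward, e.g. the change in few-shot validation score after
    # training on one batch from `arm`; the real signal is paper-specific.
    return random.random()

for t in range(1, 201):
    untried = [a for a in arms if counts[a] == 0]
    if untried:
        chosen = untried[0]  # play every arm once before using the UCB rule
    else:
        # UCB1: empirical mean plus an exploration bonus that shrinks with use.
        chosen = max(arms, key=lambda a: totals[a] / counts[a]
                     + math.sqrt(2 * math.log(t) / counts[a]))
    counts[chosen] += 1
    totals[chosen] += reward_after_batch_from(chosen)

print({a: round(totals[a] / counts[a], 3) for a in arms})
```

The bonus term balances exploring rarely sampled auxiliary datasets against exploiting the ones that have helped so far, which is exactly the explore/exploit trade-off the summary alludes to.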
arXiv Detail & Related papers (2023-02-01T18:59:36Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data are prioritized.
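A minimal sketch of the weighting idea follows; the max-softmax confidence score used as an in-distribution proxy is a stand-in, and the paper's actual weighting function may differ.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Stand-in model outputs on an unlabeled batch (8 points, 10 classes).
logits = np.random.default_rng(0).normal(size=(8, 10))
probs = softmax(logits)

conf = probs.max(axis=1)                    # confidence as in-distribution proxy
w = np.clip((conf - 0.5) / 0.5, 0.0, 1.0)   # zero weight below 0.5, ramp to 1.0

# Weighted pseudo-label loss: low-confidence (likely out-of-distribution)
# points contribute little, so only conducive unlabeled data drive training.
pseudo = probs.argmax(axis=1)
nll = -np.log(probs[np.arange(len(pseudo)), pseudo] + 1e-12)
weighted_loss = (w * nll).mean()
print(f"weighted unlabeled loss: {weighted_loss:.4f}")
```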
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Sophisticated machine learning approaches often suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Injective Domain Knowledge in Neural Networks for Transprecision Computing [17.300144121921882]
This paper studies the improvements that can be obtained by integrating prior knowledge when dealing with a non-trivial learning task.
The results clearly show that ML models exploiting problem-specific information outperform purely data-driven ones, with an average accuracy improvement of around 38%.
arXiv Detail & Related papers (2020-02-24T12:58:56Z)