Reproducibility of Machine Learning-Based Fault Detection and Diagnosis for HVAC Systems in Buildings: An Empirical Study
- URL: http://arxiv.org/abs/2508.00880v1
- Date: Wed, 23 Jul 2025 07:35:58 GMT
- Title: Reproducibility of Machine Learning-Based Fault Detection and Diagnosis for HVAC Systems in Buildings: An Empirical Study
- Authors: Adil Mukhtar, Michael Hadwiger, Franz Wotawa, Gerald Schweiger
- Abstract summary: This paper analyzes the transparency and reproducibility standards of Machine Learning applications in building energy systems. The results indicate that nearly all articles are not reproducible due to insufficient disclosure. These findings highlight the need for targeted interventions, including guidelines, training for researchers, and policies by journals and conferences.
- Score: 7.852209218432359
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reproducibility is a cornerstone of scientific research, enabling independent verification and validation of empirical findings. The topic gained prominence in fields such as psychology and medicine, where concerns about non-replicable results sparked ongoing discussions about research practices. In recent years, the fast-growing field of Machine Learning (ML) has become part of this discourse, as it faces similar concerns about transparency and reliability. Some reproducibility issues in ML research are shared with other fields, such as limited access to data and missing methodological details. In addition, ML introduces specific challenges, including inherent nondeterminism and computational constraints. While reproducibility issues are increasingly recognized by the ML community and its major conferences, less is known about how these challenges manifest in applied disciplines. This paper contributes to closing this gap by analyzing the transparency and reproducibility standards of ML applications in building energy systems. The results indicate that nearly all articles are not reproducible due to insufficient disclosure across key dimensions of reproducibility. 72% of the articles do not specify whether the dataset used is public, proprietary, or commercially available. Only two papers share a link to their code, one of which was broken. Two-thirds of the publications were authored exclusively by academic researchers, yet no significant differences in reproducibility were observed compared to publications with industry-affiliated authors. These findings highlight the need for targeted interventions, including reproducibility guidelines, training for researchers, and policies by journals and conferences that promote transparency and reproducibility.
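The abstract's point about inherent nondeterminism in ML can be illustrated with a minimal, hypothetical sketch (not taken from the paper): pinning random seeds is a necessary first step toward run-to-run reproducibility, although real pipelines must also seed framework-level generators.

```python
import random

def set_seed(seed: int = 42) -> None:
    """Pin the interpreter-level RNG. Real ML code must additionally seed
    library generators (e.g. np.random.seed, torch.manual_seed) and, for
    full determinism, restrict nondeterministic GPU kernels."""
    random.seed(seed)

# Two runs with the same seed produce identical draws.
set_seed(123)
run_a = [random.random() for _ in range(3)]
set_seed(123)
run_b = [random.random() for _ in range(3)]
assert run_a == run_b  # identical draws once the seed is pinned
```

Seeding alone does not guarantee bit-identical results across hardware or library versions, which is one reason disclosing the full software environment matters for reproducibility.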
Related papers
- How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders [0.8633013637160062]
This paper investigates how large language models (LLMs) encode the concept of scientific quality. We derive such features under different experimental settings and assess their ability to serve as predictors. We identify four recurring types of features that capture key aspects of how research quality is represented.
arXiv Detail & Related papers (2026-02-22T10:12:20Z) - IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery [61.15184885636171]
In the presence of confounding between an endogenous variable and the outcome, instrumental variables (IVs) are used to isolate the causal effect of the endogenous variable. We investigate whether large language models (LLMs) can aid in this task. We introduce IV Co-Scientist, a multi-agent system that proposes, critiques, and refines IVs for a given treatment-outcome pair.
arXiv Detail & Related papers (2026-02-08T12:28:29Z) - Chasing Shadows: Pitfalls in LLM Security Research [14.334369124449346]
We identify nine common pitfalls that have become relevant with the emergence of large language models (LLMs). These pitfalls span the entire process, from data collection, pre-training, and fine-tuning to prompting and evaluation. We find that every paper contains at least one pitfall, and each pitfall appears in multiple papers. Yet only 15.7% of the present pitfalls were explicitly discussed, suggesting that the majority remain unrecognized.
arXiv Detail & Related papers (2025-12-10T11:39:09Z) - Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models. We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition [67.26124739345332]
Large language models (LLMs) have demonstrated potential in assisting scientific research, yet their ability to discover high-quality research hypotheses remains unexamined. We introduce the first large-scale benchmark for evaluating LLMs with a near-sufficient set of sub-tasks of scientific discovery. We develop an automated framework that extracts critical components - research questions, background surveys, inspirations, and hypotheses - from scientific papers.
arXiv Detail & Related papers (2025-03-27T08:09:15Z) - Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. A key theoretical contribution is the structural sparsity of causal connections between modalities. Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers [1.4841630983274845]
Lack of transparency, data or code, poor adherence to standards, and the sensitivity of ML training mean that many papers are not even reproducible in principle. Experiments have found worryingly low degrees of similarity with original results. Poor reproducibility threatens trust in, and the integrity of, research results.
arXiv Detail & Related papers (2024-06-20T13:56:42Z) - Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
arXiv Detail & Related papers (2024-04-26T09:51:24Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting fragmented information. This paper presents a thorough analysis of these literature reviews within the PAMI field. We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - Spurious Correlations in Machine Learning: A Survey [27.949532561102206]
Machine learning systems are sensitive to spurious correlations between non-essential features of the inputs and labels.
These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions.
We provide a review of this issue, along with a taxonomy of current state-of-the-art methods for addressing spurious correlations in machine learning models.
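The survey's notion of a spurious correlation can be made concrete with a small toy example (an illustration assumed here, not drawn from the survey): a classifier that latches onto a non-causal feature scores well during training but collapses once the data distribution shifts.

```python
import random

random.seed(0)

# Training data: label-1 examples happen to have a background feature
# near 1.0 and label-0 examples near 0.0 -- a spurious, non-causal cue.
train = [(1.0 + random.gauss(0, 0.1), 1) for _ in range(100)] + \
        [(0.0 + random.gauss(0, 0.1), 0) for _ in range(100)]

def predict(x: float) -> int:
    """A classifier that relies entirely on the spurious feature."""
    return 1 if x > 0.5 else 0

train_acc = sum(predict(x) == y for x, y in train) / len(train)

# Distribution shift: the spurious feature decouples from the label,
# so the learned shortcut now points the wrong way.
test = [(0.0 + random.gauss(0, 0.1), 1) for _ in range(100)] + \
       [(1.0 + random.gauss(0, 0.1), 0) for _ in range(100)]
test_acc = sum(predict(x) == y for x, y in test) / len(test)

print(train_acc, test_acc)  # near-perfect on train, near-zero after the shift
```

The accuracy gap between the two splits is exactly the failure mode the survey's taxonomy of mitigation methods targets: the correlation was real in the training sample but not stable under real-world distribution shift.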
arXiv Detail & Related papers (2024-02-20T04:49:34Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - Reproducibility in Machine Learning-Driven Research [1.7936835766396748]
Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce.
This is also the case in machine learning (ML) and artificial intelligence (AI) research.
Although different solutions to address this issue are discussed in the research community, such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially.
arXiv Detail & Related papers (2023-07-19T07:00:22Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Use and Misuse of Machine Learning in Anthropology [0.9786690381850356]
We will focus on the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence.
The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology.
We discuss a series of missteps, errors, and violations of correct protocols of ML methods that appear disconcertingly often within the accumulating body of anthropological literature.
arXiv Detail & Related papers (2022-09-06T20:32:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.