Towards Comparability in Non-Intrusive Load Monitoring: On Data and
Performance Evaluation
- URL: http://arxiv.org/abs/2001.07708v1
- Date: Mon, 20 Jan 2020 10:13:51 GMT
- Title: Towards Comparability in Non-Intrusive Load Monitoring: On Data and
Performance Evaluation
- Authors: Christoph Klemenjak, Stephen Makonin and Wilfried Elmenreich
- Abstract summary: Non-Intrusive Load Monitoring (NILM) comprises a set of techniques that provide insights into the energy consumption of households and industrial facilities.
Despite progress made concerning disaggregation techniques, performance evaluation and comparability remain open research questions.
Detailed information on pre-processing as well as data cleaning methods, the importance of unified performance reporting, and the need for complexity measures in load disaggregation are found to be the most urgent issues in NILM-related research.
- Score: 1.0312968200748116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-Intrusive Load Monitoring (NILM) comprises a set of techniques that
provide insights into the energy consumption of households and industrial
facilities. Recent contributions show significant improvements in accuracy
and generalisation ability. Despite all progress made concerning
disaggregation techniques, performance evaluation and comparability remain
open research questions. The lack of standardisation and consensus on
evaluation procedures makes reproducibility and comparability extremely
difficult. In this paper, we draw attention to comparability in NILM, with a
focus on highlighting the considerable differences amongst common energy
datasets used to test the performance of algorithms. We divide the discussion
of comparability into data aspects and performance metrics, and take a close
look at evaluation processes.
Detailed information on pre-processing as well as data cleaning methods, the
importance of unified performance reporting, and the need for complexity
measures in load disaggregation are found to be the most urgent issues in
NILM-related research. In addition, our evaluation suggests that datasets
should be chosen carefully. We conclude by formulating suggestions for future
work to enhance comparability.
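
A brief, hedged illustration of the performance-metric issue raised in the abstract: NILM studies report different figures of merit, so the same disaggregation output can look quite different depending on the metric chosen. The sketch below (Python) computes two metrics commonly reported in the NILM literature: the energy-based estimation accuracy popularised by Kolter and Johnson, and an F1-score on derived on/off appliance states. The function names, the 10 W on-threshold, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def estimation_accuracy(y_true, y_pred):
    """Energy-based estimation accuracy (Kolter & Johnson style):
    1 - sum|y_hat - y| / (2 * sum y), over all timestamps (and appliances)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.abs(y_pred - y_true).sum() / (2.0 * y_true.sum())

def onoff_f1(y_true, y_pred, threshold_watts=10.0):
    """F1-score on binary on/off states derived from power readings.
    The 10 W on-threshold is an illustrative assumption."""
    t = np.asarray(y_true, dtype=float) > threshold_watts
    p = np.asarray(y_pred, dtype=float) > threshold_watts
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Toy example: ground-truth vs. disaggregated power for one appliance (in watts).
ground_truth = [0, 0, 120, 130, 125, 0, 0]
estimate     = [0, 15, 110, 135, 120, 5, 0]
print(f"estimation accuracy: {estimation_accuracy(ground_truth, estimate):.3f}")
print(f"on/off F1-score:     {onoff_f1(ground_truth, estimate):.3f}")
```

On this toy series the estimation accuracy comes out near 0.95 while the on/off F1-score is near 0.86, which hints at how the choice of metric alone can shift the apparent ranking of algorithms and why unified performance reporting matters.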
Related papers
- Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards [5.632231145349045]
This paper investigates the transparency in the creation of benchmarks and the use of leaderboards for measuring progress in NLP.
Existing relation extraction benchmarks often suffer from insufficient documentation and lack crucial details.
While our discussion centers on the transparency of RE benchmarks and leaderboards, the observations we discuss are broadly applicable to other NLP tasks as well.
arXiv Detail & Related papers (2024-11-07T22:36:19Z)
- OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework [21.87740178652843]
Causal discovery offers a promising approach to improve transparency and reliability.
We propose a flexible evaluation framework with metrics for evaluating differences in causal structures and causal effects.
We introduce the Open Causal Discovery Benchmark (OCDB), based on real data, to promote fair comparisons and drive optimization of algorithms.
arXiv Detail & Related papers (2024-06-07T03:09:22Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Rectified Max-Value Entropy Search for Bayesian Optimization [54.26984662139516]
We develop a rectified MES acquisition function based on the notion of mutual information.
As a result, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.
arXiv Detail & Related papers (2022-02-28T08:11:02Z)
- QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
- A Two-Stage Feature Selection Approach for Robust Evaluation of Treatment Effects in High-Dimensional Observational Data [1.4710887888397084]
We propose a novel two-stage feature selection technique called Outcome Adaptive Elastic Net (OAENet).
OAENet is explicitly designed for making robust causal inference decisions using matching techniques.
Numerical experiments on simulated data demonstrate that OAENet significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-11-27T02:54:30Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- Evaluation of Unsupervised Entity and Event Salience Estimation [17.74208462902158]
Salience Estimation aims to predict term importance in documents.
Previous studies typically generate pseudo-ground truth for evaluation.
In this work, we propose a light yet practical entity and event salience estimation evaluation protocol.
arXiv Detail & Related papers (2021-04-14T15:23:08Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all of the above) and is not responsible for any consequences arising from its use.