Towards Comparability in Non-Intrusive Load Monitoring: On Data and
Performance Evaluation
- URL: http://arxiv.org/abs/2001.07708v1
- Date: Mon, 20 Jan 2020 10:13:51 GMT
- Title: Towards Comparability in Non-Intrusive Load Monitoring: On Data and
Performance Evaluation
- Authors: Christoph Klemenjak, Stephen Makonin and Wilfried Elmenreich
- Abstract summary: Non-Intrusive Load Monitoring (NILM) comprises a set of techniques that provide insights into the energy consumption of households and industrial facilities.
Despite progress made concerning disaggregation techniques, performance evaluation and comparability remain open research questions.
Detailed information on pre-processing as well as data cleaning methods, the importance of unified performance reporting, and the need for complexity measures in load disaggregation are found to be the most urgent issues in NILM-related research.
- Score: 1.0312968200748116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-Intrusive Load Monitoring (NILM) comprises a set of techniques that
provide insights into the energy consumption of households and industrial
facilities. Recent contributions show significant improvements in accuracy
and generalisation ability. Despite all progress made concerning
disaggregation techniques, performance evaluation and comparability remain
open research questions. The lack of standardisation and consensus on
evaluation procedures makes reproducibility and comparability extremely
difficult. In this paper, we draw attention to comparability in NILM, with a
focus on highlighting the considerable differences amongst common energy
datasets used to test the performance of algorithms. We divide the discussion
of comparability into data aspects and performance metrics, and take a close
look at evaluation processes.
Detailed information on pre-processing as well as data cleaning methods, the
importance of unified performance reporting, and the need for complexity
measures in load disaggregation are found to be the most urgent issues in
NILM-related research. In addition, our evaluation suggests that datasets
should be chosen carefully. We conclude by formulating suggestions for future
work to enhance comparability.
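
A brief, hedged illustration of the performance-metric issue raised in the abstract: NILM studies report different figures of merit, so the same disaggregation output can look quite different depending on the metric chosen. The sketch below (Python) computes two metrics commonly reported in the NILM literature: the energy-based estimation accuracy popularised by Kolter and Johnson, and an F1-score on derived on/off appliance states. The function names, the 10 W on-threshold, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def estimation_accuracy(y_true, y_pred):
    """Energy-based estimation accuracy (Kolter & Johnson style):
    1 - sum|y_hat - y| / (2 * sum y), over all timestamps (and appliances)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.abs(y_pred - y_true).sum() / (2.0 * y_true.sum())

def onoff_f1(y_true, y_pred, threshold_watts=10.0):
    """F1-score on binary on/off states derived from power readings.
    The 10 W on-threshold is an illustrative assumption."""
    t = np.asarray(y_true, dtype=float) > threshold_watts
    p = np.asarray(y_pred, dtype=float) > threshold_watts
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Toy example: ground-truth vs. disaggregated power for one appliance (in watts).
ground_truth = [0, 0, 120, 130, 125, 0, 0]
estimate     = [0, 15, 110, 135, 120, 5, 0]
print(f"estimation accuracy: {estimation_accuracy(ground_truth, estimate):.3f}")
print(f"on/off F1-score:     {onoff_f1(ground_truth, estimate):.3f}")
```

On this toy series the estimation accuracy comes out near 0.95 while the on/off F1-score is near 0.86, which hints at how the choice of metric alone can shift the apparent ranking of algorithms and why unified performance reporting matters.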
Related papers
- Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards [5.632231145349045]
This paper investigates the transparency in the creation of benchmarks and the use of leaderboards for measuring progress in NLP.
Existing relation extraction benchmarks often suffer from insufficient documentation and lack crucial details.
While our discussion centers on the transparency of RE benchmarks and leaderboards, the observations we discuss are broadly applicable to other NLP tasks as well.
arXiv Detail & Related papers (2024-11-07T22:36:19Z)
- OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework [21.87740178652843]
Causal discovery offers a promising approach to improve transparency and reliability.
We propose a flexible evaluation framework with metrics for evaluating differences in causal structures and causal effects.
We introduce the Open Causal Discovery Benchmark (OCDB), based on real data, to promote fair comparisons and drive optimization of algorithms.
arXiv Detail & Related papers (2024-06-07T03:09:22Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Rectified Max-Value Entropy Search for Bayesian Optimization [54.26984662139516]
We develop a rectified MES acquisition function based on the notion of mutual information.
As a result, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.
arXiv Detail & Related papers (2022-02-28T08:11:02Z)
- QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z)
- A Two-Stage Feature Selection Approach for Robust Evaluation of Treatment Effects in High-Dimensional Observational Data [1.4710887888397084]
We propose a novel two-stage feature selection technique called Outcome Adaptive Elastic Net (OAENet).
OAENet is explicitly designed for making robust causal inference decisions using matching techniques.
Numerical experiments on simulated data demonstrate that OAENet significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-11-27T02:54:30Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- Evaluation of Unsupervised Entity and Event Salience Estimation [17.74208462902158]
Salience Estimation aims to predict term importance in documents.
Previous studies typically generate pseudo-ground truth for evaluation.
In this work, we propose a light yet practical entity and event salience estimation evaluation protocol.
arXiv Detail & Related papers (2021-04-14T15:23:08Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all of the above) and is not responsible for any consequences arising from its use.