Impact of Load Demand Dataset Characteristics on Clustering Validation Indices
- URL: http://arxiv.org/abs/2108.01433v1
- Date: Tue, 3 Aug 2021 12:22:34 GMT
- Title: Impact of Load Demand Dataset Characteristics on Clustering Validation Indices
- Authors: Mayank Jain, Mukta Jain, Tarek AlSkaif, and Soumyabrata Dev
- Abstract summary: Clustering households based on their demand profiles is a primary and key component of such analysis.
Various cluster validation indices (CVIs) have been proposed in the literature.
This paper shows how the recommendations of validation indices are influenced by different data characteristics.
- Score: 1.5749416770494706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the inclusion of smart meters, electricity load consumption data can be
fetched for individual consumer buildings at high temporal resolutions.
Availability of such data has made it possible to study daily load demand
profiles of the households. Clustering households based on their demand
profiles is a primary and key component of such analysis. While
many clustering algorithms/frameworks can be deployed to perform clustering,
they usually generate very different clusters. In order to identify the best
clustering results, various cluster validation indices (CVIs) have been
proposed in the literature. However, it has been noticed that different CVIs
often recommend different algorithms. This leads to the problem of identifying
the most suitable CVI for a given dataset. Responding to the problem, this
paper shows how the recommendations of validation indices are influenced by
different data characteristics that might be present in a typical residential
load demand dataset. Furthermore, the paper identifies the features of data
that favor or preclude the use of a particular cluster validation index.
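To make the idea of comparing clustering results with several CVIs concrete, here is a minimal sketch in plain Python. It is not the paper's method or data: the load profiles, the two candidate label sets, and all function names are hypothetical, and the two indices shown (silhouette, where higher is better, and Davies-Bouldin, where lower is better) are simply common examples of CVIs. On this toy data both indices favor the same candidate; on real load demand data they may disagree, which is the problem the abstract describes.

```python
import math

# Hypothetical toy data: six households, each a 4-slot daily profile in kW
# (night, morning, afternoon, evening). Not taken from the paper.
PROFILES = [
    [0.2, 1.0, 0.4, 0.5], [0.3, 1.1, 0.5, 0.4], [0.2, 0.9, 0.3, 0.6],  # morning peak
    [0.2, 0.4, 0.5, 1.2], [0.3, 0.3, 0.4, 1.1], [0.1, 0.5, 0.6, 1.0],  # evening peak
]

# Hypothetical outputs of two clustering algorithms (one label per household).
CANDIDATES = {
    "algorithm_a": [0, 0, 0, 1, 1, 1],
    "algorithm_b": [0, 0, 1, 1, 1, 0],
}


def dist(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group(profiles, labels):
    """Map each cluster label to the list of profiles assigned to it."""
    clusters = {}
    for profile, label in zip(profiles, labels):
        clusters.setdefault(label, []).append(profile)
    return clusters


def silhouette(profiles, labels):
    """Mean silhouette coefficient (range -1 to 1, higher is better)."""
    clusters = group(profiles, labels)
    scores = []
    for profile, label in zip(profiles, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster.
        a = sum(dist(profile, q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest other cluster.
        b = min(sum(dist(profile, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(profiles, labels):
    """Davies-Bouldin index (non-negative, lower is better)."""
    clusters = group(profiles, labels)
    centroids = {k: [sum(col) / len(v) for col in zip(*v)]
                 for k, v in clusters.items()}
    scatter = {k: sum(dist(p, centroids[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    worst_ratios = [
        max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


def recommend(profiles, candidates):
    """Return the candidate each index prefers."""
    sil = {name: silhouette(profiles, lab) for name, lab in candidates.items()}
    dbi = {name: davies_bouldin(profiles, lab) for name, lab in candidates.items()}
    return {"silhouette": max(sil, key=sil.get),
            "davies_bouldin": min(dbi, key=dbi.get)}


if __name__ == "__main__":
    print(recommend(PROFILES, CANDIDATES))
```

Swapping in real daily profiles and the label sets produced by actual clustering runs would show whether the indices agree for a given dataset.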
Related papers
- DREW : Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking [58.37644304554906]
We propose Data Retrieval with Error-corrected codes and Watermarking (DREW).
DREW randomly clusters the reference dataset and injects unique error-controlled watermark keys into each cluster.
After locating the relevant cluster, embedding vector similarity retrieval is performed within the cluster to find the most accurate matches.
arXiv Detail & Related papers (2024-06-05T01:19:44Z)
- Interpretable Clustering with the Distinguishability Criterion [0.4419843514606336]
We present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations.
We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures.
We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.
arXiv Detail & Related papers (2024-04-24T16:38:15Z)
- On the Use of Relative Validity Indices for Comparing Clustering Approaches [0.6990493129893111]
Relative Validity Indices are widely used for evaluating and optimising clustering outcomes.
There is a growing trend in the literature to use RVIs when selecting a Similarity Paradigm (SP) for clustering.
This study presents the first comprehensive investigation into the reliability of RVIs for SP selection.
arXiv Detail & Related papers (2024-04-16T07:39:54Z)
- A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs [0.0]
We present a novel machine learning based framework in order to achieve optimal load profiling through a real case study.
arXiv Detail & Related papers (2023-10-31T11:23:26Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Clustering performance analysis using new correlation based cluster validity indices [0.0]
We develop two new cluster validity indices based on a correlation between an actual distance between a pair of data points and a centroid distance of clusters that the two points locate in.
Our proposed indices constantly yield several peaks at different numbers of clusters which overcome the weakness previously stated.
arXiv Detail & Related papers (2021-09-23T06:59:41Z)
- A review of systematic selection of clustering algorithms and their evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts.
The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
- Decorrelated Clustering with Data Selection Bias [55.91842043124102]
We propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias.
Our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias.
arXiv Detail & Related papers (2020-06-29T08:55:50Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.