Related papers: Impact of Load Demand Dataset Characteristics on Clustering Validation Indices

Impact of Load Demand Dataset Characteristics on Clustering Validation Indices

URL: http://arxiv.org/abs/2108.01433v1
Date: Tue, 3 Aug 2021 12:22:34 GMT
Title: Impact of Load Demand Dataset Characteristics on Clustering Validation Indices
Authors: Mayank Jain, Mukta Jain, Tarek AlSkaif, and Soumyabrata Dev
Abstract summary: Clustering households based on their demand profiles is one of the primary, yet a key component of such analysis. Various cluster validation indices (CVIs) have been proposed in the literature. This paper shows how the recommendations of validation indices are influenced by different data characteristics.
Score: 1.5749416770494706
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the inclusion of smart meters, electricity load consumption data can be fetched for individual consumer buildings at high temporal resolutions. Availability of such data has made it possible to study daily load demand profiles of the households. Clustering households based on their demand profiles is one of the primary, yet a key component of such analysis. While many clustering algorithms/frameworks can be deployed to perform clustering, they usually generate very different clusters. In order to identify the best clustering results, various cluster validation indices (CVIs) have been proposed in the literature. However, it has been noticed that different CVIs often recommend different algorithms. This leads to the problem of identifying the most suitable CVI for a given dataset. Responding to the problem, this paper shows how the recommendations of validation indices are influenced by different data characteristics that might be present in a typical residential load demand dataset. Furthermore, the paper identifies the features of data that prefer/prohibit the use of a particular cluster validation index.

Related papers

Towards Fair Representation: Clustering and Consensus [1.7243216387069678]
We find a consensus clustering that is not only representative but also fair with respect to specific protected attributes.<n>As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness.<n>We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms.
arXiv Detail & Related papers (2025-06-10T10:33:21Z)
Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN.<n>We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z)
Towards Learnable Anchor for Deep Multi-View Clustering [49.767879678193005]
In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time. With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering. Experiments on several datasets demonstrate superior performance and efficiency of DMAC compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-03-16T09:38:11Z)
DREW : Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking [58.37644304554906]
We propose Data Retrieval with Error-corrected codes and Watermarking (DREW) DREW randomly clusters the reference dataset and injects unique error-controlled watermark keys into each cluster. After locating the relevant cluster, embedding vector similarity retrieval is performed within the cluster to find the most accurate matches.
arXiv Detail & Related papers (2024-06-05T01:19:44Z)
Interpretable Clustering with the Distinguishability Criterion [0.4419843514606336]
We present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations. We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures. We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.
arXiv Detail & Related papers (2024-04-24T16:38:15Z)
On the Use of Relative Validity Indices for Comparing Clustering Approaches [0.6990493129893111]
Relative Validity Indices are widely used for evaluating and optimising clustering outcomes. There is a growing trend in the literature to use RVIs when selecting a Similarity Paradigm (SP) for clustering. This study presents the first comprehensive investigation into the reliability of RVIs for SP selection.
arXiv Detail & Related papers (2024-04-16T07:39:54Z)
Self Supervised Correlation-based Permutations for Multi-View Clustering [7.093692674858257]
We propose an end-to-end deep learning-based multi-view clustering framework for general data types.<n>Our approach involves generating meaningful fused representations using a novel permutation-based canonical correlation objective.
arXiv Detail & Related papers (2024-02-26T08:08:30Z)
A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs [0.0]
We present a novel machine learning based framework in order to achieve optimal load profiling through a real case study. In this paper, we present a novel machine learning based framework in order to achieve optimal load profiling through a real case study.
arXiv Detail & Related papers (2023-10-31T11:23:26Z)
Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining. Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
Clustering performance analysis using new correlation based cluster validity indices [0.0]
We develop two new cluster validity indices based on a correlation between an actual distance between a pair of data points and a centroid distance of clusters that the two points locate in. Our proposed indices constantly yield several peaks at different numbers of clusters which overcome the weakness previously stated.
arXiv Detail & Related papers (2021-09-23T06:59:41Z)
A review of systematic selection of clustering algorithms and their evaluation [0.0]
This paper aims to identify a systematic selection logic for clustering algorithms and corresponding validation concepts. The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem.
arXiv Detail & Related papers (2021-06-24T07:01:46Z)
Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level. This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
Decorrelated Clustering with Data Selection Bias [55.91842043124102]
We propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias.
arXiv Detail & Related papers (2020-06-29T08:55:50Z)
New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets. The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.