Clustering and Classification with Non-Existence Attributes: A Sentenced
Discrepancy Measure Based Technique
- URL: http://arxiv.org/abs/2002.10411v1
- Date: Mon, 24 Feb 2020 17:56:06 GMT
- Title: Clustering and Classification with Non-Existence Attributes: A Sentenced
Discrepancy Measure Based Technique
- Authors: Y. A. Joarder, Emran Hossain and Al Faisal Mahmud
- Abstract summary: Clustering approaches cannot be applied directly to such data without pre-processing by techniques like imputation or marginalization.
We overcome this drawback by utilizing a Sentenced Discrepancy Measure which we refer to as the Attribute Weighted Penalty based Discrepancy (AWPD).
The technique is designed to preserve valuable data: it applies directly to datasets with Non-Existence attributes and provides a method for handling unstructured Non-Existence attributes with high accuracy at minimum cost.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of real-world clustering problems suffer from incomplete
data characterization, for some or all of the data instances, due to missing or
absent attributes. Typical clustering approaches cannot be applied directly to
such data without pre-processing by techniques like imputation or
marginalization. We overcome this drawback by utilizing a Sentenced Discrepancy
Measure which we refer to as the Attribute Weighted Penalty based Discrepancy
(AWPD). Using the AWPD measure, we modified K-MEANS++ and Scalable K-MEANS++
for clustering and the k Nearest Neighbor (kNN) algorithm for classification,
so as to make them directly applicable to datasets with Non-Existence
attributes. We present a detailed theoretical analysis showing that the new
AWPD-based K-MEANS++, Scalable K-MEANS++ and kNN algorithms converge to a local
optimum in a finite number of iterations. We report in-depth experiments on
numerous benchmark datasets for various forms of Non-Existence, showing that
the proposed clustering and classification techniques usually outperform some
of the renowned imputation methods that are generally used to process such
incomplete data. The technique is designed to preserve valuable data: our
method applies directly to datasets with Non-Existence attributes and provides
a way of handling unstructured Non-Existence attributes with high accuracy at
minimum cost.
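The abstract does not give the AWPD formula itself, but its stated idea, comparing instances on the attributes both possess while charging a weighted penalty for attributes absent in either, can be sketched in a few lines. The Python below is a minimal illustration under those assumptions; the squared-difference form, the fixed penalty term, and the kNN wrapper are our own stand-ins, not the paper's exact definitions.

```python
import numpy as np

def awpd_distance(x, y, weights, penalty):
    """Illustrative attribute-weighted penalty discrepancy (an assumption,
    not the paper's exact AWPD definition).

    Attributes observed in both vectors contribute a weighted squared
    difference; attributes missing (NaN) in either vector contribute a
    fixed weighted penalty instead of an imputed value.
    """
    observed = ~np.isnan(x) & ~np.isnan(y)
    d = np.sum(weights[observed] * (x[observed] - y[observed]) ** 2)
    d += penalty * np.sum(weights[~observed])  # charge the missing attributes
    return d

def awpd_knn_predict(X_train, y_train, x_query, k, weights, penalty):
    """Majority-vote kNN using the discrepancy above, so rows with NaNs
    need no imputation before classification."""
    dists = np.array([awpd_distance(x_query, row, weights, penalty)
                      for row in X_train])
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]
```

The same discrepancy can replace the squared Euclidean distance in a K-MEANS++ seeding and assignment loop, which is how the abstract describes adapting K-MEANS++ and Scalable K-MEANS++.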
Related papers
- K-Means Clustering With Incomplete Data with the Use of Mahalanobis Distances
We develop a unified K-means algorithm that incorporates Mahalanobis distances, instead of the traditional Euclidean distances.
We demonstrate that our algorithm consistently outperforms standalone imputation followed by K-means.
These results hold across both the IRIS dataset and randomly generated data with elliptical clusters.
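As a rough, self-contained illustration of this entry's idea (not the authors' code), the loop below alternates Mahalanobis-distance assignments with per-cluster mean and covariance updates; the covariance regularization constant and the update schedule are assumptions.

```python
import numpy as np

def mahalanobis_kmeans(X, k, n_iter=50, seed=0):
    """Sketch of K-means with per-cluster Mahalanobis distances."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)].copy()
    covs = np.stack([np.eye(d)] * k)            # start from identity covariances
    for _ in range(n_iter):
        inv = np.linalg.inv(covs)               # (k, d, d) batched inverse
        diff = X[:, None, :] - centers[None]    # (n, k, d)
        dist = np.einsum('nkd,kde,nke->nk', diff, inv, diff)
        labels = dist.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) > d:                    # enough points for a covariance
                centers[j] = pts.mean(axis=0)
                covs[j] = np.cov(pts, rowvar=False) + 1e-6 * np.eye(d)
    return labels, centers
```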
arXiv Detail & Related papers (2024-10-31T00:05:09Z)
- Self-Supervised Graph Embedding Clustering
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
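The robustness in this entry comes from median-of-means estimation; a minimal sketch of that estimator for a cluster centre is below (the Dirichlet-process model selection is omitted, and the block count is an assumption).

```python
import numpy as np

def median_of_means(points, n_blocks=10, seed=0):
    """Coordinate-wise median of block means: a few outlying points can
    corrupt only a few blocks, not the final centre estimate."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, float)
    idx = rng.permutation(len(points))
    blocks = np.array_split(points[idx], n_blocks)
    block_means = np.array([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)
```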
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
- Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z)
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL)
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)
- Robust Trimmed k-means
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
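For intuition, a generic trimmed k-means loop looks like the sketch below; this is not the paper's RTKM (which also handles multi-membership data), and the trimming fraction is an assumed parameter.

```python
import numpy as np

def trimmed_kmeans(X, k, trim_frac=0.1, n_iter=50, seed=0):
    """Sketch: each iteration drops the farthest trim_frac of points
    as provisional outliers before updating the centres."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    n_keep = int((1 - trim_frac) * len(X))
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1)  # (n, k)
        labels = d2.argmin(axis=1)
        keep = np.argsort(d2.min(axis=1))[:n_keep]   # closest points survive
        for j in range(k):
            pts = X[keep][labels[keep] == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    outliers = np.setdiff1d(np.arange(len(X)), keep)
    return labels, centers, outliers
```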
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms a state-of-the-art one-class classification method by 6.3 points of AUC and 12.5 points of average precision.
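The paper's framework is a self-trained deep one-class model; purely as a generic stand-in for the data-refinement idea, the loop below repeatedly fits a classical one-class model and drops the most anomalous slice of the training set before refitting. The model choice, round count, and drop fraction are all assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def refine_and_fit(X, n_rounds=3, drop_frac=0.05):
    """Fit, discard the likeliest anomalies, refit on the cleaner rest."""
    X = np.asarray(X, float)
    model = OneClassSVM(gamma="scale")
    for _ in range(n_rounds):
        model.fit(X)
        scores = model.score_samples(X)          # higher = more normal
        X = X[scores >= np.quantile(scores, drop_frac)]
    return model
```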
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Too Much Information Kills Information: A Clustering Perspective
We propose a simple but novel approach for variance-based k-clustering tasks, including the widely known k-means clustering.
The proposed approach picks a sampled subset from the given dataset and makes decisions based on the information in that subset only.
With certain assumptions, the resulting clustering is provably good to estimate the optimum of the variance-based objective with high probability.
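The subset idea is easy to state in code: cluster a uniform sample, then assign every point to the centres found on the sample. The sketch below uses scikit-learn's standard KMeans as a stand-in for the paper's variance-based formulation; the data, sizes, and cluster count are arbitrary placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 8))                 # placeholder dataset
sample = X[rng.choice(len(X), 2_000, replace=False)]
km = KMeans(n_clusters=5, n_init=10).fit(sample)  # decisions use the sample only
labels = km.predict(X)                            # full data inherits the centres
```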
arXiv Detail & Related papers (2020-09-16T01:54:26Z)
- A semi-supervised sparse K-Means algorithm
An unsupervised sparse clustering method can be employed in order to detect the subgroup of features necessary for clustering.
A semi-supervised method can use the labelled data to create constraints and enhance the clustering solution.
We show that the algorithm maintains the high performance of other semi-supervised algorithms and in addition preserves the ability to identify informative from uninformative features.
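As a loose illustration of combining the two ideas (not the paper's algorithm), the sketch below scores features by between-class variance on the labelled rows, zeroes out uninformative ones, and clusters on the re-weighted data; the scoring rule and the 0.1 threshold are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def semisupervised_sparse_kmeans(X, y_partial, k):
    """y_partial holds class labels for some rows and -1 elsewhere."""
    X = np.asarray(X, float)
    labelled = y_partial >= 0
    Xl, yl = X[labelled], y_partial[labelled]
    overall = Xl.mean(axis=0)
    score = np.zeros(X.shape[1])
    for c in np.unique(yl):                       # between-class variance per feature
        score += (yl == c).mean() * (Xl[yl == c].mean(axis=0) - overall) ** 2
    w = score / score.max()
    w[w < 0.1] = 0.0                              # sparsify: drop uninformative features
    return KMeans(n_clusters=k, n_init=10).fit_predict(X * np.sqrt(w))
```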
arXiv Detail & Related papers (2020-03-16T02:05:23Z)