Overlapping oriented imbalanced ensemble learning method based on
projective clustering and stagewise hybrid sampling
- URL: http://arxiv.org/abs/2212.03182v1
- Date: Wed, 30 Nov 2022 01:49:06 GMT
- Title: Overlapping oriented imbalanced ensemble learning method based on
projective clustering and stagewise hybrid sampling
- Authors: Fan Li, Bo Wang, Pin Wang, Yongming Li
- Abstract summary: This paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS).
The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples.
- Score: 22.32930261633615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The challenge of imbalanced learning lies not only in the class imbalance
problem but also in the more complex class overlapping problem. However,
most existing algorithms focus mainly on the former, a limitation that
prevents them from making further progress. To address this limitation,
this paper proposes an ensemble learning algorithm based on dual
clustering and stage-wise hybrid sampling (DCSHS). The DCSHS has three parts.
First, we design a projection clustering combination framework (PCC) guided
by the Davies-Bouldin clustering effectiveness index (DBI), which obtains
high-quality clusters and combines them into a set of cross-complete
subsets (CCS) with balanced classes and low overlapping. Second, according to
the class characteristics of each subset, a stage-wise hybrid sampling algorithm
is designed to de-overlap and balance the subsets. Finally, a
projective clustering transfer mapping mechanism (CTM) is constructed for all
processed subsets by means of transfer learning, thereby reducing class
overlapping and exploiting the structural information of the samples. The major
advantage of our algorithm is that it can exploit the intersectionality of the
CCS to realize the soft elimination of overlapping majority samples, and learn
as much information from the overlapping samples as possible, thereby
alleviating class overlapping while balancing the classes. In the experimental
section, more than 30 public datasets and more than ten representative
algorithms are chosen for verification. The experimental results show that
DCSHS performs significantly best in terms of various evaluation criteria.
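To make the three-stage pipeline concrete, below is a minimal Python sketch of the cluster-then-sample-then-ensemble shape described in the abstract. It is not the authors' implementation: the random-projection search standing in for PCC, the k-NN overlap test, the SMOTE-like interpolation, and every function and parameter name are simplifying assumptions for illustration only; the paper's CCS construction and CTM transfer mapping are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score
from sklearn.neighbors import NearestNeighbors
from sklearn.random_projection import GaussianRandomProjection
from sklearn.tree import DecisionTreeClassifier


def best_projected_clustering(X, n_clusters=3, n_projections=5, seed=0):
    """Stand-in for PCC: try several random projections and keep the
    K-means clustering with the lowest Davies-Bouldin index (DBI);
    a lower DBI indicates better-separated clusters."""
    best_dbi, best_labels = np.inf, None
    for i in range(n_projections):
        Z = GaussianRandomProjection(
            n_components=2, random_state=seed + i).fit_transform(X)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(Z)
        dbi = davies_bouldin_score(Z, labels)
        if dbi < best_dbi:
            best_dbi, best_labels = dbi, labels
    return best_labels


def stagewise_hybrid_sample(X, y, k=5, seed=0):
    """Stand-in for the stage-wise hybrid sampling: first softly eliminate
    majority samples whose neighbourhood is mostly minority (a crude
    overlap test), then oversample the minority class by interpolating
    between its members (a SMOTE-like step). Assumes binary labels {0, 1}."""
    rng = np.random.default_rng(seed)
    maj = 0 if (y == 0).sum() >= (y == 1).sum() else 1
    mino = 1 - maj
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X))).fit(X)
    _, idx = nn.kneighbors(X)
    overlap = (y[idx[:, 1:]] == mino).mean(axis=1) > 0.5
    keep = ~((y == maj) & overlap)          # drop overlapping majority points
    X, y = X[keep], y[keep]
    X_min = X[y == mino]
    deficit = int((y == maj).sum() - (y == mino).sum())
    if deficit > 0 and len(X_min) >= 2:
        a = X_min[rng.integers(0, len(X_min), deficit)]
        b = X_min[rng.integers(0, len(X_min), deficit)]
        synth = a + rng.random((deficit, 1)) * (b - a)   # interpolate pairs
        X = np.vstack([X, synth])
        y = np.concatenate([y, np.full(deficit, mino)])
    return X, y


def fit_ensemble(X, y, n_clusters=3, seed=0):
    """Simplified ensemble: resample each cluster and fit one base
    learner per cluster (the paper's CTM transfer step is omitted)."""
    labels = best_projected_clustering(X, n_clusters=n_clusters, seed=seed)
    models = []
    for c in np.unique(labels):
        Xc, yc = X[labels == c], y[labels == c]
        if len(np.unique(yc)) < 2:
            continue                        # skip single-class clusters
        Xs, ys = stagewise_hybrid_sample(Xc, yc, seed=seed)
        models.append(DecisionTreeClassifier(random_state=seed).fit(Xs, ys))
    return models


def predict(models, X):
    """Combine the base learners by majority vote."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

In the paper, the subsets are cross-complete combinations of clusters and a transfer mapping is learned across them; here each cluster is resampled and modeled independently, which keeps the sketch short while preserving the overall structure of the method.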
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover intrinsic relationships between real and generated samples.
Second, we design a self-supervised metric-learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS).
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
arXiv Detail & Related papers (2024-01-29T09:19:49Z)
- Stable Cluster Discrimination for Deep Clustering [7.175082696240088]
Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution.
The coupled objective implies a trivial solution in which all instances collapse to uniform features.
In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering.
A novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly.
arXiv Detail & Related papers (2023-11-24T06:43:26Z)
- Envelope Imbalance Learning Algorithm based on Multilayer Fuzzy C-means Clustering and Minimum Interlayer discrepancy [14.339674126923903]
This paper proposes a deep instance envelope network-based imbalanced learning algorithm with multilayer fuzzy c-means clustering (MlFCM) and a minimum interlayer discrepancy mechanism based on the maximum mean discrepancy (MIDMD).
This algorithm can guarantee high quality balanced instances using a deep instance envelope network in the absence of prior knowledge.
arXiv Detail & Related papers (2021-11-02T04:59:57Z)
- Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transductive few-shot learning, which integrates prototype-based objectives.
We find that our method yields competitive performance, in terms of accuracy and optimization, while scaling up to large problems.
Surprisingly, we find that our general model already achieves competitive performance in comparison to state-of-the-art methods.
arXiv Detail & Related papers (2021-06-16T16:14:01Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample-specificity-based methods ignore inter-class information.
We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation.
Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z)
- Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using a Conditional Variational Autoencoder (CVAE) for both seen and unseen classes.
The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
arXiv Detail & Related papers (2020-04-01T19:05:28Z)
- A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints [5.639904484784126]
We introduce a network framework for semi-supervised clustering with pairwise constraints.
In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages.
The proposed approach, S3C2, is motivated by the observation that binary classification is usually easier than multi-class clustering; a minimal sketch of this two-stage decomposition appears after this list.
arXiv Detail & Related papers (2020-01-18T20:13:07Z)
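As referenced in the S3C2 entry above, here is a minimal Python sketch of decomposing semi-supervised clustering with pairwise constraints into two classification stages. It is not the S3C2 implementation: the |x_i - x_j| pair features, the greedy assignment rule, the threshold, and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_pairwise_classifier(X, must_link, cannot_link):
    """Stage 1: turn pairwise constraints into binary classification on
    pair features |x_i - x_j| (1 = same cluster, 0 = different cluster).
    Assumes both constraint lists are non-empty index-pair lists."""
    pairs = must_link + cannot_link
    feats = np.array([np.abs(X[i] - X[j]) for i, j in pairs])
    labels = np.array([1] * len(must_link) + [0] * len(cannot_link))
    return LogisticRegression().fit(feats, labels)


def assign_clusters(X, clf, threshold=0.5):
    """Stage 2: greedy assignment -- join the first existing cluster whose
    members the pairwise model, on average, says we belong with;
    otherwise open a new cluster."""
    clusters, labels = [], np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        for c, members in enumerate(clusters):
            feats = np.abs(X[members] - x)
            if clf.predict_proba(feats)[:, 1].mean() > threshold:
                members.append(i)
                labels[i] = c
                break
        else:
            clusters.append([i])
            labels[i] = len(clusters) - 1
    return labels
```

The point of the decomposition is that each stage is an ordinary classification problem, so standard classifiers and their training machinery can be reused instead of solving a joint multi-class clustering objective.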
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.