Envelope Imbalance Learning Algorithm based on Multilayer Fuzzy C-means
Clustering and Minimum Interlayer discrepancy
- URL: http://arxiv.org/abs/2111.01371v1
- Date: Tue, 2 Nov 2021 04:59:57 GMT
- Title: Envelope Imbalance Learning Algorithm based on Multilayer Fuzzy C-means
Clustering and Minimum Interlayer discrepancy
- Authors: Fan Li, Xiaoheng Zhang, Pin Wang, Yongming Li
- Abstract summary: This paper proposes a deep instance envelope network-based imbalanced learning algorithm with multilayer fuzzy c-means (MlFCM) and a minimum interlayer discrepancy mechanism based on the maximum mean discrepancy (MIDMD).
This algorithm can guarantee high-quality balanced instances using a deep instance envelope network in the absence of prior knowledge.
- Score: 14.339674126923903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imbalanced learning is important and challenging because the
classification of imbalanced datasets is prevalent in machine learning and
data mining. Sampling approaches have been proposed to address this issue, and
cluster-based oversampling methods show great potential because they aim to
tackle between-class and within-class imbalance simultaneously. However,
existing clustering methods all rely on a single, one-time clustering pass.
Without a priori knowledge, the number of clusters is often set improperly,
which leads to poor clustering performance. Moreover, existing methods are
prone to generating noisy instances. To address these problems, this paper
proposes a deep instance envelope network-based imbalanced learning algorithm
with multilayer fuzzy c-means (MlFCM) and a minimum interlayer discrepancy
mechanism based on the maximum mean discrepancy (MIDMD). This algorithm can
guarantee high-quality balanced instances using a deep instance envelope
network in the absence of prior knowledge. In the experiments, thirty-three
popular public datasets are used for verification and more than ten
representative algorithms for comparison. The results show that the proposed
approach significantly outperforms other popular methods.
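The abstract describes two building blocks: stacking fuzzy c-means layers so that each layer's cluster centroids act as "envelope" instances for the next layer, and an interlayer discrepancy measured by the maximum mean discrepancy (MMD) that the MIDMD mechanism aims to keep small. The sketch below is a minimal NumPy illustration of those two ideas only, not the authors' implementation; the function names (fcm, mmd_rbf, envelope_layers), the RBF kernel, the layer sizes, and all hyperparameters are assumptions made for this example.

```python
import numpy as np


def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means; returns (centroids C, membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)                 # fuzzy memberships per point
    C = None
    for _ in range(n_iter):
        Um = U ** m
        C = (Um.T @ X) / Um.sum(axis=0)[:, None]      # membership-weighted centroids
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / d ** (2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)     # standard FCM membership update
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return C, U


def mmd_rbf(X, Y, gamma=1.0):
    """Biased empirical MMD^2 with an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def gram(A, B):
        return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()


def envelope_layers(X_minority, layer_sizes=(32, 16, 8), gamma=1.0):
    """Cluster the minority class layer by layer: each layer's centroids become the
    input ("envelope" instances) of the next layer. Also report the MMD between
    consecutive layers, the quantity a MIDMD-style mechanism would try to minimize."""
    layers, discrepancies = [X_minority], []
    for k in layer_sizes:
        C, _ = fcm(layers[-1], n_clusters=min(k, len(layers[-1]) - 1))
        discrepancies.append(mmd_rbf(layers[-1], C, gamma=gamma))
        layers.append(C)
    return layers[1:], discrepancies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_minority = rng.normal(size=(200, 5))  # stand-in for minority-class feature vectors
    envelopes, mmds = envelope_layers(X_minority)
    print([e.shape for e in envelopes])     # e.g. [(32, 5), (16, 5), (8, 5)]
    print(np.round(mmds, 4))                # interlayer discrepancies
```

In this toy setting, the per-layer centroid sets shrink the minority class into progressively coarser envelopes, and the printed MMD values quantify how far each layer drifts from the previous one; the paper's actual algorithm additionally uses these envelopes to generate balanced instances.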
Related papers
- Robust and Automatic Data Clustering: Dirichlet Process meets
Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Research on Efficient Fuzzy Clustering Method Based on Local Fuzzy
Granular balls [67.33923111887933]
In this paper, the data is fuzzily iterated using granular-balls, and the membership degree of each data point considers only the two granular-balls in which it is located.
The resulting set of fuzzy granular-balls can accommodate more processing methods in different data scenarios.
arXiv Detail & Related papers (2023-03-07T01:52:55Z)
- Overlapping oriented imbalanced ensemble learning method based on
projective clustering and stagewise hybrid sampling [22.32930261633615]
This paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS).
The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples.
arXiv Detail & Related papers (2022-11-30T01:49:06Z)
- An Instance Selection Algorithm for Big Data in High imbalanced datasets
based on LSH [0.0]
Training machine learning models in real contexts often involves big datasets and imbalanced samples in which the class of interest is underrepresented.
This work proposes three new methods for instance selection (IS) to be able to deal with large and imbalanced data sets.
Algorithms were developed in the Apache Spark framework, guaranteeing their scalability.
arXiv Detail & Related papers (2022-10-09T17:38:41Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
- Unsupervised Clustered Federated Learning in Complex Multi-source
Acoustic Environments [75.8001929811943]
We introduce a realistic and challenging, multi-source and multi-room acoustic environment.
We present an improved clustering control strategy that takes into account the variability of the acoustic scene.
The proposed approach is optimized using clustering-based measures and validated via a network-wide classification task.
arXiv Detail & Related papers (2021-06-07T14:51:39Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Combined Cleaning and Resampling Algorithm for Multi-Class Imbalanced
Data with Label Noise [11.868507571027626]
In this paper, we propose a novel oversampling technique, a Multi-Class Combined Cleaning and Resampling algorithm.
The proposed method uses an energy-based approach to model the regions suitable for oversampling, which is less affected by small disjuncts and outliers than SMOTE.
This is combined with a simultaneous cleaning operation whose aim is to reduce the effect of overlapping class distributions on the performance of the learning algorithms.
arXiv Detail & Related papers (2020-04-07T13:59:35Z)