Envelope Imbalance Learning Algorithm based on Multilayer Fuzzy C-means
Clustering and Minimum Interlayer discrepancy
- URL: http://arxiv.org/abs/2111.01371v1
- Date: Tue, 2 Nov 2021 04:59:57 GMT
- Title: Envelope Imbalance Learning Algorithm based on Multilayer Fuzzy C-means
Clustering and Minimum Interlayer discrepancy
- Authors: Fan Li, Xiaoheng Zhang, Pin Wang, Yongming Li
- Abstract summary: This paper proposes a deep instance envelope network-based imbalanced learning algorithm with multilayer fuzzy c-means (MlFCM) and a minimum interlayer discrepancy mechanism based on the maximum mean discrepancy (MIDMD).
This algorithm can guarantee high-quality balanced instances using a deep instance envelope network in the absence of prior knowledge.
- Score: 14.339674126923903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imbalanced learning is important and challenging because the
classification of imbalanced datasets is prevalent in machine learning and
data mining. Sampling approaches have been proposed to address this issue, and
cluster-based oversampling methods show great potential because they aim to
tackle between-class and within-class imbalance simultaneously. However,
existing clustering methods all rely on a single, one-time clustering pass.
Without a priori knowledge, the number of clusters is often set improperly,
which leads to poor clustering performance. Moreover, existing methods are
prone to generating noisy instances. To address these problems, this paper
proposes a deep instance envelope network-based imbalanced learning algorithm
with multilayer fuzzy c-means (MlFCM) and a minimum interlayer discrepancy
mechanism based on the maximum mean discrepancy (MIDMD). This algorithm can
guarantee high-quality balanced instances using a deep instance envelope
network in the absence of prior knowledge. In the experiments, thirty-three
popular public datasets are used for verification and more than ten
representative algorithms for comparison. The results show that the proposed
approach significantly outperforms other popular methods.
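The abstract describes two building blocks: stacking fuzzy c-means layers so that each layer's cluster centroids act as "envelope" instances for the next layer, and an interlayer discrepancy measured by the maximum mean discrepancy (MMD) that the MIDMD mechanism aims to keep small. The sketch below is a minimal NumPy illustration of those two ideas only, not the authors' implementation; the function names (fcm, mmd_rbf, envelope_layers), the RBF kernel, the layer sizes, and all hyperparameters are assumptions made for this example.

```python
import numpy as np


def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means; returns (centroids C, membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)                 # fuzzy memberships per point
    C = None
    for _ in range(n_iter):
        Um = U ** m
        C = (Um.T @ X) / Um.sum(axis=0)[:, None]      # membership-weighted centroids
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / d ** (2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)     # standard FCM membership update
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return C, U


def mmd_rbf(X, Y, gamma=1.0):
    """Biased empirical MMD^2 with an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def gram(A, B):
        return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()


def envelope_layers(X_minority, layer_sizes=(32, 16, 8), gamma=1.0):
    """Cluster the minority class layer by layer: each layer's centroids become the
    input ("envelope" instances) of the next layer. Also report the MMD between
    consecutive layers, the quantity a MIDMD-style mechanism would try to minimize."""
    layers, discrepancies = [X_minority], []
    for k in layer_sizes:
        C, _ = fcm(layers[-1], n_clusters=min(k, len(layers[-1]) - 1))
        discrepancies.append(mmd_rbf(layers[-1], C, gamma=gamma))
        layers.append(C)
    return layers[1:], discrepancies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_minority = rng.normal(size=(200, 5))  # stand-in for minority-class feature vectors
    envelopes, mmds = envelope_layers(X_minority)
    print([e.shape for e in envelopes])     # e.g. [(32, 5), (16, 5), (8, 5)]
    print(np.round(mmds, 4))                # interlayer discrepancies
```

In this toy setting, the per-layer centroid sets shrink the minority class into progressively coarser envelopes, and the printed MMD values quantify how far each layer drifts from the previous one; the paper's actual algorithm additionally uses these envelopes to generate balanced instances.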
Related papers
- Robust and Automatic Data Clustering: Dirichlet Process meets
Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Research on Efficient Fuzzy Clustering Method Based on Local Fuzzy
Granular balls [67.33923111887933]
In this paper, the data is fuzzily iterated using granular-balls, and the membership degree of each data point considers only the two granular-balls in which it is located.
The resulting set of fuzzy granular-balls can accommodate more processing methods in different data scenarios.
arXiv Detail & Related papers (2023-03-07T01:52:55Z)
- Overlapping oriented imbalanced ensemble learning method based on
projective clustering and stagewise hybrid sampling [22.32930261633615]
This paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS).
The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples.
arXiv Detail & Related papers (2022-11-30T01:49:06Z)
- An Instance Selection Algorithm for Big Data in High imbalanced datasets
based on LSH [0.0]
Training machine learning models in real contexts often involves big datasets and imbalanced samples in which the class of interest is underrepresented.
This work proposes three new methods for instance selection (IS) to be able to deal with large and imbalanced data sets.
Algorithms were developed in the Apache Spark framework, guaranteeing their scalability.
arXiv Detail & Related papers (2022-10-09T17:38:41Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
- Unsupervised Clustered Federated Learning in Complex Multi-source
Acoustic Environments [75.8001929811943]
We introduce a realistic and challenging, multi-source and multi-room acoustic environment.
We present an improved clustering control strategy that takes into account the variability of the acoustic scene.
The proposed approach is optimized using clustering-based measures and validated via a network-wide classification task.
arXiv Detail & Related papers (2021-06-07T14:51:39Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Combined Cleaning and Resampling Algorithm for Multi-Class Imbalanced
Data with Label Noise [11.868507571027626]
In this paper, we propose a novel oversampling technique, a Multi-Class Combined Cleaning and Resampling algorithm.
The proposed method uses an energy-based approach to model the regions suitable for oversampling, which is less affected by small disjuncts and outliers than SMOTE.
This is combined with a simultaneous cleaning operation whose aim is to reduce the effect of overlapping class distributions on the performance of the learning algorithms.
arXiv Detail & Related papers (2020-04-07T13:59:35Z)