A k nearest neighbours classifiers ensemble based on extended
neighbourhood rule and features subsets
- URL: http://arxiv.org/abs/2205.15111v1
- Date: Mon, 30 May 2022 13:57:32 GMT
- Title: A k nearest neighbours classifiers ensemble based on extended
neighbourhood rule and features subsets
- Authors: Amjad Ali, Muhammad Hamraz, Naz Gul, Dost Muhammad Khan, Zardad Khan,
Saeed Aldahmani
- Abstract summary: kNN-based ensemble methods minimise the effect of outliers by identifying a set of data points in the given feature space that are nearest to an unseen observation.
This paper proposes a k nearest neighbour ensemble where the neighbours are determined in k steps.
- Score: 0.4709844746265484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: kNN-based ensemble methods minimise the effect of outliers by identifying a
set of data points in the given feature space that are nearest to an unseen
observation, and predicting its response by majority voting. Ordinary
kNN-based ensembles find the k nearest observations in a region (bounded by a
sphere) for a predefined value of k. This, however, may fail when the test
observation follows the pattern of its closest same-class data points, which
lie along a path not contained in that sphere. This paper proposes a k nearest
neighbour ensemble in which the neighbours are determined in k steps. Starting
from the nearest observation to the test point, the algorithm at each step
identifies the single observation closest to the observation found at the
previous step. In each base learner of the ensemble, this search is carried
out for k steps on a random bootstrap sample with a random subset of features
selected from the feature space. The final predicted class of the test point
is determined by a majority vote over the classes predicted by all base
models. The new ensemble method is applied to 17 benchmark datasets and
compared with classical methods, including kNN-based models, using
classification accuracy, kappa and the Brier score as performance metrics.
Boxplots illustrate the differences between the results of the proposed
method and those of other state-of-the-art methods. The proposed method
outperforms the classical methods in the majority of cases. A detailed
simulation study is also given for further assessment.
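For illustration, here is a minimal Python sketch of the extended neighbourhood rule as the abstract describes it: a k-step chain of single nearest neighbours per base learner, each learner built on a bootstrap sample with a random feature subset, and a final majority vote. All names and defaults (exnrule_predict, n_models=50, Euclidean distance, sqrt(p) features) are our own assumptions, not the paper's code.

```python
import numpy as np
from collections import Counter

def extended_neighbours(X, y, x_test, k):
    """k-step neighbour chain: step 1 finds the point nearest to the test
    observation; each later step finds the point nearest to the neighbour
    found at the previous step (Euclidean distance assumed)."""
    remaining = list(range(len(X)))
    chain, current = [], x_test
    for _ in range(k):
        dists = [np.linalg.norm(X[i] - current) for i in remaining]
        j = remaining.pop(int(np.argmin(dists)))
        chain.append(j)
        current = X[j]  # the chain continues from this neighbour
    return y[chain]

def exnrule_predict(X, y, x_test, k=5, n_models=50, n_feats=None, seed=0):
    """Each base learner runs the chain on a bootstrap sample with a random
    feature subset; the final class is a majority vote over base learners."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_feats = n_feats or max(1, int(np.sqrt(p)))  # subset size: our default
    votes = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                  # bootstrap rows
        cols = rng.choice(p, size=n_feats, replace=False)  # feature subset
        labels = extended_neighbours(X[np.ix_(rows, cols)], y[rows],
                                     x_test[cols], k)
        # One natural reading: each model votes the majority class of its chain.
        votes.append(Counter(labels).most_common(1)[0][0])
    return Counter(votes).most_common(1)[0][0]  # majority vote over models
```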
Related papers
- Class-Conditional Conformal Prediction with Many Classes [60.8189977620604]
We propose a method called clustered conformal prediction that clusters together classes having "similar" conformal scores.
We find that clustered conformal prediction typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
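A rough sketch of the idea as we read it (not the authors' code): group classes by the empirical quantiles of their calibration scores, then compute one conformal threshold per cluster rather than per class. Here cal_scores holds each calibration point's nonconformity score for its true label, and test_scores[i, c] the score of test point i for integer class c; both conventions are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_conformal_sets(cal_scores, cal_labels, test_scores,
                             n_clusters=3, alpha=0.1):
    classes = np.unique(cal_labels)
    # Summarise each class by quantiles of its calibration score distribution.
    qs = np.array([np.quantile(cal_scores[cal_labels == c], [0.25, 0.5, 0.75])
                   for c in classes])
    cluster_of = dict(zip(classes,
                          KMeans(n_clusters, n_init=10).fit_predict(qs)))
    # One conformal threshold per cluster of classes, at level 1 - alpha.
    thresh = {}
    for g in set(cluster_of.values()):
        pooled = np.concatenate([cal_scores[cal_labels == c]
                                 for c in classes if cluster_of[c] == g])
        r = int(np.ceil((len(pooled) + 1) * (1 - alpha)))
        thresh[g] = np.sort(pooled)[min(r, len(pooled)) - 1]
    # Prediction set: every class whose test score clears its cluster's bar.
    return [[c for c in classes if test_scores[i, c] <= thresh[cluster_of[c]]]
            for i in range(len(test_scores))]
```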
arXiv Detail & Related papers (2023-06-15T17:59:02Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- A Random Projection k Nearest Neighbours Ensemble for Classification via Extended Neighbourhood Rule [0.5052937880533719]
Ensembles based on k nearest neighbours (kNN) combine a large number of base learners.
The RPExNRule ensemble is proposed, in which bootstrap samples drawn from the given training data are randomly projected into lower dimensions.
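A minimal sketch of that projection step (the names here are ours, and a Gaussian projection is an assumption; the paper may use a different random projection):

```python
import numpy as np

def random_projection_bootstrap(X, d, rng):
    """One RPExNRule-style base sample, as we read it: bootstrap the rows,
    then project the bootstrapped data into d dimensions with a random
    Gaussian matrix (scaled to roughly preserve pairwise distances)."""
    n, p = X.shape
    rows = rng.integers(0, n, size=n)           # bootstrap sample
    R = rng.normal(size=(p, d)) / np.sqrt(d)    # random projection matrix
    return X[rows] @ R, rows                    # projected sample + row ids
```

Each base learner would then apply the extended neighbourhood rule in the projected space.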
arXiv Detail & Related papers (2023-03-21T21:58:59Z)
- Optimal Extended Neighbourhood Rule $k$ Nearest Neighbours Ensemble [1.8843687952462742]
A new optimal extended neighborhood rule based ensemble method is proposed in this paper.
The ensemble is compared with state-of-the-art methods on 17 benchmark datasets using accuracy, Cohen's kappa, and the Brier score (BS).
arXiv Detail & Related papers (2022-11-21T09:13:54Z)
- An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improving initial cluster selection for the K-means algorithm.
The Convex Hull algorithm facilitates computing the first two centroids, and the remaining ones are selected according to their distance from the previously selected centers.
We obtained only 7.33%, 7.90%, and 0% clustering error on the Iris, Letter, and Ruspini data, respectively.
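A sketch under one plausible reading (the exact hull-based rule is not spelled out in the summary; taking the farthest pair of hull vertices as the first two centers is our assumption):

```python
import numpy as np
from scipy.spatial import ConvexHull

def initial_centers(X, k):
    """Seed the first two centers with the two mutually farthest convex
    hull vertices; each later center is the data point farthest (in
    min-distance) from all centers chosen so far."""
    hull = X[ConvexHull(X).vertices]
    d = np.linalg.norm(hull[:, None] - hull[None, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = [hull[i], hull[j]]                # farthest pair of hull points
    while len(centers) < k:
        dmin = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None, :],
                                     axis=-1), axis=1)
        centers.append(X[np.argmax(dmin)])      # farthest from chosen centers
    return np.array(centers)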
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
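A minimal sketch for the k-means cost specifically, assuming squared Euclidean distance and a plain gradient step on the centers (the step size lr is our choice, not the paper's):

```python
import numpy as np

def gradient_clustering(X, k, lr=0.1, iters=100, seed=0):
    """Alternate (1) nearest-center assignment with (2) a gradient step on
    the cost sum_i ||x_i - c_{a(i)}||^2, instead of Lloyd's exact mean update."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=-1),
                           axis=1)                       # step 1: assignment
        for j in range(k):                               # step 2: gradient
            members = X[assign == j]
            if len(members):
                # grad wrt c_j of the cost is 2 * n_j * (c_j - mean(members))
                grad = 2 * len(members) * (C[j] - members.mean(0))
                C[j] -= lr * grad / len(X)               # scaled step
    return C, assign
```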
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
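A sketch of the standard randomized recipe this summary points at, assuming A has full column rank (the paper's rank-revealing machinery, which handles arbitrary rank, is omitted here):

```python
import numpy as np

def approx_leverage_scores(A, oversample=2, seed=0):
    """Sketch A with a Gaussian map, QR-factorise the sketch to get R, and
    use the squared row norms of A @ inv(R) as approximate leverage scores."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = oversample * n                        # sketch size
    S = rng.normal(size=(s, m)) / np.sqrt(s)  # Gaussian sketching matrix
    _, R = np.linalg.qr(S @ A)                # R approximates A's column scale
    B = np.linalg.solve(R.T, A.T).T           # B = A @ inv(R), no explicit inverse
    return np.einsum('ij,ij->i', B, B)        # squared row norms ~ leverages
```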
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
arXiv Detail & Related papers (2020-11-19T08:49:10Z)
- K-Nearest Neighbour and Support Vector Machine Hybrid Classification [0.0]
The technique uses K-Nearest Neighbour classification for test samples satisfying a proximity condition.
For every test sample separated out, a Support Vector Machine is trained on the sifted training patterns associated with it, and the test sample is then classified by that SVM.
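A sketch under one reading of that pipeline: unanimity among the k nearest neighbours stands in for the proximity condition, and the "sifted" patterns are taken to be the m nearest training points (both are assumptions of this sketch):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def hybrid_predict(X_tr, y_tr, X_te, k=5, m=50):
    """kNN for 'easy' test points, a locally trained SVM for ambiguous ones."""
    nn = NearestNeighbors(n_neighbors=m).fit(X_tr)
    _, idx = nn.kneighbors(X_te)   # m nearest training rows per test point
    preds = []
    for i, row in enumerate(idx):
        if len(np.unique(y_tr[row[:k]])) == 1:     # proximity condition:
            preds.append(y_tr[row[0]])             # unanimous k neighbours
        else:                                      # ambiguous: local SVM on
            svm = SVC().fit(X_tr[row], y_tr[row])  # the m nearest patterns
            preds.append(svm.predict(X_te[i:i + 1])[0])
    return np.array(preds)
```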
arXiv Detail & Related papers (2020-06-28T15:26:56Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria, under L1 dissimilarity, is compared with hierarchical clustering and a version of k-means: partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)