PL-$k$NN: A Parameterless Nearest Neighbors Classifier
- URL: http://arxiv.org/abs/2209.12647v1
- Date: Mon, 26 Sep 2022 12:52:45 GMT
- Title: PL-$k$NN: A Parameterless Nearest Neighbors Classifier
- Authors: Danilo Samuel Jodas, Leandro Aparecido Passos, Ahsan Adeel, João Paulo Papa
- Abstract summary: The $k$-Nearest Neighbors is one of the most effective and straightforward models employed in numerous problems.
This paper proposes a $k$-Nearest Neighbors classifier that bypasses the need to define the value of $k$.
- Score: 0.24499092754102875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models that require minimal parameter setup are desirable because they avoid time-consuming optimization processes. The $k$-Nearest Neighbors is one of the most effective and straightforward models employed in numerous problems. Despite its well-known performance, it requires the value of $k$ to be tuned for each specific data distribution, thus demanding expensive computational effort. This paper proposes a $k$-Nearest Neighbors classifier that bypasses the need to define the value of $k$. The model computes the $k$ value adaptively, considering the data distribution of the training set. We compared the proposed model against the standard $k$-Nearest Neighbors classifier and two parameterless versions from the literature. Experiments over 11 public datasets confirm the robustness of the proposed approach, as the obtained results were similar to, or even better than, those of its counterpart versions.
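Below is a minimal Python sketch of the idea. The specific adaptive rule (take $k$ as the number of training points whose distance to the query falls below the query's mean distance to the training set) is an illustrative assumption, not necessarily the paper's exact criterion.

```python
import numpy as np
from collections import Counter

def pl_knn_predict(X_train, y_train, X_test):
    """Toy parameterless k-NN: k is derived per query from the data
    distribution rather than fixed by the user. The adaptive rule below
    is illustrative; the paper's exact criterion may differ."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)  # distances to all training points
        k = max(1, int((d <= d.mean()).sum()))   # adaptive k: points closer than the mean distance
        nearest = np.argsort(d)[:k]              # indices of the k nearest neighbors
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)

# Toy usage: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + np.repeat([[0.0, 0.0], [4.0, 4.0]], 50, axis=0)
y = np.repeat([0, 1], 50)
print(pl_knn_predict(X, y, np.array([[0.1, 0.2], [3.9, 4.1]])))  # -> [0 1]
```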
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
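As a rough sketch of curvature-adaptive neighborhood sizing (assumption: a PCA-based flatness proxy stands in for the paper's shape-operator estimate):

```python
import numpy as np

def adaptive_k(X_train, x, k_pilot=20, k_min=3):
    """Illustrative curvature-adaptive neighborhood size. Assumption:
    a PCA-based flatness proxy replaces the paper's shape-operator
    estimate; flatter local geometry allows a larger neighborhood."""
    d = np.linalg.norm(X_train - x, axis=1)
    local = X_train[np.argsort(d)[:k_pilot]]        # pilot neighborhood around x
    evals = np.linalg.eigvalsh(np.cov(local.T))     # local covariance spectrum
    flatness = evals.max() / (evals.sum() + 1e-12)  # near 1 when locally flat
    return max(k_min, int(round(k_pilot * flatness)))
```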
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
- Is $F_1$ Score Suboptimal for Cybersecurity Models? Introducing $C_{score}$, a Cost-Aware Alternative for Model Assessment [1.747623282473278]
False positives and false negatives are not equal and are application dependent.
In cybersecurity applications, the cost of not detecting an attack is very different from marking a benign activity as an attack.
We propose a new cost-aware metric, $C_{score}$, based on precision and recall.
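One plausible way to fold asymmetric costs into a precision/recall-based score is an $F_\beta$-style weighted harmonic mean; the sketch below is illustrative, and the paper's exact $C_{score}$ definition may differ.

```python
def c_score(tp, fp, fn, c_fp=1.0, c_fn=10.0):
    """Illustrative cost-aware score (not necessarily the paper's exact
    formula): a harmonic mean of precision and recall whose weight on
    recall grows with the cost of a missed attack (c_fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 or recall == 0.0:
        return 0.0
    w = c_fn / (c_fp + c_fn)  # relative weight of misses
    return 1.0 / (w / recall + (1.0 - w) / precision)

# A detector that misses attacks is penalized far more than a noisy one:
print(c_score(tp=80, fp=50, fn=5))   # many false alarms, few misses
print(c_score(tp=80, fp=5, fn=50))   # few false alarms, many misses -> lower
```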
arXiv Detail & Related papers (2024-07-19T21:01:19Z)
- Transfer Q Star: Principled Decoding for LLM Alignment [105.89114186982972]
Transfer $Q*$ estimates the optimal value function for a target reward $r$ through a baseline model.
Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods.
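Schematically, decoding of this kind reweights the reference model's next-token distribution as $\pi(a \mid s) \propto \pi_{\text{ref}}(a \mid s)\, e^{\alpha Q(s,a)}$; in the toy sketch below, the per-token value estimates are a hypothetical stand-in for the transferred $Q^*$ estimator.

```python
import numpy as np

def value_guided_step(ref_logits, q_estimates, alpha=1.0):
    """Schematic decoding step: sample tokens from a distribution
    proportional to pi_ref(a|s) * exp(alpha * Q(s, a)). Here
    `q_estimates` is a hypothetical stand-in for the transferred Q*."""
    scores = ref_logits + alpha * q_estimates  # reward-shaped logits
    p = np.exp(scores - scores.max())          # numerically stable softmax
    return p / p.sum()

ref_logits = np.array([2.0, 1.5, 0.1])         # reference model's next-token logits
q_est = np.array([0.0, 1.2, -0.5])             # hypothetical per-token value estimates
print(value_guided_step(ref_logits, q_est))    # distribution shifted toward token 1
```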
arXiv Detail & Related papers (2024-05-30T21:36:12Z)
- Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data [10.377650972462654]
Black-box model extraction attacks can send a minimal number of queries from a publicly available dataset to a target ML model through a predictive API.
We create an informative and distributionally equivalent replica of the target using an active sampling-based query selection algorithm, Marich.
Marich extracts models that achieve $\sim 60$-$95\%$ of the true model's accuracy and uses $\sim$1,000-8,500 queries from the publicly available datasets.
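The loop below sketches the generic active-extraction setting with a least-confidence query criterion; Marich's actual selection objective (distributional equivalence and information gain) is more involved, so treat this purely as an illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def extract_replica(target_predict, X_pool, rounds=8, batch=100, seed=0):
    """Toy active model-extraction loop (illustration of the setting,
    not Marich's exact criterion): each round queries the public points
    on which the current replica is least confident."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_pool), batch, replace=False)  # random warm-up batch
    X_q, y_q = X_pool[idx], target_predict(X_pool[idx])  # labels via the target's API
    replica = LogisticRegression(max_iter=1000).fit(X_q, y_q)
    for _ in range(rounds - 1):
        conf = replica.predict_proba(X_pool).max(axis=1)
        idx = np.argsort(conf)[:batch]                   # least-confident public points
        X_q = np.vstack([X_q, X_pool[idx]])
        y_q = np.concatenate([y_q, target_predict(X_pool[idx])])
        replica.fit(X_q, y_q)                            # refit on all queries so far
    return replica

# Demo with a hypothetical target hidden behind a predict-only API
X_all, y_all = make_classification(n_samples=4000, random_state=1)
X_priv, y_priv = X_all[:2000], y_all[:2000]   # target's private training data
X_pub = X_all[2000:]                          # attacker's public query pool
target = LogisticRegression(max_iter=1000).fit(X_priv, y_priv)
replica = extract_replica(target.predict, X_pub)
print((replica.predict(X_priv) == target.predict(X_priv)).mean())  # agreement with target
```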
arXiv Detail & Related papers (2023-02-16T18:20:27Z)
- N-Gram Nearest Neighbor Machine Translation [101.25243884801183]
We propose a novel $n$-gram nearest neighbor retrieval method that is model agnostic and applicable to both Autoregressive Translation (AT) and Non-Autoregressive Translation (NAT) models.
We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, on general as well as domain adaptation translation tasks.
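A toy of the retrieval step, assuming a datastore of context embeddings and target $n$-grams has already been built (from decoder hidden states, as in token-level $k$NN-MT):

```python
import numpy as np

def retrieve_ngrams(context_vec, keys, ngrams, k=4, temp=1.0):
    """Toy n-gram retrieval step: the datastore maps context embeddings
    (keys) to target n-grams instead of single tokens. Keys/ngrams here
    are hypothetical; real ones come from decoder hidden states."""
    d = np.linalg.norm(keys - context_vec, axis=1)  # distance to every datastore key
    top = np.argsort(d)[:k]                         # k nearest entries
    w = np.exp(-d[top] / temp)                      # closer entries weigh more
    w /= w.sum()
    return [(ngrams[i], wi) for i, wi in zip(top, w)]  # candidate n-grams + weights

rng = np.random.default_rng(1)
keys = rng.normal(size=(1000, 16))    # hypothetical context embeddings
ngrams = [("tok_a", "tok_b")] * 1000  # hypothetical 2-gram targets
print(retrieve_ngrams(keys[0], keys, ngrams)[0])
```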
arXiv Detail & Related papers (2023-01-30T13:19:19Z)
- Bayesian Target-Vector Optimization for Efficient Parameter Reconstruction [0.0]
We introduce a target-vector optimization scheme that considers all $K$ contributions of the model function and that is specifically suited for parameter reconstruction problems.
It also enables accurate uncertainty estimates to be determined with very few observations of the actual model function.
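A minimal sketch of the scheme, assuming one RBF-kernel Gaussian-process surrogate per output component and a plain plug-in acquisition (the paper's acquisition is more refined):

```python
import numpy as np

def gp_fit_predict(X, y, Xs, ell=0.5, sf=1.0, noise=1e-6):
    """Minimal GP regression (RBF kernel) used as a per-component surrogate."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf * np.exp(-0.5 * d2 / ell**2)
    K = k(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    Ks = k(Xs, X)
    mean = Ks @ alpha
    var = sf - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)

def chi2_surrogate(X, Y, Xs, target):
    """Sketch of target-vector optimization: model each of the K outputs
    with its own GP and score candidates by the predicted sum of squared
    residuals to the target vector (a plain plug-in estimate)."""
    score = np.zeros(len(Xs))
    for j in range(Y.shape[1]):           # one surrogate per output component
        m, v = gp_fit_predict(X, Y[:, j], Xs)
        score += (m - target[j]) ** 2 + v  # E[(f_j - t_j)^2] = (m - t)^2 + var
    return score                           # pick argmin over candidates Xs

# Toy usage: reconstruct a 1-D parameter from a 3-component target vector
X = np.linspace(-2, 2, 8)[:, None]
Y = np.stack([np.sin(X[:, 0]), np.cos(X[:, 0]), X[:, 0] ** 2], axis=1)
Xs = np.linspace(-2, 2, 200)[:, None]
best = Xs[np.argmin(chi2_surrogate(X, Y, Xs, target=np.array([0.0, 1.0, 0.0])))]
print(best)  # should land near x = 0
```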
arXiv Detail & Related papers (2022-02-23T15:13:32Z)
- Developing and Improving Risk Models using Machine-learning Based Algorithms [6.245537312562826]
The objective of this study is to develop a good risk model for classifying business delinquency.
The rationale underlying the analyses is first to obtain good base binary classifiers via regularization.
Two model ensembling algorithms, bagging and boosting, are then applied to the good base classifiers for further model improvement.
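A compact scikit-learn sketch of that pipeline; the dataset, penalty, and hyperparameters are illustrative stand-ins rather than the study's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative imbalanced "delinquency" data; the study's real data differs.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Step 1: a regularized base classifier (L1-penalized logistic regression).
base = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)

# Step 2: improve the base classifier with bagging and boosting.
bagged = BaggingClassifier(base, n_estimators=50, random_state=0)
boosted = AdaBoostClassifier(base, n_estimators=50, random_state=0)

for name, model in [("base", base), ("bagging", bagged), ("boosting", boosted)]:
    print(name, cross_val_score(model, X, y, scoring="roc_auc").mean())
```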
arXiv Detail & Related papers (2020-09-09T20:38:00Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
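The core estimator is a few lines: stack one incidence row per comparison and solve the resulting least-squares system for the latent qualities (a minimal version; the paper analyzes a broader class of such estimators):

```python
import numpy as np

def ls_rank(n, comparisons):
    """Least-squares ranking sketch: estimate item qualities q from noisy
    pairwise outcomes y_ij ~ q_i - q_j over the comparison graph."""
    rows, rhs = [], []
    for i, j, y in comparisons:  # y: observed margin of item i over item j
        r = np.zeros(n)
        r[i], r[j] = 1.0, -1.0   # incidence row for edge (i, j)
        rows.append(r)
        rhs.append(y)
    A = np.vstack(rows + [np.ones((1, n))])  # gauge fixing: qualities sum to 0
    b = np.array(rhs + [0.0])
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.argsort(-q)                    # ranking, best item first

print(ls_rank(3, [(0, 1, 1.0), (1, 2, 0.8), (0, 2, 2.1)]))  # -> [0 1 2]
```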
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
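A stripped-down sketch of one neighborhood regression via multiplicative weight updates in the spirit of that algorithm (the sign-doubling trick keeps weights on the simplex; constants are illustrative, and the analysis assumes the true coefficients have bounded $\ell_1$ norm):

```python
import numpy as np

def mw_neighborhood(X, i, T=4000, eta=0.05):
    """Sketch of recovering node i's neighborhood with multiplicative
    weight updates (simplified from the Klivans-Meka style algorithm;
    assumes roughly standardized features with bounded l1-norm truth)."""
    n, p = X.shape
    Z = np.delete(X, i, axis=1)
    F = np.hstack([Z, -Z])                    # signed feature copies keep weights positive
    w = np.ones(2 * (p - 1)) / (2 * (p - 1))  # uniform weights on the simplex
    for t in range(T):
        x, y = F[t % n], X[t % n, i]          # stream one sample (online-friendly)
        pred = w @ x
        w *= np.exp(-eta * (pred - y) * x)    # multiplicative update on the loss gradient
        w /= w.sum()                          # renormalize onto the simplex
    coef = w[:p - 1] - w[p - 1:]              # fold the signed copies back
    others = np.delete(np.arange(p), i)       # map back to original node ids
    return others[np.argsort(-np.abs(coef))]  # candidate neighbors, strongest first

# Toy demo: node 0 depends on nodes 1 and 2 only
rng = np.random.default_rng(0)
Z = rng.normal(size=(4000, 5))
Z[:, 0] = 0.4 * Z[:, 1] - 0.4 * Z[:, 2] + 0.2 * rng.normal(size=4000)
print(mw_neighborhood(Z, 0))  # nodes 1 and 2 should rank first
```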
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
- Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling [30.406623987492726]
We present a new method for evaluating and training unnormalized density models.
We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$ defined by a vector function of the data.
This yields a novel goodness-of-fit test which outperforms existing methods on high dimensional data.
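The statistic is $\mathbb{E}_{x \sim p}[\nabla_x \log q(x)^{\top} f(x) + \operatorname{tr} \nabla_x f(x)]$, maximized over a critic $f$. Below is a small PyTorch sketch with an exact autograd divergence (a stochastic divergence estimator may be preferable in high dimensions):

```python
import torch

def stein_discrepancy(x, score_q, f):
    """Monte-Carlo estimate of E_p[ score_q(x)^T f(x) + div f(x) ],
    where score_q(x) = grad_x log q(x) of the (unnormalized) model and
    f is a learned vector-valued critic. The divergence is computed
    exactly via autograd, one input dimension at a time."""
    x = x.clone().requires_grad_(True)
    fx = f(x)          # critic output, shape (n, d)
    sq = score_q(x)    # model score, shape (n, d)
    div = torch.zeros(x.shape[0])
    for k in range(x.shape[1]):  # exact divergence, dimension by dimension
        g = torch.autograd.grad(fx[:, k].sum(), x, create_graph=True)[0][:, k]
        div = div + g
    return ((sq * fx).sum(dim=1) + div).mean()  # maximize over f, then test

# Example: q = standard normal, so score_q(x) = -x; f is a small MLP critic.
f = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
x = torch.randn(128, 2)  # data drawn from p (here p = q, so the value stays near 0)
print(stein_discrepancy(x, lambda z: -z, f).item())
```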
arXiv Detail & Related papers (2020-02-13T16:39:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.