Certifiable Robustness for Nearest Neighbor Classifiers
- URL: http://arxiv.org/abs/2201.04770v1
- Date: Thu, 13 Jan 2022 02:55:10 GMT
- Title: Certifiable Robustness for Nearest Neighbor Classifiers
- Authors: Austen Z. Fan and Paraschos Koutris
- Abstract summary: We study the complexity of certifying robustness for a simple but widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN).
Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs).
We exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label.
- Score: 6.487663563916903
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: ML models are typically trained using large datasets of high quality.
However, training datasets often contain inconsistent or incomplete data. To
tackle this issue, one solution is to develop algorithms that can check whether
a prediction of a model is certifiably robust. Given a learning algorithm that
produces a classifier and given an example at test time, a classification
outcome is certifiably robust if it is predicted by every model trained across
all possible worlds (repairs) of the uncertain (inconsistent) dataset. This
notion of robustness falls naturally under the framework of certain answers. In
this paper, we study the complexity of certifying robustness for a simple but
widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our
main focus is on inconsistent datasets when the integrity constraints are
functional dependencies (FDs). For this setting, we establish a dichotomy in
the complexity of certifying robustness w.r.t. the set of FDs: the problem
either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we
exhibit a similar dichotomy for the counting version of the problem, where the
goal is to count the number of possible worlds that predict a certain label. As
a byproduct of our study, we also establish the complexity of a problem related
to finding an optimal subset repair that may be of independent interest.
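To make the definition concrete, here is a minimal brute-force sketch of the certification problem for a single FD $A \rightarrow B$. This is an illustration of the definition only: it enumerates all subset repairs, which is exponential in general, whereas the paper's contribution is characterizing exactly when a polynomial-time algorithm exists. All function and variable names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def knn_label(points, k, x):
    """Plain k-NN: majority label among the k nearest training points
    (ties broken arbitrarily)."""
    by_dist = sorted(points, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    votes = defaultdict(int)
    for _, label in by_dist[:k]:
        votes[label] += 1
    return max(votes, key=votes.get)

def certify_knn(dataset, k, x):
    """Brute-force certification of a k-NN prediction under one FD A -> B.

    dataset: iterable of (a, b, features, label) tuples.  Tuples that agree
    on the FD's left-hand side `a` but disagree on `b` are in conflict, so a
    subset repair keeps exactly one b-value per a-block.  Returns the label
    predicted in every possible world (repair), or None if repairs disagree.
    """
    blocks = defaultdict(lambda: defaultdict(list))
    for a, b, feats, label in dataset:
        blocks[a][b].append((feats, label))
    # Each possible world picks one b-value (one sub-list) per a-block.
    choices = [list(group.values()) for group in blocks.values()]
    labels = set()
    for world in product(*choices):
        points = [t for part in world for t in part]
        labels.add(knn_label(points, k, x))
        if len(labels) > 1:
            return None  # two repairs predict different labels
    return labels.pop()

# Two tuples share a=1 but disagree on b, so there are two repairs.
data = [(1, "u", (0.0,), "+"), (1, "v", (0.1,), "-"), (2, "w", (5.0,), "+")]
print(certify_knn(data, k=1, x=(0.05,)))  # None: the two repairs disagree at x
```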
Related papers
- Who Should Predict? Exact Algorithms For Learning to Defer to Humans [40.22768241509553]
We show that prior approaches can fail to find a human-AI system with low misclassification error.
We give a mixed-integer-linear-programming (MILP) formulation that can optimally solve the problem in the linear setting.
We provide a novel surrogate loss function that is realizable-consistent and performs well empirically.
arXiv Detail & Related papers (2023-01-15T21:57:36Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Ensemble Methods for Robust Support Vector Machines using Integer Programming [0.0]
We study binary classification problems where we assume that our training data is subject to uncertainty.
To tackle this issue, the field of robust machine learning aims to develop models that are robust against small perturbations in the training data.
arXiv Detail & Related papers (2022-03-03T10:03:54Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
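As a hedged illustration of what such an objective typically looks like (the paper's exact formulation may differ; here $X \in \mathbb{R}^{n \times d}$ is the data matrix, $W \in \mathbb{R}^{d \times m}$ a projection whose rows score the $d$ features, and $\lambda > 0$ a trade-off parameter):

```latex
\min_{W^\top W = I} \; \|X - X W W^\top\|_F^2 + \lambda \|W\|_{2,p}^p ,
\qquad
\|W\|_{2,p}^p = \sum_{i=1}^{d} \Bigl( \sum_{j=1}^{m} w_{ij}^2 \Bigr)^{p/2}
```

With $0 < p \le 1$ the penalty drives whole rows of $W$ to zero, so features can be ranked (and discarded) by the row norms $\|w_i\|_2$.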
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Robust Estimation of Tree Structured Ising Models [20.224160348675422]
We consider the task of learning Ising models when the signs of different random variables are flipped independently with possibly unequal, unknown probabilities.
We first prove that this problem is unidentifiable; however, the unidentifiability is limited to a small equivalence class of trees formed by leaf nodes exchanging positions with their neighbors.
arXiv Detail & Related papers (2020-06-10T01:32:45Z)
- Distributionally Robust Weighted $k$-Nearest Neighbors [21.537952410507483]
Learning a robust classifier from a few samples remains a key challenge in machine learning.
In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors.
We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem.
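For context, a minimal sketch of the plain weighted $k$-NN rule that such formulations build on (a textbook version with inverse-distance weights, purely illustrative; in the paper the weights come out of the minimax problem rather than being fixed a priori):

```python
import numpy as np

def weighted_knn(X_train, y_train, x, k=5, eps=1e-12):
    """Weighted k-NN: each of the k nearest neighbors votes with weight
    inversely proportional to its distance from the query point x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    votes = {}
    for i in np.argsort(dists)[:k]:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (dists[i] + eps)
    return max(votes, key=votes.get)
```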
arXiv Detail & Related papers (2020-06-07T00:34:33Z)
- Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
arXiv Detail & Related papers (2020-04-06T12:00:30Z)
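The general recipe behind denoising-autoencoder imputation can be sketched as follows (a hedged PyTorch illustration; the architecture, corruption scheme, and hyperparameters are assumptions for exposition, not the paper's choices):

```python
import numpy as np
import torch
import torch.nn as nn

def train_dae_imputer(X, hidden=64, epochs=200, mask_rate=0.2, lr=1e-3):
    """Train a denoising autoencoder on complete rows of X: corrupt a random
    subset of entries each step and learn to reconstruct the originals."""
    X = torch.tensor(X, dtype=torch.float32)
    model = nn.Sequential(nn.Linear(X.shape[1], hidden), nn.ReLU(),
                          nn.Linear(hidden, X.shape[1]))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        mask = torch.rand_like(X) < mask_rate            # entries to corrupt
        corrupted = torch.where(mask, torch.zeros_like(X), X)
        loss = ((model(corrupted) - X) ** 2).mean()      # reconstruction error
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def impute(model, X_missing):
    """Fill NaNs with column means, then replace them with model outputs."""
    col_means = np.nanmean(X_missing, axis=0)
    filled = np.where(np.isnan(X_missing), col_means, X_missing)
    with torch.no_grad():
        recon = model(torch.tensor(filled, dtype=torch.float32)).numpy()
    return np.where(np.isnan(X_missing), recon, X_missing)
```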
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.