Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric
Learning
- URL: http://arxiv.org/abs/2311.10246v2
- Date: Fri, 2 Feb 2024 21:48:18 GMT
- Title: Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric
Learning
- Authors: Amartya Banerjee, Christopher J. Hazard, Jacob Beel, Cade Mack, Jack
Xia, Michael Resnick, Will Goddin
- Abstract summary: We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
- Score: 1.4293924404819704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nonparametric learning is a fundamental concept in machine learning that aims
to capture complex patterns and relationships in data without making strong
assumptions about the underlying data distribution. Owing to simplicity and
familiarity, one of the most well-known algorithms under this paradigm is the
$k$-nearest neighbors ($k$-NN) algorithm. Driven by the usage of machine
learning in safety-critical applications, in this work, we shed new light on
the traditional nearest neighbors algorithm from the perspective of information
theory and propose a robust and interpretable framework for tasks such as
classification, regression, density estimation, and anomaly detection using a
single model. We can determine data point weights as well as feature
contributions by calculating the conditional entropy for adding a feature
without the need for explicit model training. This allows us to compute feature
contributions by providing detailed data point influence weights with perfect
attribution and can be used to query counterfactuals. Instead of using a
traditional distance measure which needs to be scaled and contextualized, we
use a novel formulation of $\textit{surprisal}$ (amount of information required
to explain the difference between the observed and expected result). Finally,
our work showcases the architecture's versatility by achieving state-of-the-art
results in classification and anomaly detection, while also attaining
competitive results for regression across a statistically significant number of
datasets.
Related papers
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
Proposed aggregation algorithms are extensively analyzed from a theoretical, and an experimental prospective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z) - An Entropy-Based Model for Hierarchical Learning [3.1473798197405944]
A common feature among real-world datasets is that data domains are multiscale.
We propose a learning model that exploits this multiscale data structure.
The hierarchical learning model is inspired by the logical and progressive easy-to-hard learning mechanism of human beings.
arXiv Detail & Related papers (2022-12-30T13:14:46Z) - Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets [2.824895388993495]
We provide theoretical guarantees for reliable learning under the information-theoretic AEP.
We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning.
Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity.
arXiv Detail & Related papers (2022-09-28T17:33:11Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - Bounding Information Leakage in Machine Learning [26.64770573405079]
This paper investigates fundamental bounds on information leakage.
We identify and bound the success rate of the worst-case membership inference attack.
We derive bounds on the mutual information between the sensitive attributes and model parameters.
arXiv Detail & Related papers (2021-05-09T08:49:14Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - A bandit-learning approach to multifidelity approximation [7.960229223744695]
Multifidelity approximation is an important technique in scientific computation and simulation.
We introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates.
arXiv Detail & Related papers (2021-03-29T05:29:35Z) - Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z) - A Trainable Optimal Transport Embedding for Feature Aggregation and its
Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z) - Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights [16.538973310830414]
A desirable class of priors would represent weights compactly, capture correlations between weights, and allow inclusion of prior knowledge.
This paper introduces two innovations: (i) a process-based hierarchical model for network weights based on unit embeddings that can flexibly encode correlated weight structures, and (ii) input-dependent versions of these weight priors that can provide convenient ways to regularize the function space.
We show these models provide desirable test-time uncertainty estimates on out-of-distribution data, demonstrate cases of modeling inductive biases for neural networks with kernels, and demonstrate competitive predictive performance on an active learning benchmark
arXiv Detail & Related papers (2020-02-10T07:19:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.