Using Enriched Category Theory to Construct the Nearest Neighbour
Classification Algorithm
- URL: http://arxiv.org/abs/2312.16529v1
- Date: Wed, 27 Dec 2023 11:20:03 GMT
- Title: Using Enriched Category Theory to Construct the Nearest Neighbour
Classification Algorithm
- Authors: Matthew Pugh, Jo Grundy, Corina Cirstea, Nick Harris
- Abstract summary: This paper is the first to construct and motivate a Machine Learning algorithm solely with Enriched Category Theory.
A series of reasonable assumptions about a dataset lead to the construction of the Nearest Neighbours Algorithm.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores whether Enriched Category Theory could provide the
foundation of an alternative approach to Machine Learning, and is the first to
construct and motivate a Machine Learning algorithm solely with Enriched
Category Theory.
In order to supplement evidence that Category Theory can be used to motivate
robust and explainable algorithms, it is shown that a series of reasonable
assumptions about a dataset lead to the construction of the Nearest Neighbours
Algorithm. In particular, the algorithm arises as an extension of the original
dataset using profunctors in the category of Lawvere metric spaces. This leads
to a
definition of an Enriched Nearest Neighbours Algorithm, which consequently also
produces an enriched form of the Voronoi diagram. This paper is intended to be
accessible without any knowledge of Category Theory.
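The nearest neighbour rule over a Lawvere metric can be illustrated with a minimal sketch. A Lawvere metric relaxes the usual axioms: distances need not be symmetric and may be infinite. The asymmetric distance function and toy dataset below are illustrative assumptions, not the paper's construction.

```python
import math

# A Lawvere metric only requires d(x, x) = 0 and the triangle
# inequality; it may be asymmetric, and distances may be infinite.
def lawvere_distance(x, y):
    # Illustrative asymmetric distance on the line (an assumption, not
    # the paper's): moving up costs the gap, moving down twice the gap.
    return (y - x) if y >= x else 2.0 * (x - y)

def nearest_neighbour(query, labelled):
    """Assign the label of the point nearest to the query."""
    best_label, best_d = None, math.inf
    for point, label in labelled:
        d = lawvere_distance(query, point)
        if d < best_d:
            best_d, best_label = d, label
    return best_label

data = [(0.0, "low"), (5.0, "high")]
print(nearest_neighbour(4.0, data))  # prints "high": d(4,5)=1 beats d(4,0)=8
```

Because the distance is asymmetric, the decision regions this rule induces need not be ordinary Voronoi cells, which is what motivates the enriched Voronoi diagram mentioned in the abstract.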
Related papers
- Bidirectional Logits Tree: Pursuing Granularity Reconcilement in Fine-Grained Classification [89.20477310885731]
This paper addresses the challenge of Granularity Competition in fine-grained classification tasks.
Existing approaches typically develop independent hierarchy-aware models based on shared features extracted from a common base encoder.
We propose a novel framework called the Bidirectional Logits Tree (BiLT) for Granularity Reconcilement.
arXiv Detail & Related papers (2024-12-17T10:42:19Z) - Learn to Categorize or Categorize to Learn? Self-Coding for Generalized
Category Discovery [49.1865089933055]
We propose a novel, efficient and self-supervised method capable of discovering previously unknown categories at test time.
A salient feature of our approach is the assignment of minimum length category codes to individual data instances.
Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution.
arXiv Detail & Related papers (2023-10-30T17:45:32Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
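The recommended variant can be sketched as a naive agglomerative loop: at each step, merge the pair of clusters with the largest average pairwise dot product rather than the smallest distance. This is an illustrative implementation on toy data, not the paper's code.

```python
def average_dot(c1, c2):
    # Average pairwise dot product between two clusters of vectors.
    return sum(sum(a * b for a, b in zip(x, y))
               for x in c1 for y in c2) / (len(c1) * len(c2))

def agglomerate(points):
    # Naive search per merge: merge the pair with MAXIMUM average dot
    # product, not minimum distance or within-cluster variance.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: average_dot(clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

On three points where two nearly coincide, the first merge picks the aligned pair, since their dot product dominates.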
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - Differentially-Private Hierarchical Clustering with Provable
Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$ additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Structure of Classifier Boundaries: Case Study for a Naive Bayes
Classifier [1.1485218363676564]
We show that the boundary is both large and complicated in structure.
We create a new measure of uncertainty, called Neighbor Similarity, that compares the result for a point to the distribution of results for its neighbors.
arXiv Detail & Related papers (2022-12-08T16:23:42Z) - On the Power of Foundation Models [9.132197704350492]
We show that category theory provides powerful machinery to answer this question.
A foundation model with the minimum required power (up to symmetry) can theoretically solve downstream tasks for the category defined by pretext task.
Our final result can be seen as a new type of generalization theorem, showing that the foundation model can generate unseen objects from the target category.
arXiv Detail & Related papers (2022-11-29T16:10:11Z) - Language Model Decoding as Likelihood-Utility Alignment [54.70547032876017]
We introduce a taxonomy that groups decoding strategies based on their implicit assumptions about how well the model's likelihood is aligned with the task-specific notion of utility.
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide the first empirical evidence supporting the proposed taxonomy.
arXiv Detail & Related papers (2022-10-13T17:55:51Z) - The no-free-lunch theorems of supervised learning [0.0]
The no-free-lunch theorems promote a skeptical conclusion that all possible machine learning algorithms equally lack justification.
We argue that many standard learning algorithms should rather be understood as model-dependent.
arXiv Detail & Related papers (2022-02-09T15:24:30Z) - Rationales for Sequential Predictions [117.93025782838123]
Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain.
We consider model explanations through rationales, subsets of context that can explain individual model predictions.
We propose an efficient greedy algorithm to approximate this objective.
arXiv Detail & Related papers (2021-09-14T01:25:15Z) - A Framework and Benchmarking Study for Counterfactual Generating Methods
on Tabular Data [0.0]
Counterfactual explanations are viewed as an effective way to explain machine learning predictions.
There are already dozens of algorithms aiming to generate such explanations.
A benchmarking study and framework can help practitioners in determining which technique and building blocks best suit their context.
arXiv Detail & Related papers (2021-07-09T21:06:03Z) - Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on
Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
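In the $k=1$ special case the decision regions are ordinary Voronoi cells, so an adversarial example can be found by moving the query toward its nearest differently-labelled point until the prediction flips. The line search below is a simplified stand-in for the paper's exact geometric computation, shown only to illustrate the Voronoi view.

```python
import math

def nearest_label(query, labelled):
    # 1-NN prediction: label of the closest labelled point.
    return min(labelled, key=lambda pl: math.dist(query, pl[0]))[1]

def adversarial_1nn(query, labelled, step=0.01):
    # Decision regions of 1-NN are Voronoi cells, so a small move toward
    # the nearest rival point eventually crosses the cell boundary and
    # flips the prediction.
    orig = nearest_label(query, labelled)
    rival = min((p for p, lab in labelled if lab != orig),
                key=lambda p: math.dist(query, p))
    t, x = 0.0, tuple(query)
    while nearest_label(x, labelled) == orig and t < 1.0:
        t += step
        x = tuple(q + t * (r - q) for q, r in zip(query, rival))
    return x
```

For general $k$ the boundary structure is governed by higher-order Voronoi diagrams, which is the case the paper's algorithm addresses.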
arXiv Detail & Related papers (2020-11-19T08:49:10Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z) - SOAR: Simultaneous Or of And Rules for Classification of Positive &
Negative Classes [0.0]
We present a novel and complete taxonomy of classifications that clearly capture and quantify the inherent ambiguity in noisy binary classifications in the real world.
We show that this approach leads to a more granular formulation of the likelihood model and a simulated-annealing based optimization achieves classification performance competitive with comparable techniques.
arXiv Detail & Related papers (2020-08-25T20:00:27Z) - Clustering with Tangles: Algorithmic Framework and Theoretical
Guarantees [10.992467680364962]
In this paper, we showcase the practical potential of tangles in machine learning applications.
Given a collection of cuts of any dataset, tangles aggregate these cuts to point in the direction of a dense structure.
We construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases.
arXiv Detail & Related papers (2020-06-25T14:23:56Z) - Pairwise Supervision Can Provably Elicit a Decision Boundary [84.58020117487898]
Similarity learning is a problem to elicit useful representations by predicting the relationship between a pair of patterns.
We show that similarity learning is capable of solving binary classification by directly eliciting a decision boundary.
arXiv Detail & Related papers (2020-06-11T05:35:16Z) - k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples) [2.639737913330821]
Nearest Neighbour classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query.
This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data.
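The two mechanisms named above can be sketched together: a distance function for assessing similarity, and a majority vote among the $k$ nearest neighbours. Euclidean distance is used here as one common choice among the mechanisms the survey covers.

```python
import math
from collections import Counter

def knn_classify(query, labelled, k=3):
    # Similarity assessment: plain Euclidean distance (one of several
    # mechanisms the survey discusses).
    neighbours = sorted(labelled, key=lambda pl: math.dist(query, pl[0]))[:k]
    # Class determination: majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

The sort makes this O(n log n) per query; the survey's computational section covers index structures that avoid scanning every point.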
arXiv Detail & Related papers (2020-04-09T13:00:05Z) - Data Structures & Algorithms for Exact Inference in Hierarchical
Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for exact inference in hierarchical clustering based on a novel trellis data structure.
Our algorithms scale in time and space proportional to the size of the powerset of $N$ elements, which is super-exponentially more efficient than explicitly considering each of the $(2N-3)!!$ possible hierarchies.
arXiv Detail & Related papers (2020-02-26T17:43:53Z)
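The $(2N-3)!!$ count in the last entry is the number of rooted binary hierarchies over $N$ labelled leaves; a small helper makes the gap against the $2^N$-sized powerset concrete.

```python
def num_hierarchies(n):
    # (2n - 3)!! counts the rooted binary hierarchies over n labelled leaves.
    result = 1
    for k in range(2 * n - 3, 0, -2):
        result *= k
    return result

print(num_hierarchies(4))   # prints 15: 5!! = 5 * 3 * 1
print(num_hierarchies(10))  # already vastly exceeds 2**10 = 1024 trellis states
```

The double factorial grows super-exponentially in $n$, while the trellis only needs one state per subset, which is the efficiency claim above.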
This list is automatically generated from the titles and abstracts of the papers in this site.