An Information-Theoretic Perspective on Overfitting and Underfitting
- URL: http://arxiv.org/abs/2010.06076v2
- Date: Fri, 6 Nov 2020 19:22:53 GMT
- Title: An Information-Theoretic Perspective on Overfitting and Underfitting
- Authors: Daniel Bashir, George D. Montanez, Sonia Sehra, Pedro Sandoval Segura,
Julius Lauw
- Abstract summary: We present an information-theoretic framework for understanding overfitting and underfitting in machine learning.
We prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an information-theoretic framework for understanding overfitting
and underfitting in machine learning and prove the formal undecidability of
determining whether an arbitrary classification algorithm will overfit a
dataset. Measuring algorithm capacity via the information transferred from
datasets to models, we consider mismatches between algorithm capacities and
datasets to provide a signature for when a model can overfit or underfit a
dataset. We present results upper-bounding algorithm capacity, establish its
relationship to quantities in the algorithmic search framework for machine
learning, and relate our work to recent information-theoretic approaches to
generalization.
Related papers
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
arXiv Detail & Related papers (2024-10-11T18:02:46Z) - Structure Learning via Mutual Information [0.8702432681310399]
We propose a framework for learning and representing functional relationships in data using mutual information (MI) features.
Our method aims to capture the underlying structure of information in datasets, enabling more efficient and generalizable learning algorithms.
arXiv Detail & Related papers (2024-09-21T19:33:56Z) - Selecting Interpretability Techniques for Healthcare Machine Learning models [69.65384453064829]
In healthcare there is a pursuit for employing interpretable algorithms to assist healthcare professionals in several decision scenarios.
We overview a selection of eight algorithms, both post-hoc and model-based, that can be used for such purposes.
arXiv Detail & Related papers (2024-06-14T17:49:04Z) - Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric
Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
arXiv Detail & Related papers (2023-11-17T00:35:38Z) - Topological Quality of Subsets via Persistence Matching Diagrams [0.196629787330046]
We measure the quality of a subset concerning the dataset it represents using topological data analysis techniques.
In particular, this approach enables us to explain why the chosen subset is likely to result in poor performance of a supervised learning model.
arXiv Detail & Related papers (2023-06-04T17:08:41Z) - Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - RandomSCM: interpretable ensembles of sparse classifiers tailored for
omics data [59.4141628321618]
We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules.
The interpretability of the models makes them useful for biomarker discovery and patterns discovery in high dimensional data.
arXiv Detail & Related papers (2022-08-11T13:55:04Z) - Undecidability of Underfitting in Learning Algorithms [0.0]
We prove that deciding whether an encodable learning algorithm will always underfit a dataset, even if given unlimited training time, is undecidable.
We discuss the importance of this result and potential topics for further research, including information-theoretic and probabilistic strategies for bounding learning algorithm fit.
arXiv Detail & Related papers (2021-02-04T19:35:05Z) - Model-agnostic interpretation by visualization of feature perturbations [0.0]
We propose a model-agnostic interpretation approach that uses visualization of feature perturbations induced by the particle swarm optimization algorithm.
We validate our approach both qualitatively and quantitatively on publicly available datasets.
arXiv Detail & Related papers (2021-01-26T00:53:29Z) - Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
arXiv Detail & Related papers (2020-09-07T16:47:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.