Related papers: Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

URL: http://arxiv.org/abs/2309.04381v2
Date: Wed, 27 Mar 2024 17:07:47 GMT
Title: Generalization Bounds: Perspectives from Information Theory and PAC-Bayes
Authors: Fredrik Hellström, Giuseppe Durisi, Benjamin Guedj, Maxim Raginsky
Abstract summary: The PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms. An information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ.
Score: 31.803107987439784
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of PAC-Bayesian and information-theoretic generalization bounds. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.

Related papers

Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis [28.009990407017618]
We develop information-theoretic generalization bounds for multi-view learning. We derive novel data-dependent bounds under both leave-one-out and supersample settings. In the interpolating regime, we further establish the fast-rate bound for multi-view learning.
arXiv Detail & Related papers (2025-01-28T07:47:19Z)
Discovering emergent connections in quantum physics research via dynamic word embeddings [0.562479170374811]
We introduce a novel approach based on dynamic word embeddings for concept combination prediction. Unlike knowledge graphs, our method captures implicit relationships between concepts, can be learned in a fully unsupervised manner, and encodes a broader spectrum of information. Our findings suggest that this representation offers a more flexible and informative way of modeling conceptual relationships in scientific literature.
arXiv Detail & Related papers (2024-11-10T19:45:59Z)
Multi-View Majority Vote Learning Algorithms: Direct Minimization of PAC-Bayesian Bounds [0.8039067099377079]
We extend PAC-Bayesian theory to multi-view learning, introducing novel generalization bounds based on Rényi divergence. These bounds provide an alternative to traditional Kullback-Leibler divergence-based counterparts, leveraging the flexibility of Rényi divergence. We also propose first- and second-order oracle PAC-Bayesian bounds and extend the C-bound to multi-view settings.
arXiv Detail & Related papers (2024-11-09T20:25:47Z)
Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence. Recent trends demonstrate the potential homogeneity of these two fields. We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
Discovering Common Information in Multi-view Data [35.37807004353416]
We introduce an innovative and mathematically rigorous definition for computing common information from multi-view data. We develop a novel supervised multi-view learning framework to capture both common and unique information.
arXiv Detail & Related papers (2024-06-21T10:47:06Z)
A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge. Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques. This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies.
arXiv Detail & Related papers (2024-03-20T02:21:44Z)
Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark [55.898771405172155]
Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. We provide a systematic overview of the important and recent developments of research on federated learning.
arXiv Detail & Related papers (2023-11-12T06:32:30Z)
Bayesian Learning for Neural Networks: an algorithmic survey [95.42181254494287]
This self-contained survey engages and introduces readers to the principles and algorithms of Bayesian Learning for Neural Networks. It provides an introduction to the topic from an accessible, practical-algorithmic perspective.
arXiv Detail & Related papers (2022-11-21T21:36:58Z)
Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information. KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based. Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
A survey of Bayesian Network structure learning [8.411014222942168]
This paper provides a review of 61 algorithms proposed for learning BN structure from data. The basic approach of each algorithm is described in consistent terms, and the similarities and differences between them highlighted. Approaches for dealing with data noise in real-world datasets and incorporating expert knowledge into the learning process are also covered.
arXiv Detail & Related papers (2021-09-23T14:54:00Z)
Investigating Bi-Level Optimization for Learning and Vision from a Unified Perspective: A Survey and Beyond [114.39616146985001]
In the machine learning and computer vision fields, despite the different motivations and mechanisms, many complex problems contain a series of closely related subproblems. In this paper, we first uniformly express these complex learning and vision problems from the perspective of Bi-Level Optimization (BLO). Then we construct a value-function-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies.
arXiv Detail & Related papers (2021-01-27T16:20:23Z)
Reasoning About Generalization via Conditional Mutual Information [26.011933885798506]
We use Conditional Mutual Information (CMI) to quantify how well the input can be recognized given the output of the learning algorithm. We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods. We then show that bounded CMI implies various forms of generalization.
arXiv Detail & Related papers (2020-01-24T18:13:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.