Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
- URL: http://arxiv.org/abs/2410.13577v2
- Date: Fri, 21 Feb 2025 14:23:17 GMT
- Title: Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
- Authors: Benjamin Leblanc, Mathieu Bazinet, Nathaniel D'Amours, Alexandre Drouin, Pascal Germain
- Abstract summary: We use the PAC-Bayesian and Sample Compress learning frameworks in a meta-learning scheme. The originality of our approach lies in the investigated hypernetwork architectures, which encode the dataset before decoding the parameters. A new Sample Compress theorem exploits the pivotal information transiting at the encoder-decoder junction to compute generalization guarantees for each downstream predictor.
- Score: 47.83977297248753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: PAC-Bayesian and Sample Compress learning frameworks are instrumental for deriving tight (non-vacuous) generalization bounds for neural networks. We leverage these results in a meta-learning scheme, relying on a hypernetwork that outputs the parameters of a downstream predictor from a dataset input. The originality of our approach lies in the investigated hypernetwork architectures that encode the dataset before decoding the parameters: (1) a PAC-Bayesian encoder that expresses a posterior distribution over a latent space, (2) a Sample Compress encoder that selects a small sample of the dataset input along with a message from a discrete set, and (3) a hybrid between both approaches motivated by a new Sample Compress theorem handling continuous messages. The latter theorem exploits the pivotal information transiting at the encoder-decoder junction to compute generalization guarantees for each downstream predictor obtained by our meta-learning scheme.
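To make the encoder-decoder hypernetwork concrete, here is a minimal PyTorch sketch under illustrative assumptions: a deep-sets-style encoder mean-pools per-example features into a latent code, and a decoder maps that code to the weight vector of a small downstream predictor. All names and dimensions are hypothetical; this is not the authors' architecture.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Dataset -> predictor parameters (illustrative sketch only)."""
    def __init__(self, x_dim=10, latent_dim=32, pred_hidden=16):
        super().__init__()
        # Parameter shapes of the downstream predictor f(x) = w2 @ relu(W1 x + b1) + b2
        self.shapes = [(pred_hidden, x_dim), (pred_hidden,), (1, pred_hidden), (1,)]
        n_params = sum(torch.Size(s).numel() for s in self.shapes)
        # Encoder: per-example features, mean-pooled (permutation invariant)
        self.phi = nn.Sequential(nn.Linear(x_dim + 1, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # Decoder: latent code -> flat parameter vector of the downstream predictor
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_params))

    def forward(self, X, y):
        z = self.phi(torch.cat([X, y[:, None]], dim=1)).mean(dim=0)  # dataset -> latent code
        flat = self.decoder(z)                                       # latent code -> parameters
        params, i = [], 0
        for s in self.shapes:
            n = torch.Size(s).numel()
            params.append(flat[i:i + n].view(s))
            i += n
        return params  # [W1, b1, w2, b2]

def predict(params, X):
    W1, b1, w2, b2 = params
    return torch.relu(X @ W1.T + b1) @ w2.T + b2

# Meta-training would loop over many task datasets; one synthetic task shown here.
X, y = torch.randn(50, 10), torch.randn(50)
theta = HyperNetwork()(X, y)
print(predict(theta, X).shape)  # torch.Size([50, 1])
```

The paper's generalization guarantees attach to the information passing through the latent code z at the encoder-decoder junction; the sketch shows only the data path.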
Related papers
- Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning [49.1574468325115]
We propose a novel approach to unsupervised learning by constructing a non-linear embedding of the data into a low-dimensional space followed by any conventional clustering algorithm.
The embedding promotes clusterability of the data and comprises two mappings: the encoder of an autoencoder neural network and the output of the UMAP algorithm.
When applied to MNIST data, AUEC significantly outperforms state-of-the-art techniques in clustering accuracy (a minimal sketch of the pipeline follows this entry).
arXiv Detail & Related papers (2025-01-13T22:30:38Z)
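A minimal sketch of the AUEC-style pipeline above, assuming the umap-learn and scikit-learn packages and one plausible composition of the two mappings (UMAP applied on top of the autoencoder's features); the paper's exact architecture may differ.

```python
import numpy as np
import torch
import torch.nn as nn
import umap                         # pip install umap-learn
from sklearn.cluster import KMeans

X = np.random.rand(500, 784).astype(np.float32)   # stand-in for MNIST pixels

# 1) Train a small autoencoder; its encoder is the first mapping.
enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
Xt = torch.from_numpy(X)
for _ in range(200):
    opt.zero_grad()
    loss = ((dec(enc(Xt)) - Xt) ** 2).mean()      # reconstruction loss
    loss.backward()
    opt.step()

# 2) Second mapping: UMAP on the autoencoded features.
Z = umap.UMAP(n_components=2).fit_transform(enc(Xt).detach().numpy())

# 3) Any conventional clustering algorithm on the final embedding.
labels = KMeans(n_clusters=10, n_init=10).fit_predict(Z)
print(np.bincount(labels))
```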
- Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses [9.180445799821717]
We present a general framework for deriving new sample compression bounds that hold for real-valued unbounded losses.
We empirically demonstrate the tightness of the bounds and their versatility by evaluating them on random forests and several types of neural networks (for orientation, the classic zero-one-loss bound is sketched below).
arXiv Detail & Related papers (2024-09-26T15:08:52Z)
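For orientation only: the classic sample compression bound for the zero-one loss with a consistent compression scheme, which frameworks like the one above extend to real-valued, unbounded losses. This is the textbook bound, not the paper's new result.

```python
from math import comb, log

def compression_bound(m, k, msg_set_size=1, delta=0.05):
    """Classic zero-one-loss sample compression bound (consistent case):
    with probability >= 1 - delta over an i.i.d. sample of size m, any predictor
    rebuilt from k retained points (plus a message from a set of size |M|)
    that errs on none of the m - k remaining points has true risk at most
    ln(C(m, k) * |M| / delta) / (m - k)."""
    assert 0 <= k < m
    return log(comb(m, k) * msg_set_size / delta) / (m - k)

# A predictor rebuilt from 20 of 5000 training points, with 16 possible messages:
print(f"{compression_bound(m=5000, k=20, msg_set_size=16):.4f}")
```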
- Convolutional Neural Network Compression Based on Low-Rank Decomposition [3.3295360710329738]
This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization (VBMF).
VBMF is employed to estimate the rank of the weight tensor at each layer.
Experimental results show that the compressed models perform strongly at both high and low compression ratios (a truncated-SVD sketch of the factorization step follows this entry).
arXiv Detail & Related papers (2024-08-29T06:40:34Z)
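VBMF's closed-form rank estimation is too long to sketch here; the following shows only the subsequent factorization step, with a fixed rank in place of the VBMF estimate. A truncated SVD splits a linear layer into two thinner ones; convolutional kernels are handled analogously by reshaping the weight tensor.

```python
import torch
import torch.nn as nn

def low_rank_factorize(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate W (out x in) by B @ A with A: (rank x in), B: (out x rank).
    In the paper the rank comes from VBMF; here it is fixed for illustration."""
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    A = nn.Linear(layer.in_features, rank, bias=False)
    B = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    A.weight.data = torch.diag(S[:rank].sqrt()) @ Vh[:rank]    # (rank, in)
    B.weight.data = U[:, :rank] @ torch.diag(S[:rank].sqrt())  # (out, rank)
    if layer.bias is not None:
        B.bias.data = layer.bias.data.clone()
    return nn.Sequential(A, B)

layer = nn.Linear(512, 512)
compressed = low_rank_factorize(layer, rank=64)   # 512*512 -> 2*512*64 parameters
x = torch.randn(8, 512)
print((layer(x) - compressed(x)).abs().max())     # approximation error
```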
- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA addresses two primary deficiencies of previous attempts at stealing pre-trained encoders.
It relies on a sample-wise prototype that consolidates the target encoder's representations of a given sample's various perspectives.
To further improve efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z)
- Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network [59.79008107609297]
We propose to approximate the joint posterior over both the structure and the parameters of a Bayesian Network.
We use a single GFlowNet whose sampling policy follows a two-phase process.
Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models.
arXiv Detail & Related papers (2023-05-30T19:16:44Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Exploring and Exploiting Multi-Granularity Representations for Machine Reading Comprehension [13.191437539419681]
We propose a novel approach called the Adaptive Bidirectional Attention-Capsule Network (ABA-Net).
ABA-Net adaptively feeds source representations from different granularity levels to the predictor.
We set a new state of the art on the SQuAD 1.0 dataset.
arXiv Detail & Related papers (2022-08-18T10:14:32Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- BOSS: Bidirectional One-Shot Synthesis of Adversarial Examples [8.359029046999233]
This paper proposes a one-shot synthesis of adversarial examples.
The inputs are synthesized from scratch to induce arbitrary soft predictions at the output of pre-trained models.
We demonstrate the generality and versatility of the proposed framework through applications to the design of targeted adversarial attacks (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-08-05T17:43:36Z)
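A minimal sketch of the core mechanism above: optimize an input from scratch so that a frozen model emits a chosen soft prediction. BOSS's bidirectional one-shot scheme is more involved; this shows only the basic synthesis loop, with hypothetical dimensions.

```python
import torch
import torch.nn.functional as F

# Frozen pre-trained model (stand-in); the input is synthesized from scratch.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
model.requires_grad_(False)

target = torch.tensor([[0.05] * 9 + [0.55]])   # arbitrary soft prediction to induce
x = torch.zeros(1, 32, requires_grad=True)     # synthesized input
opt = torch.optim.Adam([x], lr=0.1)

for _ in range(500):
    opt.zero_grad()
    # KL divergence between the model's output distribution and the target
    loss = F.kl_div(F.log_softmax(model(x), dim=1), target, reduction="batchmean")
    loss.backward()
    opt.step()

print(F.softmax(model(x), dim=1).detach().round(decimals=2))
```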
- Regularization-Agnostic Compressed Sensing MRI Reconstruction with Hypernetworks [21.349071909858218]
We present a novel strategy: a hypernetwork generates the parameters of a separate reconstruction network as a function of the regularization weight(s).
At test time, for a given under-sampled image, our model can rapidly compute reconstructions with different amounts of regularization.
We analyze the variability of these reconstructions, especially when their overall quality is similar (a toy sketch follows this entry).
arXiv Detail & Related papers (2021-01-06T18:55:37Z)
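In contrast with the dataset-conditioned hypernetwork of the main paper, this hypernetwork is conditioned on the regularization weight. A toy sketch, with a small MLP standing in for the reconstruction network; names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class LambdaHyperNet(nn.Module):
    """Regularization weight -> parameters of a tiny reconstruction MLP (toy)."""
    def __init__(self, sig_dim=64, hidden=32):
        super().__init__()
        self.shapes = [(hidden, sig_dim), (hidden,), (sig_dim, hidden), (sig_dim,)]
        n = sum(torch.Size(s).numel() for s in self.shapes)
        self.net = nn.Sequential(nn.Linear(1, 128), nn.ReLU(), nn.Linear(128, n))

    def forward(self, lam, y):
        flat = self.net(lam.view(1, 1)).squeeze(0)
        params, i = [], 0
        for s in self.shapes:
            k = torch.Size(s).numel()
            params.append(flat[i:i + k].view(s))
            i += k
        W1, b1, W2, b2 = params
        return torch.relu(y @ W1.T + b1) @ W2.T + b2  # reconstruction for this lambda

hyper = LambdaHyperNet()
y = torch.randn(8, 64)                 # under-sampled measurements (toy)
# At test time, sweep regularization weights without retraining:
for lam in (0.01, 0.1, 1.0):
    print(lam, hyper(torch.tensor(lam), y).shape)
```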
- Unfolding Neural Networks for Compressive Multichannel Blind Deconvolution [71.29848468762789]
We propose a learned, structured unfolding neural network for compressive sparse multichannel blind deconvolution.
In this problem, each channel's measurements are given as the convolution of a common source signal with a channel-specific sparse filter.
We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in both accuracy and speed of sparse filter recovery (an unrolled-ISTA sketch follows this entry).
arXiv Detail & Related papers (2020-10-22T02:34:33Z)
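Algorithm unrolling in its generic LISTA form: each network layer mimics one ISTA iteration, with learned matrices and soft-thresholds. The paper specializes this structure to multichannel blind deconvolution; the sketch below is only the generic template.

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """LISTA-style unrolled sparse recovery: each layer is one learned ISTA step."""
    def __init__(self, m, n, n_layers=8):
        super().__init__()
        self.We = nn.ModuleList([nn.Linear(m, n, bias=False) for _ in range(n_layers)])
        self.Wx = nn.ModuleList([nn.Linear(n, n, bias=False) for _ in range(n_layers)])
        self.theta = nn.Parameter(torch.full((n_layers,), 0.1))  # learned thresholds

    def forward(self, y):
        x = torch.zeros(y.shape[0], self.Wx[0].in_features, device=y.device)
        for We, Wx, t in zip(self.We, self.Wx, self.theta):
            z = We(y) + Wx(x)
            x = torch.sign(z) * torch.relu(z.abs() - t)  # soft-thresholding
        return x

net = UnrolledISTA(m=30, n=100)
y = torch.randn(4, 30)      # compressive measurements
print(net(y).shape)         # recovered sparse codes: torch.Size([4, 100])
```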
- Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations [0.7614628596146599]
We introduce the Encoded Prior Sliced Wasserstein AutoEncoder.
An additional prior-encoder network learns an embedding of the data manifold.
We show that, unlike conventional autoencoders, the prior encodes the geometry underlying the data (the sliced Wasserstein distance is sketched below).
arXiv Detail & Related papers (2020-10-02T14:58:54Z)
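The sliced Wasserstein distance underlying this family of autoencoders reduces to sorting random one-dimensional projections. A minimal sketch, assuming equal-sized point clouds for simplicity:

```python
import torch

def sliced_wasserstein(x, y, n_proj=50, p=2):
    """Monte-Carlo sliced Wasserstein-p distance between point clouds of shape (n, d):
    project onto random unit directions, then compare sorted 1-D samples."""
    theta = torch.randn(x.shape[1], n_proj)
    theta = theta / theta.norm(dim=0, keepdim=True)   # random unit directions
    px = (x @ theta).sort(dim=0).values               # (n, n_proj)
    py = (y @ theta).sort(dim=0).values
    return ((px - py).abs() ** p).mean() ** (1 / p)

# E.g., as the distribution-matching term between latent codes and a learned prior:
z = torch.randn(256, 16)        # latent codes from the encoder
prior = torch.randn(256, 16)    # samples from the prior-encoder network
print(sliced_wasserstein(z, prior))
```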
- DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z)
- Robust Sampling in Deep Learning [62.997667081978825]
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem by a new regularization method based on distributional robust optimization.
During training, samples are selected according to their accuracy, so that the worst-performing samples contribute the most to the optimization (a CVaR-style sketch follows this entry).
arXiv Detail & Related papers (2020-06-04T09:46:52Z)
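One simple instantiation of "the worst-performing samples contribute the most" is a CVaR-style objective that averages only the largest per-sample losses in each batch. A hedged sketch; the paper's distributionally robust formulation may differ in its details.

```python
import torch
import torch.nn.functional as F

def worst_case_loss(logits, targets, frac=0.3):
    """CVaR-style robust loss: average only the worst `frac` of per-sample losses,
    so the hardest samples drive the gradient."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    k = max(1, int(frac * per_sample.numel()))
    worst, _ = per_sample.topk(k)
    return worst.mean()

logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
loss = worst_case_loss(logits, targets)
loss.backward()
print(loss.item())
```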
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework: the first stage quickly localizes the prostate region, and the second stage precisely segments the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method learns more representative voxel-level features than conventional methods trained with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
- Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications.
We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
arXiv Detail & Related papers (2019-12-27T21:28:10Z)