Drill the Cork of Information Bottleneck by Inputting the Most Important Data
- URL: http://arxiv.org/abs/2105.07181v1
- Date: Sat, 15 May 2021 09:20:36 GMT
- Title: Drill the Cork of Information Bottleneck by Inputting the Most Important Data
- Authors: Xinyu Peng, Jiawei Zhang, Fei-Yue Wang and Li Li
- Abstract summary: How to efficiently train deep neural networks remains to be solved.
The information bottleneck (IB) theory claims that the optimization process consists of an initial fitting phase and the following compression phase.
We show that the fitting phase depicted in the IB theory will be boosted with a high signal-to-noise ratio of gradient approximation if typicality sampling is appropriately adopted.
- Score: 28.32769151293851
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has become the most powerful machine learning tool in the last
decade. However, how to efficiently train deep neural networks has yet to be thoroughly solved. The widely used minibatch stochastic gradient descent (SGD)
still needs to be accelerated. As a promising tool to better understand the
learning dynamic of minibatch SGD, the information bottleneck (IB) theory
claims that the optimization process consists of an initial fitting phase and
the following compression phase. Based on this principle, we further study
typicality sampling, an efficient data selection method, and propose a new
explanation of how it helps accelerate the training process of the deep
networks. We show that the fitting phase depicted in the IB theory will be
boosted with a high signal-to-noise ratio of gradient approximation if the
typicality sampling is appropriately adopted. Furthermore, this finding also
implies that the prior information of the training set is critical to the
optimization process and the better use of the most important data can help the
information flow through the bottleneck faster. Both theoretical analysis and
experimental results on synthetic and real-world datasets demonstrate our
conclusions.
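A minimal sketch of the idea behind typicality sampling, under stated assumptions: the abstract does not spell out the selection procedure, so the density score (a k-nearest-neighbour estimate), the split between typical and uniformly drawn samples, and all function names below are illustrative assumptions rather than the authors' implementation. The sketch only shows how building each minibatch mostly from the high-density, "typical" region of the training set could raise the signal-to-noise ratio of the gradient approximation during the fitting phase.

```python
# Hypothetical sketch of typicality-sampling-style minibatch selection (NumPy).
# Assumptions, not taken from the paper: typicality is scored with a simple
# k-nearest-neighbour density estimate, and each minibatch mixes a fixed
# fraction of high-typicality samples with uniformly drawn ones.
import numpy as np


def typicality_scores(features, k=10):
    """Score each sample by local density: inverse of the mean distance
    to its k nearest neighbours (higher score = more typical)."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)              # ignore self-distance
    knn_mean = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    return 1.0 / (knn_mean + 1e-12)


def sample_batch(scores, batch_size, typical_frac=0.8, rng=None):
    """Draw a minibatch whose majority comes from the high-typicality pool,
    so the minibatch gradient tracks the full-batch gradient more closely;
    the remainder is drawn uniformly to keep coverage of atypical samples."""
    if rng is None:
        rng = np.random.default_rng()
    n = scores.shape[0]
    n_typical = int(typical_frac * batch_size)
    pool_size = max(4 * n_typical, batch_size)
    typical_pool = np.argsort(scores)[::-1][:pool_size]
    typical_idx = rng.choice(typical_pool, size=n_typical, replace=False)
    rest = np.setdiff1d(np.arange(n), typical_idx)
    uniform_idx = rng.choice(rest, size=batch_size - n_typical, replace=False)
    return np.concatenate([typical_idx, uniform_idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 16))               # toy feature matrix
    scores = typicality_scores(X, k=10)
    batch = sample_batch(scores, batch_size=64, rng=rng)
    print(batch.shape)                           # (64,)
```

One common way to make the signal-to-noise intuition concrete (again an assumption, not the paper's exact definition) is to compare the norm of the full-batch gradient with the variance of the minibatch estimate: a batch dominated by typical samples lowers that variance, which is consistent with the faster fitting phase the paper attributes to typicality sampling.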
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z)
- IB-AdCSCNet: Adaptive Convolutional Sparse Coding Network Driven by Information Bottleneck [4.523653503622693]
We introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck theory.
IB-AdCSCNet seamlessly integrates the information bottleneck trade-off strategy into deep networks.
Experimental results on CIFAR-10 and CIFAR-100 datasets demonstrate that IB-AdCSCNet not only matches the performance of deep residual convolutional networks but also outperforms them when handling corrupted data.
arXiv Detail & Related papers (2024-05-23T05:35:57Z)
- Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks [0.0]
This research embarks on pioneering the integration of gradient sampling optimization techniques, particularly StochGradAdam, into the pruning process of neural networks.
Our main objective is to address the significant challenge of maintaining accuracy in pruned neural models, which is critical in resource-constrained scenarios.
arXiv Detail & Related papers (2023-12-26T12:19:22Z)
- Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Segmentation-guided Domain Adaptation for Efficient Depth Completion [3.441021278275805]
We propose an efficient depth completion model based on a vgg05-like CNN architecture and a semi-supervised domain adaptation approach.
In order to boost spatial coherence, we guide the learning process using segmentations as an additional source of information.
Our approach improves on previous efficient, low-parameter state-of-the-art approaches while having a noticeably lower computational footprint.
arXiv Detail & Related papers (2022-10-14T13:01:25Z)
- Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines [77.45213180689952]
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy.
We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines.
We obtain an increased throughput of 3x to 13x compared to an untuned system.
arXiv Detail & Related papers (2022-02-17T14:31:58Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.