DessiLBI: Exploring Structural Sparsity of Deep Networks via
Differential Inclusion Paths
- URL: http://arxiv.org/abs/2007.02010v1
- Date: Sat, 4 Jul 2020 04:40:16 GMT
- Title: DessiLBI: Exploring Structural Sparsity of Deep Networks via
Differential Inclusion Paths
- Authors: Yanwei Fu, Chen Liu, Donghao Li, Xinwei Sun, Jinshan Zeng, Yuan Yao
- Abstract summary: We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
- Score: 45.947140164621096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over-parameterization is ubiquitous nowadays in training neural networks to
benefit both optimization in seeking global optima and generalization in
reducing prediction error. However, compressive networks are desired in many
real world applications and direct training of small networks may be trapped in
local optima. In this paper, instead of pruning or distilling
over-parameterized models to compressive ones, we propose a new approach based
on differential inclusions of inverse scale spaces. Specifically, it generates
a family of models from simple to complex ones that couples a pair of
parameters to simultaneously train over-parameterized deep models and
structural sparsity on weights of fully connected and convolutional layers.
Such a differential inclusion scheme has a simple discretization, proposed as
Deep structurally splitting Linearized Bregman Iteration (DessiLBI), whose
global convergence in deep learning is established: from any initialization, the
algorithmic iterates converge to a critical point of the empirical risk.
Experimental evidence shows that DessiLBI achieves comparable or even better
performance than competitive optimizers in exploring the structural sparsity of
several widely used backbones on benchmark datasets. Remarkably, with early
stopping, DessiLBI unveils "winning tickets" in early epochs: effective sparse
structures whose test accuracy is comparable to that of fully trained
over-parameterized models.
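For intuition, the following is a minimal NumPy sketch of the coupled update described above, assuming the standard SplitLBI form of the iteration (an augmented loss coupling the weights W with a sparse structural variable Gamma) and a toy least-squares loss in place of a deep network's empirical risk; the hyperparameters kappa, nu and alpha are illustrative, not the paper's settings.

```python
import numpy as np

# Hedged sketch of a DessiLBI-style update (Linearized Bregman Iteration with
# variable splitting), written against the augmented loss
#   L_bar(W, Gamma) = L(W) + ||W - Gamma||^2 / (2 * nu),
# where Gamma is the sparse "structural" variable coupled to the weights W.
# A toy least-squares loss stands in for the empirical risk of a deep network.

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:5] = 3.0                                 # sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(n)

def grad_loss(W):                                # gradient of L(W) = ||XW - y||^2 / (2n)
    return X.T @ (X @ W - y) / n

def soft_threshold(z, thr=1.0):                  # proximal map of the l1 penalty
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

kappa, nu, alpha = 10.0, 10.0, 0.01
W = np.zeros(d)                                  # over-parameterized weights
z = np.zeros(d)                                  # auxiliary dual variable
Gamma = np.zeros(d)                              # sparse structural variable

for k in range(5000):
    gW = grad_loss(W) + (W - Gamma) / nu         # grad of augmented loss w.r.t. W
    gG = (Gamma - W) / nu                        # grad of augmented loss w.r.t. Gamma
    W = W - kappa * alpha * gW
    z = z - alpha * gG
    Gamma = kappa * soft_threshold(z)            # support of Gamma grows along the path

print("nonzeros in Gamma:", np.count_nonzero(Gamma))
```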
Related papers
- Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
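As a rough illustration of reducing training to linear empirical risk minimization, the sketch below fixes random ReLU components and fits only the linear output layer by regularized least squares; the paper's actual component construction is different.

```python
import numpy as np

# Illustrative only: a random-feature caricature of "turning deep net training
# into linear empirical risk minimization". The inner ReLU components are fixed
# (here: random), and only the linear output layer is fit, which reduces
# training to ridge-regularized least squares.

rng = np.random.default_rng(1)
n, d, m = 500, 10, 200                     # samples, input dim, number of components
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2   # toy regression target

W1 = rng.standard_normal((d, m))           # frozen component weights
b1 = rng.standard_normal(m)
H = np.maximum(X @ W1 + b1, 0.0)           # fixed ReLU features

lam = 1e-3                                 # linear ERM over the output layer only
w_out = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ y)
print("train MSE:", np.mean((H @ w_out - y) ** 2))
```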
arXiv Detail & Related papers (2024-09-21T15:30:43Z)
- Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
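A generic, hedged illustration of regularized model fitting for flow anomalies: a partially observed flow matrix is decomposed into a low-rank background plus sparse anomalies via simple alternating updates; the paper's tensor decomposition, block-successive convex approximation and deep unrolling are not reproduced.

```python
import numpy as np

# Generic illustration: observed flow entries are modeled as low-rank background
# (factors A, B) plus sparse anomalies S, fit by crude alternating updates.
# Missing entries are simply zero-filled here for brevity.

rng = np.random.default_rng(2)
T, F, r = 100, 30, 3                             # time steps, flows, background rank
U, V = rng.standard_normal((T, r)), rng.standard_normal((F, r))
Y = U @ V.T
Y[rng.random(Y.shape) < 0.02] += 10.0            # inject sparse anomalies
mask = rng.random(Y.shape) < 0.7                 # incomplete measurements

A, B = rng.standard_normal((T, r)), rng.standard_normal((F, r))
S = np.zeros_like(Y)
lam, mu = 0.5, 1e-2

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

for it in range(50):
    R = np.where(mask, Y - S, 0.0)               # residual on observed entries (zero-filled)
    A = R @ B @ np.linalg.inv(B.T @ B + mu * np.eye(r))    # low-rank factor updates
    B = R.T @ A @ np.linalg.inv(A.T @ A + mu * np.eye(r))
    S = np.where(mask, soft(Y - A @ B.T, lam), 0.0)        # sparse anomaly update

print("nonzero anomaly estimates:", int(np.count_nonzero(S)))
```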
arXiv Detail & Related papers (2024-09-17T19:59:57Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
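The sketch below is a hypothetical FedAvg-style skeleton in which each client's local loss is augmented with a penalty toward an extrapolated global trajectory point w_ref; it only illustrates the general idea of trajectory regularization under non-IID data and is not the FedPTR algorithm.

```python
import numpy as np

# Hypothetical sketch only: federated averaging where each client's local
# objective is augmented with a penalty pulling its update toward an
# extrapolated global trajectory point w_ref. This is NOT the published
# FedPTR method, whose projection step is different.

rng = np.random.default_rng(3)
d, n_clients = 20, 5
clients = []
for c in range(n_clients):                        # non-IID linear regression clients
    w_c = rng.standard_normal(d) + c              # client-specific shift
    X = rng.standard_normal((100, d))
    clients.append((X, X @ w_c))

w_global = np.zeros(d)
prev_update = np.zeros(d)
lr, beta, rounds, local_steps = 0.01, 0.9, 30, 10

for r in range(rounds):
    w_ref = w_global + beta * prev_update         # extrapolated trajectory point
    local_models = []
    for X, y in clients:
        w = w_global.copy()
        for _ in range(local_steps):
            # local loss gradient + trajectory-regularization gradient
            grad = X.T @ (X @ w - y) / len(y) + 0.5 * (w - w_ref)
            w -= lr * grad
        local_models.append(w)
    new_global = np.mean(local_models, axis=0)    # FedAvg aggregation
    prev_update = new_global - w_global
    w_global = new_global
```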
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
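A minimal sketch of the interpolation idea: the inner optimizer proposes a point after several steps and the outer iterate only moves a fraction lam of the way toward it, shown on the toy min-max problem f(x, y) = x*y where plain gradient descent-ascent diverges; the paper's analysis is far more general.

```python
# Minimal sketch of stabilization by linear interpolation of iterates:
# the inner optimizer runs several gradient descent-ascent steps and the
# outer iterate is a convex combination of the old point and the proposal.

def inner_optimizer(x, y, steps=10, eta=0.3):
    # several gradient descent-ascent steps on f(x, y) = x * y
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y

x, y = 1.0, 1.0
lam = 0.5                                         # interpolation weight
for t in range(50):
    x_prop, y_prop = inner_optimizer(x, y)
    x = (1 - lam) * x + lam * x_prop              # linear interpolation toward the proposal
    y = (1 - lam) * y + lam * y_prop

print(f"final iterate: ({x:.2e}, {y:.2e})")       # converges to the equilibrium (0, 0)
```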
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Bayesian Federated Neural Matching that Completes Full Information [2.6566593102111473]
Federated learning is a machine learning paradigm where locally trained models are distilled into a global model.
We propose a novel approach that addresses a limitation of existing neural matching methods by introducing a Kullback-Leibler divergence penalty at each iteration.
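For concreteness, a small sketch of the kind of penalty mentioned: the KL divergence between diagonal-Gaussian posteriors over corresponding parameters of a local and a global model; how the penalty enters the matching iterations is not reproduced here.

```python
import numpy as np

# Illustration only: a Kullback-Leibler penalty between diagonal-Gaussian
# posteriors over corresponding parameters of a local and a global model.

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

rng = np.random.default_rng(4)
mu_local, var_local = rng.standard_normal(10), np.full(10, 0.5)
mu_global, var_global = np.zeros(10), np.ones(10)

penalty = kl_diag_gaussians(mu_local, var_local, mu_global, var_global)
print("KL penalty:", penalty)
```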
arXiv Detail & Related papers (2022-11-15T09:47:56Z)
- Orthogonal Stochastic Configuration Networks with Adaptive Construction Parameter for Data Analytics [6.940097162264939]
Randomness makes SCNs more likely to generate approximately linearly correlated hidden nodes that are redundant and of low quality.
In light of a fundamental principle in machine learning, namely that a model with fewer parameters generalizes better, this paper proposes orthogonal SCNs, termed OSCN, to filter out low-quality hidden nodes and reduce the network structure.
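A hedged sketch of the orthogonality idea (assumed form, not the paper's exact algorithm with adaptive construction parameters): candidate hidden nodes are kept only if their outputs have a sufficiently large component orthogonal to the span of already-accepted nodes.

```python
import numpy as np

# Assumed-form sketch: random candidate nodes are filtered by the size of their
# output's component orthogonal to the span of already-accepted nodes, which
# discards approximately linearly correlated (redundant) nodes.

rng = np.random.default_rng(5)
n, d = 300, 5
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))

accepted, basis = [], []                          # kept node outputs and an orthonormal basis
tol, max_nodes = 0.1, 30
for _ in range(500):                              # pool of random candidate nodes
    if len(accepted) == max_nodes:
        break
    w, b = rng.uniform(-1, 1, d), rng.uniform(-1, 1)
    h = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # candidate sigmoid node output
    r = h - sum((h @ q) * q for q in basis)       # component orthogonal to accepted nodes
    if np.linalg.norm(r) / np.linalg.norm(h) > tol:
        accepted.append(h)
        basis.append(r / np.linalg.norm(r))       # Gram-Schmidt extension

H = np.column_stack(accepted)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)      # output weights by least squares
print("kept nodes:", H.shape[1], "train MSE:", np.mean((H @ beta - y) ** 2))
```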
arXiv Detail & Related papers (2022-05-26T07:07:26Z)
- Probabilistic partition of unity networks: clustering based deep approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs.
We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss.
We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
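A simplified sketch of a partition-of-unity regressor with a Gaussian noise model: softmax partition functions (fixed random centers here; the paper learns them) blend local linear models fit by least squares, and a Gaussian likelihood scores the fit; gradient-based maximum-likelihood training of the partitions is omitted.

```python
import numpy as np

# Simplified sketch: softmax partition functions blend local linear models.
# The blended coefficients are fit by least squares and a Gaussian noise
# variance (MLE) scores the fit; learning the partitions themselves is omitted.

rng = np.random.default_rng(6)
n, d, J = 400, 2, 8                              # samples, input dim, partitions
X = rng.uniform(-1, 1, (n, d))
y = np.sin(3 * X[:, 0]) * X[:, 1] + 0.05 * rng.standard_normal(n)

C = rng.uniform(-1, 1, (J, d))                   # fixed partition centers
logits = -np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2) / 0.1
Phi = np.exp(logits - logits.max(axis=1, keepdims=True))
Phi /= Phi.sum(axis=1, keepdims=True)            # partition of unity: rows sum to 1

# Design matrix: partition j contributes phi_j(x) * [x, 1] (a local linear model).
F = np.concatenate([Phi[:, [j]] * np.hstack([X, np.ones((n, 1))]) for j in range(J)], axis=1)
coef, *_ = np.linalg.lstsq(F, y, rcond=None)

resid = y - F @ coef
sigma2 = np.mean(resid ** 2)                     # Gaussian noise variance (MLE)
nll = 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(resid ** 2) / sigma2
print(f"MSE: {sigma2:.4f}  NLL: {nll:.1f}")
```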
arXiv Detail & Related papers (2021-07-07T08:02:00Z)
- Reframing Neural Networks: Deep Structure in Overcomplete Representations [41.84502123663809]
We introduce deep frame approximation, a unifying framework for representation learning with structured overcomplete frames.
We quantify structural differences with the deep frame potential, a data-independent measure of coherence linked to representation uniqueness and stability.
This connection to the established theory of overcomplete representations suggests promising new directions for principled deep network architecture design.
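As a concrete anchor, the snippet below computes a single-layer frame potential, the sum of squared inner products between normalized weight vectors, which is the standard data-independent coherence measure; the paper's deep frame potential aggregates coherence over the structured frame induced by the whole network.

```python
import numpy as np

# Single-layer frame potential: sum of squared inner products between
# normalized weight vectors, a data-independent measure of mutual coherence.

def frame_potential(W):
    # W: (out_features, in_features); rows are treated as frame vectors
    U = W / np.linalg.norm(W, axis=1, keepdims=True)
    G = U @ U.T                                    # Gram matrix of normalized rows
    return np.sum(G ** 2)                          # lower values: less coherent frame

rng = np.random.default_rng(7)
W_random = rng.standard_normal((256, 64))          # overcomplete: 256 vectors in R^64
print("frame potential (random layer):", frame_potential(W_random))
print("lower bound (tight frame):", 256 ** 2 / 64) # N^2 / d for N unit vectors in R^d
```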
arXiv Detail & Related papers (2021-03-10T01:15:14Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization in which the predictive model is a deep neural network.
Our algorithm requires far fewer communication rounds, with theoretical guarantees.
Our experiments on several datasets demonstrate its effectiveness and confirm the theory.
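A schematic of the communication pattern only (assumed local-SGD-with-periodic-averaging structure, a linear scorer and a pairwise squared-hinge AUC surrogate; the paper's min-max deep AUC formulation is not reproduced):

```python
import numpy as np

# Schematic: each worker runs several local SGD steps on a pairwise AUC
# surrogate (squared hinge on positive-negative score differences, linear
# scorer) and parameters are averaged only once per round.

rng = np.random.default_rng(8)
d, workers, rounds, local_steps, lr = 10, 4, 20, 25, 0.05
w_true = rng.standard_normal(d)

def make_data():
    X = rng.standard_normal((400, d))
    y = (X @ w_true + 0.3 * rng.standard_normal(400) > 0).astype(int)
    return X, y

data = [make_data() for _ in range(workers)]
w = np.zeros(d)

for r in range(rounds):                           # one communication per round
    local_models = []
    for X, y in data:
        w_loc = w.copy()
        Xp, Xn = X[y == 1], X[y == 0]
        for _ in range(local_steps):              # local steps, no communication
            i, j = rng.integers(len(Xp)), rng.integers(len(Xn))
            margin = w_loc @ Xp[i] - w_loc @ Xn[j]
            if margin < 1.0:                      # grad of 0.5*(1 - margin)^2 is -(1 - margin)*(xp - xn)
                w_loc += lr * (1.0 - margin) * (Xp[i] - Xn[j])
        local_models.append(w_loc)
    w = np.mean(local_models, axis=0)             # periodic averaging

print("cosine(w, w_true):", w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true)))
```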
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
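For reference, the core message-passing computation such a layer wraps, written as plain min-sum (max-product in the negative-log domain) belief propagation on a 1-D chain; the paper's differentiable truncation, learned pairwise terms and 2-D handling are omitted.

```python
import numpy as np

# Min-sum belief propagation on a 1-D chain labeling problem: forward and
# backward messages combine with unary costs to give per-node min-marginals.

rng = np.random.default_rng(9)
N, L = 12, 5                                      # chain length, number of labels
unary = rng.random((N, L))                        # per-node label costs
pairwise = 0.4 * np.abs(np.arange(L)[:, None] - np.arange(L)[None, :])  # |l - l'| penalty

fwd = np.zeros((N, L))                            # best cost of the prefix given label l at node i
for i in range(1, N):
    fwd[i] = np.min(fwd[i - 1][:, None] + unary[i - 1][:, None] + pairwise, axis=0)

bwd = np.zeros((N, L))                            # best cost of the suffix given label l at node i
for i in range(N - 2, -1, -1):
    bwd[i] = np.min(bwd[i + 1][None, :] + unary[i + 1][None, :] + pairwise, axis=1)

beliefs = fwd + bwd + unary                       # min-marginals on the chain
labels = np.argmin(beliefs, axis=1)               # per-node labeling from the beliefs
print("labels:", labels)
```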
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.