Improving Robustness via Tilted Exponential Layer: A
Communication-Theoretic Perspective
- URL: http://arxiv.org/abs/2311.01047v3
- Date: Sun, 3 Mar 2024 21:06:21 GMT
- Title: Improving Robustness via Tilted Exponential Layer: A
Communication-Theoretic Perspective
- Authors: Bhagyashree Puranik, Ahmad Beirami, Yao Qin, Upamanyu Madhow
- Abstract summary: Motivated by communication theory, the approach enhances the signal-to-noise ratio at the output of a neural network layer via neural competition.
TEXP learning can be interpreted as maximum likelihood estimation of matched filters.
TEXP inference enhances robustness against noise and other common corruptions.
- Score: 22.062492862286025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art techniques for enhancing robustness of deep networks mostly
rely on empirical risk minimization with suitable data augmentation. In this
paper, we propose a complementary approach motivated by communication theory,
aimed at enhancing the signal-to-noise ratio at the output of a neural network
layer via neural competition during learning and inference. In addition to
standard empirical risk minimization, neurons compete to sparsely represent
layer inputs by maximization of a tilted exponential (TEXP) objective function
for the layer. TEXP learning can be interpreted as maximum likelihood
estimation of matched filters under a Gaussian model for data noise. Inference
in a TEXP layer is accomplished by replacing batch norm by a tilted softmax,
which can be interpreted as computation of posterior probabilities for the
competing signaling hypotheses represented by each neuron. After providing
insights via simplified models, we show, by experimentation on standard image
datasets, that TEXP learning and inference enhance robustness against noise
and other common corruptions, without requiring data augmentation. Further
cumulative gains in robustness against this array of distortions can be
obtained by appropriately combining TEXP with data augmentation techniques. The
code for all our experiments is available at
https://github.com/bhagyapuranik/texp_for_robustness.
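To make the mechanism concrete, here is a minimal sketch of the two ingredients described in the abstract: a tilted-exponential surrogate for learning and a tilted softmax for inference. The tilt value, the per-filter normalization, and the placement in the network are illustrative assumptions rather than the authors' implementation; the linked repository is the authoritative reference.
```python
import torch
import torch.nn.functional as F

def texp_objective(z, t=10.0):
    """Tilted-exponential surrogate over competing neurons.

    z: pre-activations of one layer, shape (N, C, H, W), where the C filters
       play the role of matched-filter "signaling hypotheses".
    Returns (1/t) * log-sum-exp of the tilted activations across channels,
    averaged over batch and positions; maximizing it (alongside the usual
    task loss) encourages a few strongly matched neurons per input patch.
    """
    return (torch.logsumexp(t * z, dim=1) / t).mean()

def texp_inference(x, weight, t=10.0):
    """Inference-time competition: convolve, then apply a tilted softmax
    across channels in place of batch norm. The softmax output can be read
    as posterior probabilities of the competing hypotheses under Gaussian
    noise, sharpened or flattened by the tilt t."""
    # Unit-normalize filters so activations behave like matched-filter correlations.
    w = weight / weight.flatten(1).norm(dim=1).clamp_min(1e-8).view(-1, 1, 1, 1)
    z = F.conv2d(x, w, padding="same")     # (N, C_out, H, W)
    return F.softmax(t * z, dim=1)         # sparse, competition-normalized activations
```
One plausible wiring, under the same assumptions: during training, the texp_objective term is added with a weight to the standard cross-entropy loss for the layer's pre-activations, and at test time texp_inference replaces that layer's batch norm.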
Related papers
- FFEINR: Flow Feature-Enhanced Implicit Neural Representation for
Spatio-temporal Super-Resolution [4.577685231084759]
This paper proposes a Feature-Enhanced Neural Implicit Representation (FFEINR) for super-resolution of flow field data.
It can take full advantage of the implicit neural representation in terms of model structure and sampling resolution.
The training process of FFEINR is facilitated by introducing feature enhancements for the input layer.
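For readers unfamiliar with implicit neural representations, the idea reduces to a coordinate network: a small MLP mapping spatio-temporal coordinates to field values, queried at arbitrary resolution. The sketch below is a generic coordinate MLP with Fourier-feature input enhancement, not the FFEINR architecture itself.
```python
import torch
import torch.nn as nn

class CoordinateINR(nn.Module):
    """Generic implicit neural representation: (x, y, t) -> flow value.

    The random Fourier features on the input layer stand in for the
    "feature enhancements" mentioned above; FFEINR's actual enhancements differ.
    """
    def __init__(self, in_dim=3, out_dim=2, hidden=128, n_freq=64, scale=10.0):
        super().__init__()
        self.register_buffer("B", torch.randn(in_dim, n_freq) * scale)
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):                      # coords: (N, 3)
        proj = 2 * torch.pi * coords @ self.B       # (N, n_freq)
        feats = torch.cat([proj.sin(), proj.cos()], dim=-1)
        return self.mlp(feats)                      # continuous field, any resolution
```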
arXiv Detail & Related papers (2023-08-24T02:28:18Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- On the Robustness and Generalization of Deep Learning Driven Full Waveform Inversion [2.5382095320488665]
Full Waveform Inversion (FWI) is commonly framed as an image-to-image translation task.
Despite being trained with synthetic data, the deep learning-driven FWI is expected to perform well when evaluated with sufficient real-world data.
We study such properties by asking: how robust are these deep neural networks and how do they generalize?
arXiv Detail & Related papers (2021-11-28T19:27:59Z)
- Neural Tangent Kernel Empowered Federated Learning [35.423391869982694]
Federated learning (FL) is a privacy-preserving paradigm where multiple participants jointly solve a machine learning problem without sharing raw data.
We propose a novel FL paradigm empowered by the neural tangent kernel (NTK) framework.
We show that the proposed paradigm can achieve the same accuracy while reducing the number of communication rounds by an order of magnitude.
arXiv Detail & Related papers (2021-10-07T17:58:58Z)
- Probabilistic partition of unity networks: clustering based deep approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs.
We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss.
We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
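The "Gaussian noise model ... maximum likelihood loss" here boils down to training a regressor with a Gaussian negative log-likelihood. Below is a minimal sketch with a placeholder network, not the POU-Net partition-of-unity architecture itself.
```python
import torch
import torch.nn as nn

class MeanVarianceRegressor(nn.Module):
    """Placeholder regressor predicting a mean and a log-variance per input."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, 1)
        self.logvar_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

def gaussian_nll(mean, logvar, y):
    # Negative log-likelihood of y under N(mean, exp(logvar)), up to a constant;
    # minimizing it by gradient descent is the maximum likelihood training
    # referred to in the summary above.
    return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).mean()
```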
arXiv Detail & Related papers (2021-07-07T08:02:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale deep neural networks.
Our algorithm requires far fewer communication rounds while maintaining comparable theoretical guarantees.
Experiments on several datasets demonstrate the effectiveness of the proposed algorithm and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs)
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
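As a rough illustration of max-product belief propagation in its log-domain (min-sum) form, here is a toy 1-D pass with a truncated linear pairwise cost. The paper's BP-Layer operates on images, is differentiable, and is trained end to end, so treat this only as a sketch of the message-passing core.
```python
import numpy as np

def min_sum_chain_bp(unary, lam=1.0, trunc=2.0):
    """Toy 1-D min-sum (max-product in the log domain) belief propagation
    with a truncated linear pairwise cost.

    unary: (N, L) array of per-node label costs (negative log unaries).
    Returns the per-node labeling after one forward/backward pass.
    """
    N, L = unary.shape
    labels = np.arange(L)
    # Truncated linear smoothness cost between neighboring labels.
    pairwise = np.minimum(lam * np.abs(labels[:, None] - labels[None, :]), trunc)

    fwd = np.zeros((N, L))
    bwd = np.zeros((N, L))
    for i in range(1, N):            # forward messages
        fwd[i] = np.min(fwd[i - 1][:, None] + unary[i - 1][:, None] + pairwise, axis=0)
    for i in range(N - 2, -1, -1):   # backward messages
        bwd[i] = np.min(bwd[i + 1][:, None] + unary[i + 1][:, None] + pairwise, axis=0)

    beliefs = unary + fwd + bwd      # min-marginals up to a constant
    return beliefs.argmin(axis=1)
```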
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.