Neural Network Training Using $\ell_1$-Regularization and Bi-fidelity
Data
- URL: http://arxiv.org/abs/2105.13011v1
- Date: Thu, 27 May 2021 08:56:17 GMT
- Title: Neural Network Training Using $\ell_1$-Regularization and Bi-fidelity
Data
- Authors: Subhayan De and Alireza Doostan
- Abstract summary: We study the effects of sparsity promoting $\ell_1$-regularization on training neural networks when only a small training dataset from a high-fidelity model is available.
We consider two variants of $\ell_1$-regularization informed by the parameters of an identical network trained using data from lower-fidelity models of the problem at hand.
These bi-fidelity strategies are generalizations of transfer learning of neural networks that uses the parameters learned from a large low-fidelity dataset to efficiently train networks for a small high-fidelity dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the capability of accurately representing a functional relationship
between the inputs of a physical system's model and output quantities of
interest, neural networks have become popular for surrogate modeling in
scientific applications. However, as these networks are over-parameterized,
their training often requires a large amount of data. To prevent overfitting
and improve generalization error, regularization based on, e.g., $\ell_1$- and
$\ell_2$-norms of the parameters is applied. Similarly, multiple connections of
the network may be pruned to increase sparsity in the network parameters. In
this paper, we explore the effects of sparsity promoting
$\ell_1$-regularization on training neural networks when only a small training
dataset from a high-fidelity model is available. As opposed to standard
$\ell_1$-regularization that is known to be inadequate, we consider two
variants of $\ell_1$-regularization informed by the parameters of an identical
network trained using data from lower-fidelity models of the problem at hand.
These bi-fidelity strategies are generalizations of transfer learning of neural
networks that uses the parameters learned from a large low-fidelity dataset to
efficiently train networks for a small high-fidelity dataset. We also compare
the bi-fidelity strategies with two $\ell_1$-regularization methods that only
use the high-fidelity dataset. Three numerical examples for propagating
uncertainty through physical systems are used to show that the proposed
bi-fidelity $\ell_1$-regularization strategies produce errors that are one
order of magnitude smaller than those of networks trained only using datasets
from the high-fidelity models.
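As a rough illustration of the bi-fidelity idea described above, the sketch below first trains an identical network on low-fidelity data and then fits the high-fidelity network with an $\ell_1$ penalty on the deviation of its parameters from the low-fidelity ones. This is a minimal sketch, not the paper's exact formulation: the architecture, the penalty weight `lam`, the warm start, and the particular form of the low-fidelity-informed penalty are illustrative assumptions (the paper studies two such variants, whose precise definitions are not given in the abstract).

```python
import torch
import torch.nn as nn


def make_net():
    # Identical architecture for the low- and high-fidelity surrogates (illustrative size).
    return nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))


def l1_penalty(net, ref_params=None):
    """l1 norm of the parameters, or of their deviation from reference values."""
    total = 0.0
    for i, p in enumerate(net.parameters()):
        diff = p if ref_params is None else p - ref_params[i]
        total = total + diff.abs().sum()
    return total


def train_bifidelity(x_lf, y_lf, x_hf, y_hf, lam=1e-3, epochs=2000):
    # Stage 1: fit an identical network to the plentiful low-fidelity data.
    lf_net = make_net()
    opt = torch.optim.Adam(lf_net.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(lf_net(x_lf), y_lf).backward()
        opt.step()
    ref = [p.detach().clone() for p in lf_net.parameters()]

    # Stage 2: fit the high-fidelity network to the small dataset, penalizing the
    # l1 norm of the deviation from the low-fidelity parameters.
    hf_net = make_net()
    hf_net.load_state_dict(lf_net.state_dict())  # warm start (illustrative choice)
    opt = torch.optim.Adam(hf_net.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(hf_net(x_hf), y_hf) + lam * l1_penalty(hf_net, ref)
        loss.backward()
        opt.step()
    return hf_net


if __name__ == "__main__":
    x_lf, y_lf = torch.rand(500, 2), torch.rand(500, 1)  # plentiful low-fidelity data (synthetic stand-in)
    x_hf, y_hf = torch.rand(20, 2), torch.rand(20, 1)    # scarce high-fidelity data (synthetic stand-in)
    surrogate = train_bifidelity(x_lf, y_lf, x_hf, y_hf)
```

Setting `ref_params=None` in `l1_penalty` recovers a standard $\ell_1$ penalty of the kind the paper compares against; the bi-fidelity variant sketched here instead promotes sparsity in the update relative to the low-fidelity parameters.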
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z) - Residual Multi-Fidelity Neural Network Computing [0.0]
We present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function.
We show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
arXiv Detail & Related papers (2023-10-05T14:43:16Z) - Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how the properties of functions learned by neural networks deeper than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Robustness Certificates for Implicit Neural Networks: A Mixed Monotone
Contractive Approach [60.67748036747221]
Implicit neural networks offer competitive performance and reduced memory consumption.
However, they can remain brittle with respect to adversarial input perturbations.
This paper proposes a theoretical and computational framework for robustness verification of implicit neural networks.
arXiv Detail & Related papers (2021-12-10T03:08:55Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Multi-fidelity regression using artificial neural networks: efficient
approximation of parameter-dependent output quantities [0.17499351967216337]
We present the use of artificial neural networks applied to multi-fidelity regression problems.
The introduced models are compared against a traditional multi-fidelity scheme, co-kriging.
We also show an application of multi-fidelity regression to an engineering problem.
arXiv Detail & Related papers (2021-02-26T11:29:00Z) - Ensembled sparse-input hierarchical networks for high-dimensional
datasets [8.629912408966145]
We show that dense neural networks can be a practical data analysis tool in settings with small sample sizes.
The proposed method, EASIER-net, appropriately prunes the network structure by tuning only two L1-penalty parameters.
On a collection of real-world datasets with different sizes, EASIER-net selected network architectures in a data-adaptive manner and achieved higher prediction accuracy than off-the-shelf methods on average.
arXiv Detail & Related papers (2020-05-11T02:08:53Z) - On transfer learning of neural networks using bi-fidelity data for
uncertainty propagation [0.0]
We explore the application of transfer learning techniques using training data generated from both high- and low-fidelity models.
In the first approach, a neural network model mapping the inputs to the outputs of interest is trained on the low-fidelity data, and the high-fidelity data is then used to adapt the parameters of its upper layer(s).
In the second approach, a simpler neural network is trained to map the output of the low-fidelity network to that of the high-fidelity model (a sketch of the first approach follows this list).
arXiv Detail & Related papers (2020-02-11T15:56:11Z)
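The last entry above describes the bi-fidelity transfer-learning baseline that the present paper generalizes. Below is a minimal sketch of its first approach (train on low-fidelity data, then adapt only the upper layer with the small high-fidelity dataset); the architecture, the choice of freezing all but the last layer, and the optimizer settings are illustrative assumptions rather than the cited paper's exact setup.

```python
import torch
import torch.nn as nn


def make_net():
    # Same architecture for both fidelities; the final Linear layer plays the role
    # of the "upper layer" adapted with high-fidelity data (illustrative size).
    return nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                         nn.Linear(64, 64), nn.Tanh(),
                         nn.Linear(64, 1))


def transfer_train(x_lf, y_lf, x_hf, y_hf, lf_epochs=2000, hf_epochs=1000):
    net = make_net()

    # Stage 1: fit the network to the large low-fidelity dataset.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(lf_epochs):
        opt.zero_grad()
        nn.functional.mse_loss(net(x_lf), y_lf).backward()
        opt.step()

    # Stage 2: freeze everything except the last layer, then adapt that layer
    # using the small high-fidelity dataset.
    for p in net.parameters():
        p.requires_grad_(False)
    last_layer = list(net.children())[-1]
    for p in last_layer.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam(last_layer.parameters(), lr=1e-3)
    for _ in range(hf_epochs):
        opt.zero_grad()
        nn.functional.mse_loss(net(x_hf), y_hf).backward()
        opt.step()
    return net


if __name__ == "__main__":
    x_lf, y_lf = torch.rand(500, 2), torch.rand(500, 1)  # synthetic stand-ins for bi-fidelity data
    x_hf, y_hf = torch.rand(20, 2), torch.rand(20, 1)
    surrogate = transfer_train(x_lf, y_lf, x_hf, y_hf)
```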
This list is automatically generated from the titles and abstracts of the papers on this site.