Sparse Neural Additive Model: Interpretable Deep Learning with Feature
Selection via Group Sparsity
- URL: http://arxiv.org/abs/2202.12482v1
- Date: Fri, 25 Feb 2022 03:40:53 GMT
- Title: Sparse Neural Additive Model: Interpretable Deep Learning with Feature
Selection via Group Sparsity
- Authors: Shiyun Xu, Zhiqi Bu, Pratik Chaudhari, Ian J. Barnett
- Abstract summary: We study the theoretical properties of sparse neural additive models (SNAM) with novel techniques to tackle the non-parametric truth.
We prove that SNAM can have exact support recovery, i.e. perfect feature selection, with appropriate regularization.
We further demonstrate the good accuracy and efficiency of SNAM.
- Score: 22.752160137192156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretable machine learning has demonstrated impressive performance
while preserving explainability. In particular, neural additive models (NAM)
bring interpretability to black-box deep learning and achieve state-of-the-art
accuracy among the large family of generalized additive models. To empower NAM
with feature selection and improve generalization, we propose sparse neural
additive models (SNAM), which employ group sparsity regularization (e.g.,
Group LASSO): each feature is learned by a sub-network whose trainable
parameters are clustered as a group.
We study the theoretical properties of SNAM with novel techniques to tackle
the non-parametric truth, thus extending beyond classical sparse linear models
such as the LASSO, which only work under a parametric truth.
Specifically, we show that SNAM with subgradient and proximal gradient
descents provably converges to zero training loss as $t\to\infty$, and that the
estimation error of SNAM vanishes asymptotically as $n\to\infty$. We also prove
that SNAM, similar to LASSO, can have exact support recovery, i.e. perfect
feature selection, with appropriate regularization. Moreover, we show that
SNAM generalizes well and preserves 'identifiability', recovering each
feature's effect. We validate our theories via extensive experiments and
further demonstrate the good accuracy and efficiency of SNAM.
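The additive architecture and group penalty described in the abstract lend themselves to a short sketch. Below is a minimal PyTorch illustration, not the authors' implementation: the names (SNAM, group_lasso_penalty, prox_group_step), the hidden width, and the sub-network shape are invented for the example.

```python
import torch
import torch.nn as nn

class SNAM(nn.Module):
    """Sketch of a sparse neural additive model: f(x) = sum_j f_j(x_j),
    where each feature's sub-network parameters form one group."""

    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features); each sub-network sees one feature column.
        return sum(net(x[:, j:j + 1]) for j, net in enumerate(self.subnets)).squeeze(-1)

def group_lasso_penalty(model: SNAM) -> torch.Tensor:
    # Group LASSO: sum over features of the l2 norm of that sub-network's parameters.
    return sum(
        torch.sqrt(sum(p.pow(2).sum() for p in net.parameters()))
        for net in model.subnets
    )

@torch.no_grad()
def prox_group_step(model: SNAM, lam: float, lr: float) -> None:
    # Proximal map of lr * lam * ||theta_j||_2 per group: block soft-thresholding,
    # which can zero out an entire sub-network and thereby deselect its feature.
    for net in model.subnets:
        norm = torch.sqrt(sum(p.pow(2).sum() for p in net.parameters()))
        scale = torch.clamp(1.0 - lr * lam / (norm + 1e-12), min=0.0)
        for p in net.parameters():
            p.mul_(scale)
```

The proximal gradient descent mentioned in the abstract would alternate an ordinary gradient step on the smooth training loss with prox_group_step; features whose parameter groups are driven exactly to zero are the ones deselected.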
Related papers
- Differentiable Neural-Integrated Meshfree Method for Forward and Inverse Modeling of Finite Strain Hyperelasticity [1.290382979353427]
The present study aims to extend the novel physics-informed machine learning approach, specifically the neural-integrated meshfree (NIM) method, to model finite-strain problems.
Thanks to its inherent differentiable programming capabilities, NIM circumvents the need to derive the Newton-Raphson linearization of the variational form.
NIM is applied to identify heterogeneous mechanical properties of hyperelastic materials from strain data, validating its effectiveness in the inverse modeling of nonlinear materials.
arXiv Detail & Related papers (2024-07-15T19:15:18Z)
- Gaussian Process Neural Additive Models [3.7969209746164325]
We propose a new subclass of Neural Additive Models (NAMs) that use a single-layer neural network construction of the Gaussian process via random Fourier features.
GP-NAMs have the advantage of a convex objective function and a number of trainable parameters that grows linearly with the feature dimensionality (a short sketch of the construction follows this entry).
We show that GP-NAM achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
arXiv Detail & Related papers (2024-02-19T20:29:34Z)
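A rough NumPy sketch of the random-Fourier-feature construction summarized in the GP-NAM entry above; the RBF kernel, the ridge-regression readout, and all names and sizes here are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def rff(x_col: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Random Fourier features approximating an RBF-kernel GP on one feature.
    # x_col: (n, 1); W: (1, D) frozen Gaussian frequencies; b: (D,) frozen phases.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(x_col @ W + b)

rng = np.random.default_rng(0)
n, p, D = 200, 5, 64                      # samples, features, RFF width (illustrative)
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# One frozen random feature map per input feature; only the linear readout is
# learned, so the ridge objective below is convex and the trainable parameter
# count D * p grows linearly with the feature dimensionality p.
Ws = [rng.normal(size=(1, D)) for _ in range(p)]
bs = [rng.uniform(0.0, 2.0 * np.pi, size=D) for _ in range(p)]
Phi = np.hstack([rff(X[:, j:j + 1], Ws[j], bs[j]) for j in range(p)])

lam = 1e-2                                # ridge strength (assumed)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D * p), Phi.T @ y)
y_hat = Phi @ w                           # additive GP-style predictions
```

Because the random maps are frozen, fitting reduces to a convex least-squares problem, which is the advantage the entry highlights.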
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Deep Contrastive Graph Representation via Adaptive Homotopy Learning [76.22904270821778]
The homotopy model is an excellent tool exploited by diverse research works in machine learning.
We propose a novel adaptive homotopy framework (AH) in which the Maclaurin duality is employed.
AH can be widely utilized to enhance the homotopy-based algorithm.
arXiv Detail & Related papers (2021-06-17T04:46:04Z)
- SurvNAM: The machine learning survival model explanation [5.8010446129208155]
SurvNAM is proposed to explain predictions of the black-box machine learning survival model.
The basic idea behind SurvNAM is to train the network by means of a specific expected loss function.
The proposed modifications of SurvNAM are based on using the Lasso-based regularization for functions from GAM.
arXiv Detail & Related papers (2021-04-18T16:40:56Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent (a toy descent-ascent loop is sketched after this entry).
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
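A toy gradient-descent-ascent loop illustrating the min-max formulation described in the entry above. The saddle objective used here is one common adversarial reformulation of a conditional-moment equation and is assumed for the sketch; the data, architectures, and step sizes are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))  # minimizing player
g = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))  # maximizing player
opt_f = torch.optim.SGD(f.parameters(), lr=1e-2)
opt_g = torch.optim.SGD(g.parameters(), lr=1e-2)

x = torch.randn(128, 2)   # stand-in data
y = torch.randn(128, 1)

for _ in range(500):
    # L(f, g) = E[g(x)(y - f(x))] - 0.5 E[g(x)^2]; maximizing over g recovers
    # the squared residual, so min_f max_g L fits the moment equation.
    loss = (g(x) * (y - f(x))).mean() - 0.5 * g(x).pow(2).mean()
    opt_f.zero_grad()
    opt_g.zero_grad()
    loss.backward()
    opt_f.step()                  # f takes a descent step on L
    for param in g.parameters():  # g takes an ascent step: negate its gradients
        param.grad.neg_()
    opt_g.step()
```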
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature (a minimal shape-function readout is sketched after this entry).
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
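Since NAM (and SNAM above) predicts with a sum of per-feature sub-networks, the interpretability the entry describes amounts to evaluating one sub-network on a one-dimensional grid. A minimal, self-contained sketch with an untrained placeholder sub-network:

```python
import torch
import torch.nn as nn

# One per-feature shape function f_j; in a trained NAM this would be a fitted
# sub-network, here it is an untrained stand-in for illustration.
f_j = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
grid = torch.linspace(-3.0, 3.0, steps=100).unsqueeze(1)  # (100, 1) grid for x_j
with torch.no_grad():
    effect = f_j(grid).squeeze(-1)  # f_j's learned effect over the grid
# Plotting `effect` against `grid` shows feature j's contribution in isolation.
```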
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.