The Contextual Lasso: Sparse Linear Models via Deep Neural Networks
- URL: http://arxiv.org/abs/2302.00878v4
- Date: Tue, 2 Jan 2024 06:59:55 GMT
- Title: The Contextual Lasso: Sparse Linear Models via Deep Neural Networks
- Authors: Ryan Thompson, Amir Dezfouli, Robert Kohn
- Abstract summary: We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features.
An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
- Score: 5.607237982617641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse linear models are one of several core tools for interpretable machine
learning, a field of emerging importance as predictive models permeate
decision-making in many domains. Unfortunately, sparse linear models are far
less flexible as functions of their input features than black-box models like
deep neural networks. With this capability gap in mind, we study a not-uncommon
situation where the input features dichotomize into two groups: explanatory
features, which are candidates for inclusion as variables in an interpretable
model, and contextual features, which select from the candidate variables and
determine their effects. This dichotomy leads us to the contextual lasso, a new
statistical estimator that fits a sparse linear model to the explanatory
features such that the sparsity pattern and coefficients vary as a function of
the contextual features. The fitting process learns this function
nonparametrically via a deep neural network. To attain sparse coefficients, we
train the network with a novel lasso regularizer in the form of a projection
layer that maps the network's output onto the space of $\ell_1$-constrained
linear models. An extensive suite of experiments on real and synthetic data
suggests that the learned models, which remain highly transparent, can be
sparser than the regular lasso without sacrificing the predictive power of a
standard deep neural network.
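To make the mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a feed-forward network maps the contextual features to a coefficient vector, a projection layer maps that vector onto the $\ell_1$ ball so that many coefficients become exactly zero, and the prediction is the resulting sparse linear model applied to the explanatory features. This is an illustrative sketch, not the authors' implementation; the layer sizes, the fixed $\ell_1$ radius, and the treatment of the intercept are assumptions.

```python
import torch
import torch.nn as nn


def project_l1_ball(beta: torch.Tensor, radius: float) -> torch.Tensor:
    """Euclidean projection of each row of `beta` onto {b : ||b||_1 <= radius},
    using the sort-based algorithm of Duchi et al. (2008). Assumes radius > 0.
    The projection is piecewise linear, so autograd can differentiate through it
    almost everywhere, which is what lets it act as a network layer."""
    abs_b = beta.abs()
    inside = abs_b.sum(dim=1, keepdim=True) <= radius      # rows already feasible
    u, _ = torch.sort(abs_b, dim=1, descending=True)       # sorted magnitudes
    css = torch.cumsum(u, dim=1) - radius                  # cumulative sums minus radius
    k = torch.arange(1, beta.shape[1] + 1, device=beta.device, dtype=beta.dtype)
    rho = (u * k > css).sum(dim=1, keepdim=True) - 1       # last index where the threshold condition holds
    theta = css.gather(1, rho) / (rho + 1).to(beta.dtype)  # soft-thresholding level per row
    projected = torch.sign(beta) * torch.clamp(abs_b - theta, min=0.0)
    return torch.where(inside, beta, projected)


class ContextualLassoSketch(nn.Module):
    """Hypothetical contextual-lasso module: the network sees only the
    contextual features z and outputs coefficients plus an intercept; the
    coefficients are projected onto the l1 ball, and the prediction is the
    context-dependent sparse linear model applied to x."""

    def __init__(self, n_explanatory: int, n_contextual: int,
                 hidden: int = 64, radius: float = 1.0):
        super().__init__()
        self.radius = radius
        self.net = nn.Sequential(
            nn.Linear(n_contextual, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_explanatory + 1),   # coefficients + intercept
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor):
        out = self.net(z)
        beta = project_l1_ball(out[:, :-1], self.radius)  # sparse, context-dependent coefficients
        intercept = out[:, -1]                            # left unpenalised here (an assumption)
        y_hat = (x * beta).sum(dim=1) + intercept
        return y_hat, beta
```

Training would then minimise, for example, squared error between `y_hat` and the response with a standard optimiser; the $\ell_1$ radius plays the role of the lasso's tuning parameter and would be selected on held-out data, and the zeroed entries of `beta` give the sparsity pattern that varies with the contextual features.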
Related papers
- Instance-wise Linearization of Neural Network for Model Interpretation [13.583425552511704]
The challenge lies in the non-linear behavior of the neural network.
For a neural network model, this non-linear behavior is often caused by the non-linear activation units of the model.
We propose an instance-wise linearization approach that reformulates the forward computation process of a neural network prediction.
arXiv Detail & Related papers (2023-10-25T02:07:39Z) - Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks [0.0]
We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed.
We conducted numerical experiments for regression, classification, and feature selection tasks.
Our results demonstrate that the proposed model yields attractive prediction performance compared to its competitors.
arXiv Detail & Related papers (2023-07-11T09:54:30Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO
Regularization [15.517787031620864]
The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^\star$ when the number of samples scales logarithmically.
Our theory lies in an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
arXiv Detail & Related papers (2023-05-07T13:05:09Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for extracting features from two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - It's FLAN time! Summing feature-wise latent representations for
interpretability [0.0]
We propose a novel class of structurally constrained neural networks, which we call FLANs (Feature-wise Latent Additive Networks); a minimal sketch follows this entry.
FLANs process each input feature separately, computing for each of them a representation in a common latent space.
These feature-wise latent representations are then simply summed, and the aggregated representation is used for prediction.
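A minimal code sketch of this additive structure, assuming a PyTorch implementation with illustrative layer sizes rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn


class FLANSketch(nn.Module):
    """Hypothetical feature-wise latent additive network: each scalar input
    feature is embedded by its own small network into a shared latent space,
    the per-feature latents are summed, and a linear head maps the aggregate
    to the prediction."""

    def __init__(self, n_features: int, latent_dim: int = 16, hidden: int = 32):
        super().__init__()
        self.feature_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim))
            for _ in range(n_features)
        ])
        self.head = nn.Linear(latent_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features); each feature is processed independently
        latents = [net(x[:, j:j + 1]) for j, net in enumerate(self.feature_nets)]
        return self.head(torch.stack(latents).sum(dim=0)).squeeze(-1)
```

The sum makes the model additive across features in the latent space, which is the structural constraint the title refers to.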
arXiv Detail & Related papers (2021-06-18T12:19:33Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.