SHRIMP: Sparser Random Feature Models via Iterative Magnitude Pruning
- URL: http://arxiv.org/abs/2112.04002v1
- Date: Tue, 7 Dec 2021 21:32:28 GMT
- Title: SHRIMP: Sparser Random Feature Models via Iterative Magnitude Pruning
- Authors: Yuege Xie, Bobby Shi, Hayden Schaeffer, Rachel Ward
- Abstract summary: We propose a new method -- Sparser Random Feature Models via IMP (SHRIMP) -- to efficiently fit high-dimensional data with inherent low-dimensional structure.
Our method can be viewed as a combined process to construct and find sparse lottery tickets for two-layer dense networks.
- Score: 3.775565013663731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse shrunk additive models and sparse random feature models have been
developed separately as methods to learn low-order functions, where there are
few interactions between variables, but neither offers computational
efficiency. On the other hand, $\ell_2$-based shrunk additive models are
efficient but do not offer feature selection as the resulting coefficient
vectors are dense. Inspired by the success of the iterative magnitude pruning
technique in finding lottery tickets of neural networks, we propose a new
method -- Sparser Random Feature Models via IMP (SHRIMP) -- to efficiently fit
high-dimensional data with inherent low-dimensional structure in the form of
sparse variable dependencies. Our method can be viewed as a combined process to
construct and find sparse lottery tickets for two-layer dense networks. We
explain the observed benefit of SHRIMP through a refined analysis of the
generalization error for thresholded Basis Pursuit and the resulting bounds on
eigenvalues.
From function approximation experiments on both synthetic data and real-world
benchmark datasets, we show that SHRIMP obtains test accuracy better than or
competitive with state-of-the-art sparse feature and additive methods such as
SRFE-S, SSAM, and SALSA. Meanwhile, SHRIMP performs feature selection with low
computational complexity and is robust to the pruning rate, indicating a
robustness in the structure of the obtained subnetworks. We gain insight into
the lottery ticket hypothesis through SHRIMP by noting a correspondence between
our model and weight/neuron subnetworks.
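To make the pruning loop concrete, here is a minimal sketch of SHRIMP-style iterative magnitude pruning on a random feature model. The feature map, pruning rate, round count, and ridge solver are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def shrimp_sketch(X, y, n_features=500, prune_rate=0.5, n_rounds=5, reg=1e-6):
    # Random first layer stays fixed; only the output coefficients are trained.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_features))
    Phi = np.cos(X @ W)  # assumed feature map; the paper's sampling/activation may differ
    active = np.arange(n_features)
    for _ in range(n_rounds):
        A = Phi[:, active]
        # Ridge-regularized least squares on the surviving features.
        coef = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ y)
        keep = max(1, int(round(len(active) * (1.0 - prune_rate))))
        top = np.argsort(np.abs(coef))[-keep:]  # largest-magnitude coefficients survive
        active = np.sort(active[top])
    # Final refit on the pruned ("lottery ticket") feature subset.
    A = Phi[:, active]
    coef = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ y)
    return W, active, coef
```

Pruning whole columns of the feature matrix corresponds to removing neurons of the two-layer network, which is the weight/neuron-subnetwork correspondence the abstract alludes to.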
Related papers
- Nonuniform random feature models using derivative information [10.239175197655266]
We propose nonuniform data-driven parameter distributions for neural network initialization based on derivative data of the function to be approximated.
We address the cases of Heaviside and ReLU activation functions, and their smooth approximations (sigmoid and softplus).
We suggest simplifications of these exact densities, based on approximate derivative data at the input points, that allow for very efficient sampling and bring the performance of random feature models close to that of optimal networks in several scenarios.
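A rough illustration of the idea, under the assumption that (approximate) gradients of the target function are available at the input points; the paper derives exact activation-specific densities, which this sketch does not reproduce:

```python
import numpy as np

def derivative_informed_weights(grads, n_features=200, jitter=0.1, seed=0):
    # Sample first-layer weights from the empirical distribution of gradient
    # vectors, biased toward directions where the target varies most, instead
    # of a fixed isotropic Gaussian (illustrative stand-in for the exact densities).
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(grads, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(len(grads), size=n_features, p=p)
    return grads[idx] + jitter * rng.standard_normal((n_features, grads.shape[1]))
```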
arXiv Detail & Related papers (2024-10-03T01:30:13Z)
- On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
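For reference, the deterministic sampler studied in this line of work is typically the probability-flow ODE associated with the forward diffusion SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$; this is the standard form, not a result specific to this paper:

```latex
\frac{\mathrm{d}x}{\mathrm{d}t} \;=\; f(x,t) \;-\; \tfrac{1}{2}\, g(t)^{2}\, \nabla_x \log p_t(x)
```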
arXiv Detail & Related papers (2024-05-18T15:59:41Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, within both the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
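In this setting $x \sim \mathcal{N}(0, I_d)$ and the label depends on $x$ only through a hidden one-dimensional projection $\langle w, x \rangle$; the exponent $k^\star$ is the generative exponent of the link. A paraphrase of the definition via Hermite polynomials $h_k$ (consult the paper for the precise statement):

```latex
n \;=\; \Omega\!\left(d^{\,k^\star/2}\right),
\qquad
k^\star \;=\; \min\left\{\, k \ge 1 \;:\; \exists\, T \ \text{with}\ \mathbb{E}\!\left[\, T(y)\, h_k(\langle w, x \rangle) \,\right] \neq 0 \,\right\}.
```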
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
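As a point of reference for the family of estimators being unified, here is a plain Monte Carlo permutation estimator of Shapley values; it is a generic baseline, not SimSHAP itself, and `value_fn` is a hypothetical set-valued score function:

```python
import numpy as np

def shapley_mc(value_fn, n_features, n_samples=1000, seed=0):
    # Average each feature's marginal contribution over random permutations.
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        perm = rng.permutation(n_features)
        subset, prev = [], value_fn(frozenset())
        for j in perm:
            subset.append(j)
            cur = value_fn(frozenset(subset))
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return phi / n_samples

# Toy usage: for an additive game, Shapley values equal the per-feature weights.
weights = np.array([1.0, 2.0, 3.0])
print(shapley_mc(lambda S: float(weights[list(S)].sum()), 3, n_samples=2000))
```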
arXiv Detail & Related papers (2023-11-02T06:09:24Z)
- Impact of PolSAR pre-processing and balancing methods on complex-valued neural networks segmentation tasks [9.6556424340252]
We investigate the semantic segmentation of Polarimetric Synthetic Aperture Radar (PolSAR) data using Complex-Valued Neural Networks (CVNNs).
We exhaustively compare both methods across six model architectures: three complex-valued and their respective real-equivalent models.
We propose two methods for reducing this gap and report results for all input representations, models, and dataset pre-processing methods.
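To ground the terminology, here is a minimal complex-valued dense layer with the commonly used modReLU activation; an illustrative building block, not one of the paper's six architectures:

```python
import numpy as np

def modrelu(z, bias=-1.0, eps=1e-9):
    # modReLU: ReLU applied to the complex magnitude, phase preserved.
    mag = np.abs(z)
    return np.maximum(mag + bias, 0.0) * z / (mag + eps)

def complex_dense(z, W, b):
    # One complex-valued dense layer: complex affine map + modReLU activation.
    return modrelu(z @ W + b)

# Toy usage on a complex feature vector (e.g., a PolSAR scattering vector).
rng = np.random.default_rng(0)
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
W = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
b = np.zeros(4, dtype=complex)
print(complex_dense(z, W, b))
```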
arXiv Detail & Related papers (2022-10-28T12:49:43Z)
- Fast variable selection makes scalable Gaussian process BSS-ANOVA a speedy and accurate choice for tabular and time series regression [0.0]
Gaussian processes (GPs) are non-parametric regression engines with a long history.
One of a number of scalable GP approaches is the Karhunen-Loève (KL) decomposed kernel BSS-ANOVA, developed in 2009.
A new method of forward variable selection quickly and effectively limits the number of terms, yielding competitive accuracy.
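A generic greedy forward-selection loop over candidate basis terms, as a stand-in for the paper's method; here `Phi` holds the candidate KL/BSS-ANOVA term evaluations, and the residual-correlation criterion is an assumption:

```python
import numpy as np

def forward_select(Phi, y, max_terms=10, reg=1e-8):
    # Greedily add the candidate column (assumed roughly normalized) most
    # correlated with the current residual, then refit and repeat.
    chosen, resid, coef = [], y.copy(), None
    for _ in range(max_terms):
        scores = np.abs(Phi.T @ resid)
        scores[chosen] = -np.inf          # never re-select a chosen term
        chosen.append(int(np.argmax(scores)))
        A = Phi[:, chosen]
        coef = np.linalg.solve(A.T @ A + reg * np.eye(len(chosen)), A.T @ y)
        resid = y - A @ coef
    return chosen, coef
```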
arXiv Detail & Related papers (2022-05-26T23:41:43Z)
- Quantum-Assisted Feature Selection for Vehicle Price Prediction Modeling [0.0]
We study metrics for encoding the search as a binary model, such as the Generalized Mean Information Coefficient and the Pearson correlation coefficient.
We achieve accuracy scores of 0.9 for finding optimal subsets on synthetic data using a new metric that we define.
Our findings show that by leveraging quantum-assisted routines we find solutions that increase the quality of predictive model output.
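A sketch of how such a binary encoding can be built classically: a QUBO whose diagonal rewards feature-target correlation and whose off-diagonal penalizes redundancy. The weighting `alpha` and the use of Pearson correlation throughout are illustrative assumptions; the quantum-assisted solver itself is out of scope here.

```python
import numpy as np

def feature_selection_qubo(X, y, alpha=0.5):
    # Diagonal: reward |Pearson corr(x_j, y)|. Off-diagonal: penalize pairwise
    # feature redundancy. Minimizing s^T Q s over binary s selects features.
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    relevance = np.abs(Xc.T @ yc) / len(y)             # |corr(x_j, y)|
    redundancy = np.abs(np.corrcoef(Xc, rowvar=False)) # |corr(x_i, x_j)|
    Q = alpha * redundancy
    np.fill_diagonal(Q, -relevance)
    return Q  # hand to a (quantum) annealer or classical QUBO solver
```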
arXiv Detail & Related papers (2021-04-08T20:48:44Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
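For orientation, EM for a plain K-component Gaussian finite mixture of linear regressions; a textbook baseline, not the paper's robust model for incomplete mixed-type targets:

```python
import numpy as np

def fmr_em(X, y, K=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = rng.standard_normal((K, d))   # per-component regression coefficients
    sigma, pi = np.ones(K), np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities from per-component Gaussian likelihoods.
        resid = y[:, None] - X @ B.T                       # shape (n, K)
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma**2)
                - 0.5 * resid**2 / sigma**2)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component.
        for k in range(K):
            Wk = r[:, k]
            A = X * Wk[:, None]
            B[k] = np.linalg.solve(X.T @ A + 1e-8 * np.eye(d), A.T @ y)
            sigma[k] = max(np.sqrt((Wk * (y - X @ B[k])**2).sum() / Wk.sum()), 1e-6)
        pi = r.mean(axis=0)
    return B, sigma, pi
```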
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Deep Learning with Functional Inputs [0.0]
We present a methodology for integrating functional data into feed-forward neural networks.
A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process.
The model is shown to perform well in a number of contexts including prediction of new data and recovery of the true underlying functional weights.
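A sketch of the core idea: each functional weight $\beta_j(t)$ is expanded in a fixed basis with learnable coefficients, and the first layer computes the inner products $\int x_i(t)\,\beta_j(t)\,dt$ by quadrature. The basis choice, grid, and layer width are assumptions:

```python
import numpy as np

def functional_first_layer(curves, t, n_basis=5, n_hidden=3, seed=0):
    # curves: (n_samples, len(t)) functional observations on grid t.
    rng = np.random.default_rng(seed)
    # Sine basis evaluated on the observation grid (illustrative choice).
    basis = np.stack([np.sin((k + 1) * np.pi * t) for k in range(n_basis)])
    C = rng.standard_normal((n_basis, n_hidden))  # learnable basis coefficients
    beta = basis.T @ C                            # functional weights on the grid
    dt = t[1] - t[0]
    return curves @ beta * dt                     # quadrature of the inner products
```

The "dynamic functional weights" are then the columns of `beta`, which can be re-plotted as `C` is updated during training.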
arXiv Detail & Related papers (2020-06-17T01:23:00Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate the resulting explosion in the number of interaction parameters, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
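A sketch of the factorized evaluation: with the weight tensor in rank-$R$ CP form, the score contracts each feature map with its factor matrix and never materializes the exponential-size tensor. The quadratic feature maps in the toy usage are an arbitrary choice:

```python
import numpy as np

def cp_predict(X_maps, factors):
    # X_maps: list of D arrays (n, m_d), one feature map per input feature.
    # factors: list of D arrays (m_d, R), the CP factor matrices.
    out = np.ones((X_maps[0].shape[0], factors[0].shape[1]))
    for Phi_d, A_d in zip(X_maps, factors):
        out *= Phi_d @ A_d          # contract each mode with its factor matrix
    return out.sum(axis=1)          # sum over the R rank-1 components

# Toy usage: 2 features, quadratic feature maps, rank-3 CP model.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 2))
X_maps = [np.stack([np.ones(8), x[:, d], x[:, d]**2], axis=1) for d in range(2)]
factors = [rng.standard_normal((3, 3)) for _ in range(2)]
print(cp_predict(X_maps, factors))
```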
arXiv Detail & Related papers (2020-01-27T22:38:40Z)