Eigen-Stratified Models
- URL: http://arxiv.org/abs/2001.10389v1
- Date: Mon, 27 Jan 2020 16:26:08 GMT
- Title: Eigen-Stratified Models
- Authors: Jonathan Tuck, Stephen Boyd
- Abstract summary: Stratified models depend in an arbitrary way on a selected categorical feature that takes $K$ values, and depend linearly on the other $n$ features.
Laplacian regularization with respect to a graph on the feature values can greatly improve the performance of a stratified model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stratified models depend in an arbitrary way on a selected categorical
feature that takes $K$ values, and depend linearly on the other $n$ features.
Laplacian regularization with respect to a graph on the feature values can
greatly improve the performance of a stratified model, especially in the
low-data regime. A significant issue with Laplacian-regularized stratified
models is that the model is $K$ times the size of the base model, which can be
quite large.
We address this issue by formulating eigen-stratified models, which are
stratified models with an additional constraint that the model parameters are
linear combinations of some modest number $m$ of bottom eigenvectors of the
graph Laplacian, i.e., those associated with the $m$ smallest eigenvalues. With
eigen-stratified models, we only need to store the $m$ bottom eigenvectors and
the corresponding coefficients as the stratified model parameters. This leads
to a reduction, sometimes large, of model size when $m \leq n$ and $m \ll K$.
In some cases, the additional regularization implicit in eigen-stratified
models can improve out-of-sample performance over standard Laplacian
regularized stratified models.
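To make the parameterization concrete, below is a minimal numpy sketch under assumed, illustrative conditions: a chain graph on the $K$ categorical values, synthetic data, and a plain least-squares base model. It constrains the $K \times n$ parameter matrix to the span of the bottom $m$ Laplacian eigenvectors; this is a sketch of the idea, not the authors' code.

```python
import numpy as np
from scipy.linalg import eigh

K, n, m = 20, 5, 4                      # strata, base features, eigenvectors kept
rng = np.random.default_rng(0)

# Laplacian of an assumed chain graph on the K categorical values.
L = np.zeros((K, K))
for k in range(K - 1):
    L[k, k] += 1.0; L[k + 1, k + 1] += 1.0
    L[k, k + 1] -= 1.0; L[k + 1, k] -= 1.0

# Bottom-m eigenvectors (smallest eigenvalues) span the allowed parameters.
F = eigh(L)[1][:, :m]                   # K x m

# Synthetic data whose true stratified parameters vary smoothly across strata.
Theta_true = F @ rng.normal(size=(m, n))
z = rng.integers(0, K, size=500)        # categorical feature per sample
X = rng.normal(size=(500, n))
y = np.einsum("ij,ij->i", X, Theta_true[z]) + 0.1 * rng.normal(size=500)

# Eigen-stratified fit: Theta = F @ Z, so solve least squares for Z (m x n).
A = np.einsum("ik,ij->ikj", F[z], X).reshape(len(y), m * n)  # kron(F[z_i], x_i)
Z = np.linalg.lstsq(A, y, rcond=None)[0].reshape(m, n)
Theta_hat = F @ Z                       # full K x n parameters, never stored
```

Only $F$ and $Z$ need to be kept, i.e. $m(K + n)$ numbers rather than $Kn$.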
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free: it requires no data and no additional training, yet shows impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
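Going only by the method's name and the abstract, the elect/mask/rescale pipeline might look like the rough numpy sketch below; the specific election and rescaling rules here are guesses for illustration, not the authors' exact procedure.

```python
import numpy as np

def emr_merge(pre, finetuned):
    """Hedged sketch: elect one shared task vector, keep per-task masks/scales."""
    taus = np.stack([w - pre for w in finetuned])       # task vectors, (T, d)
    sign = np.sign(taus.sum(axis=0))                    # elect a dominant sign
    agree = np.sign(taus) == sign                       # where each task agrees
    tau_uni = sign * np.where(agree, np.abs(taus), 0).max(axis=0)
    # Per-task rescaler: match the average magnitude of the masked shared vector.
    lam = [np.abs(t).mean() / max(np.abs(m * tau_uni).mean(), 1e-12)
           for t, m in zip(taus, agree)]
    return tau_uni, agree, lam

# usage: approximate task t's weights from the shared vector + cheap extras
# w_t ≈ pre + lam[t] * agree[t] * tau_uni
```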
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
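For orientation, here is a tiny self-contained ridge regression in the overparameterized regime the paper analyzes, using the kernel (dual) form of the closed-form estimator; the dimensions and data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 400, 1.0                       # d > n: overparameterized
w = rng.normal(size=d) / np.sqrt(d)             # planted signal
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(1000, d))
y, yt = X @ w + 0.1 * rng.normal(size=n), Xt @ w

# Closed-form ridge; the dual form needs only an n x n solve when d > n:
# argmin ||Xw - y||^2 + lam ||w||^2  =>  w_hat = X^T (X X^T + lam I)^{-1} y
w_hat = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
print("train MSE:", np.mean((X @ w_hat - y) ** 2))
print("test  MSE:", np.mean((Xt @ w_hat - yt) ** 2))
```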
- Continuous Language Model Interpolation for Dynamic and Controllable Text Generation [7.535219325248997]
We focus on the challenging case where the model must dynamically adapt to diverse -- and often changing -- user preferences.
We leverage adaptation methods based on linear weight interpolation, casting them as continuous multi-domain interpolators.
We show that varying the weights yields predictable and consistent change in the model outputs.
arXiv Detail & Related papers (2024-04-10T15:55:07Z)
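The mechanism itself is simple; a minimal sketch of linear weight interpolation between per-domain fine-tuned parameter vectors (names and the two-anchor usage are hypothetical):

```python
import numpy as np

def interpolate_weights(anchors, alphas):
    """Convex combination of fine-tuned weight vectors, one per attribute/domain.

    anchors: list of equally-shaped parameter arrays;
    alphas:  nonnegative preference weights that sum to 1."""
    alphas = np.asarray(alphas, dtype=float)
    assert np.all(alphas >= 0) and abs(alphas.sum() - 1.0) < 1e-9
    return sum(a * w for a, w in zip(alphas, anchors))

# e.g. dial smoothly between a "formal" and a "casual" fine-tune:
# w = interpolate_weights([w_formal, w_casual], [0.3, 0.7])
```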
- An Analysis of Linear Time Series Forecasting Models [0.0]
We show that several popular variants of linear models for time series forecasting are equivalent and functionally indistinguishable from standard, unconstrained linear regression.
We provide experimental evidence that the models under inspection learn nearly identical solutions, and finally demonstrate that the simpler closed form solutions are superior forecasters across 72% of test settings.
arXiv Detail & Related papers (2024-03-21T17:42:45Z)
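The closed-form solution in question is ordinary least squares; a minimal sketch that maps the last $L$ observations directly to the next $H$ (the data, $L$, and $H$ are illustrative):

```python
import numpy as np

def fit_linear_forecaster(series, L, H):
    """OLS closed form: one linear map from the previous L values (+bias)
    to the next H values, fit over all sliding windows."""
    ts = range(L, len(series) - H + 1)
    X = np.stack([series[t - L:t] for t in ts])
    Y = np.stack([series[t:t + H] for t in ts])
    X = np.hstack([X, np.ones((len(X), 1))])      # bias column
    return np.linalg.lstsq(X, Y, rcond=None)[0]   # (L+1, H)

series = np.sin(np.arange(500) / 10.0)            # toy series
W = fit_linear_forecaster(series, L=48, H=12)
pred = np.append(series[-48:], 1.0) @ W           # forecast the next 12 steps
```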
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
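For context, the sketch below fits the ordinary probit model (the $p=2$ special case) by direct maximum likelihood on the full data set; this full-data MLE is the quantity the paper's sketching and coreset constructions approximate within $(1+\varepsilon)$. The $p$-generalized likelihood is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_nll(beta, X, y):
    """Negative log-likelihood of probit regression, y in {0,1} (p = 2 case)."""
    z = X @ beta
    return -(y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=2000) > 0).astype(float)
beta_hat = minimize(probit_nll, np.zeros(3), args=(X, y), method="BFGS").x
```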
- Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- Spike-and-Slab Generalized Additive Models and Scalable Algorithms for High-Dimensional Data [0.0]
We propose hierarchical generalized additive models (GAMs) to accommodate high-dimensional data.
We consider a smoothing penalty for proper shrinkage of the fitted curves and for separating each smoothing function into its linear and nonlinear spaces.
Two deterministic algorithms, EM-Coordinate Descent and EM-Iterative Weighted Least Squares, are developed for different utilities.
arXiv Detail & Related papers (2021-10-27T14:11:13Z)
- On Model Selection Consistency of Lasso for High-Dimensional Ising Models on Tree-like Graphs [13.14903445595385]
We consider the problem of high-dimensional Ising model selection using the neighborhood-based least absolute shrinkage and selection operator (Lasso).
It is rigorously proved that consistent model selection can be achieved with sample sizes $n=\Omega(d^{3}\log p)$ for any tree-like graph in the paramagnetic phase.
Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.
arXiv Detail & Related papers (2021-10-16T07:23:02Z)
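Concretely, the neighborhood approach regresses each spin on all the others with an $\ell_1$-penalized logistic regression and reads edges off the nonzero coefficients. A short scikit-learn sketch; the AND symmetrization rule and the regularization level are assumed conventions, and the random spins below are independent, so an empty graph is the expected output:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighborhood_lasso(S, C=0.25):
    """S: (n, p) array of +/-1 spins. Returns a symmetrized edge estimate."""
    n, p = S.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):                              # one Lasso per node
        rest = np.delete(np.arange(p), j)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(S[:, rest], S[:, j])
        adj[j, rest] = np.abs(clf.coef_[0]) > 1e-6  # recovered neighborhood
    return adj & adj.T                              # AND rule across directions

S = np.where(np.random.default_rng(0).random((400, 6)) < 0.5, -1, 1)
print(ising_neighborhood_lasso(S).sum(), "edges found")  # ~0: independent spins
```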
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
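The expected-reward objective is typically optimized with a score-function (REINFORCE) gradient; here is a toy numpy version over five discrete "structures" with a hand-made reward, standing in for molecular properties or target expression values:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(5)                          # policy over 5 discrete structures
reward = np.array([0.0, 0.2, 1.0, 0.1, 0.0])  # toy stand-in for a property score

for _ in range(2000):
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    a = rng.choice(5, p=probs)                # sample a structure
    grad = -probs; grad[a] += 1.0             # d log pi(a) / d logits
    logits += 0.1 * reward[a] * grad          # ascend E[reward] (REINFORCE)

probs = np.exp(logits - logits.max()); probs /= probs.sum()
print(np.round(probs, 3))                     # mass concentrates on reward argmax
```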
- LowFER: Low-rank Bilinear Pooling for Link Prediction [4.110108749051657]
We propose a factorized bilinear pooling model, commonly used in multi-modal learning, for better fusion of entities and relations.
Our model naturally generalizes the Tucker-decomposition-based TuckER model, which has been shown to generalize other models.
We evaluate on real-world datasets, reaching on par or state-of-the-art performance.
arXiv Detail & Related papers (2020-08-25T07:33:52Z)
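A hedged sketch of the factorized bilinear (MFB-style) pooling being described: project subject-entity and relation embeddings with low-rank factors, take an elementwise product, sum-pool in chunks of size $k$, and score against every candidate object entity. All dimensions and the pooling layout are illustrative assumptions, not LowFER's published configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
ne, de, dr, k = 200, 64, 32, 10        # entities, entity/relation dims, factor k
E = rng.normal(size=(ne, de))          # entity embeddings
U = rng.normal(size=(de, k * de))      # low-rank factor applied to the subject
V = rng.normal(size=(dr, k * de))      # low-rank factor applied to the relation

def score_all_objects(e_s, r):
    """Factorized bilinear pooling: project both inputs, multiply elementwise,
    sum-pool every k entries, then score against all candidate objects."""
    fused = (e_s @ U) * (r @ V)                  # (k*de,)
    pooled = fused.reshape(de, k).sum(axis=1)    # (de,)
    return E @ pooled                            # one score per entity

scores = score_all_objects(E[0], rng.normal(size=dr))
print(scores.shape)                              # (200,)
```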
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
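The primitive being adapted is the classic multiplicative-weights (Hedge) update, sketched below; the paper's actual contribution, reducing Gaussian graphical model learning to per-node regressions driven by such updates, is not reproduced in this toy.

```python
import numpy as np

def hedge(expert_losses, eta=0.1):
    """Multiplicative weights over N experts: play the weighted average,
    then exponentially down-weight each expert by its observed loss."""
    T, N = expert_losses.shape            # losses assumed to lie in [0, 1]
    w, total = np.ones(N), 0.0
    for t in range(T):
        p = w / w.sum()
        total += p @ expert_losses[t]     # learner's expected loss this round
        w *= np.exp(-eta * expert_losses[t])
    return total, w / w.sum()

losses = np.random.default_rng(0).uniform(size=(1000, 8))
total, weights = hedge(losses)            # weights concentrate on low-loss experts
```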