HiPaR: Hierarchical Pattern-aided Regression
- URL: http://arxiv.org/abs/2102.12370v1
- Date: Wed, 24 Feb 2021 15:53:17 GMT
- Title: HiPaR: Hierarchical Pattern-aided Regression
- Authors: Luis Galárraga and Olivier Pelgrin and Alexandre Termier
- Abstract summary: HiPaR mines hybrid rules of the form $p \Rightarrow y = f(X)$ where $p$ is the characterization of a data region and $f(X)$ is a linear regression model on a variable of interest $y$.
HiPaR relies on pattern mining techniques to identify regions of the data where the target variable can be accurately explained via local linear models.
As our experiments show, HiPaR mines fewer rules than existing pattern-based regression methods while still attaining state-of-the-art prediction performance.
- Score: 71.22664057305572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce HiPaR, a novel pattern-aided regression method for tabular data
containing both categorical and numerical attributes. HiPaR mines hybrid rules
of the form $p \Rightarrow y = f(X)$ where $p$ is the characterization of a
data region and $f(X)$ is a linear regression model on a variable of interest
$y$. HiPaR relies on pattern mining techniques to identify regions of the data
where the target variable can be accurately explained via local linear models.
The novelty of the method lies in the combination of an enumerative approach to
explore the space of regions and efficient heuristics that guide the search.
Such a strategy provides more flexibility when selecting a small set of jointly
accurate and human-readable hybrid rules that explain the entire dataset. As
our experiments show, HiPaR mines fewer rules than existing pattern-based
regression methods while still attaining state-of-the-art prediction
performance.
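To make the rule format concrete, here is a minimal sketch of a single hybrid rule $p \Rightarrow y = f(X)$, assuming a toy pandas DataFrame with invented columns (city, size_m2, price); HiPaR's actual enumeration of regions and joint selection of rules is far richer than this one hand-picked pattern.

```python
import numpy as np
import pandas as pd

# Toy data: one categorical attribute (city) and one numerical one (size_m2).
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Lyon", "Lyon", "Paris", "Lyon"],
    "size_m2": [30.0, 55.0, 40.0, 70.0, 90.0, 25.0],
    "price": [450.0, 780.0, 320.0, 540.0, 1200.0, 210.0],
})

# Pattern p: a condition characterizing a region of the data.
region = df[df["city"] == "Paris"]

# Local linear model f(X), fitted by least squares on that region only.
A = np.column_stack([np.ones(len(region)), region["size_m2"]])
coef, *_ = np.linalg.lstsq(A, region["price"].to_numpy(), rcond=None)
print(f"city=Paris => price = {coef[0]:.1f} + {coef[1]:.1f} * size_m2")
```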
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms for small datasets in particular.
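As a rough illustration of the kernel-ridge view, the sketch below runs kernel ridge regression with a hand-rolled data-adaptive kernel whose features are zero-order spline (step) activations with knots at the training points. This kernel is an assumption made for illustration, not HAR's exact saturated tensor-product construction.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def step_kernel(A, B, knots):
    # Feature h_i(x) = prod_j 1[x_j >= knot_ij]; kernel = sum_i h_i(x) h_i(x').
    HA = np.all(A[:, None, :] >= knots[None, :, :], axis=2).astype(float)
    HB = np.all(B[:, None, :] >= knots[None, :, :], axis=2).astype(float)
    return HA @ HB.T

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1] + rng.normal(0, 0.1, 100)

# Kernel ridge regression with the precomputed data-adaptive Gram matrix.
model = KernelRidge(alpha=1.0, kernel="precomputed").fit(step_kernel(X, X, X), y)
X_test = rng.uniform(0, 1, (10, 2))
preds = model.predict(step_kernel(X_test, X, X))
```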
arXiv Detail & Related papers (2024-10-03T17:06:06Z)
- LFFR: Logistic Function For (single-output) Regression [0.0]
We implement privacy-preserving regression training using data encrypted under a fully homomorphic encryption scheme.
We develop a novel and efficient algorithm called LFFR for homomorphic regression using the logistic function.
arXiv Detail & Related papers (2024-07-13T17:33:49Z)
- Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond [28.651041302245538]
We present a new data selection approach based on $k$-means clustering and sensitivity sampling.
We show how it can be applied to linear regression, leading to a new sampling strategy that surprisingly matches the performance of leverage score sampling.
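A hedged sketch of the general recipe: cluster with $k$-means, treat each point's contribution to the clustering cost as a proxy for its sensitivity, and sample a weighted subset accordingly. The paper's exact scores, mixing weights, and guarantees differ; everything below is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))

# Sensitivity proxy: squared distance to the nearest k-means center.
km = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1) ** 2

# Mix with uniform sampling so every point keeps nonzero probability.
prob = 0.5 * dist / dist.sum() + 0.5 / len(X)

m = 500  # subset size
idx = rng.choice(len(X), size=m, replace=False, p=prob)
weights = 1.0 / (m * prob[idx])  # approximate importance weights
coreset = X[idx]
```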
arXiv Detail & Related papers (2024-02-27T09:03:43Z)
- PRIMO: Private Regression in Multiple Outcomes [2.900810893770134]
We introduce a new private regression setting we call Private Regression in Multiple Outcomes (PRIMO).
PRIMO is inspired by the common situation where a data analyst wants to perform a set of $l$ regressions while preserving privacy.
We find that even for values of $l$ far smaller than the theory would predict, our projection-based method improves the accuracy relative to the variant that doesn't use the projection.
arXiv Detail & Related papers (2023-03-07T19:32:13Z)
- Easy Differentially Private Linear Regression [16.325734286930764]
We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
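The following sketch conveys the idea under strong simplifications: non-private OLS models are fit on disjoint data splits, each candidate is scored by a crude coordinate-wise approximation of Tukey depth among the candidates, and the exponential mechanism samples one. The paper's exact depth computation, utility sensitivity, and privacy accounting are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, eps = 10_000, 3, 50, 1.0
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# One non-private OLS estimate per disjoint chunk of the data.
cands = np.array([
    np.linalg.lstsq(Xc, yc, rcond=None)[0]
    for Xc, yc in zip(np.array_split(X, m), np.array_split(y, m))
])

# Coordinate-wise approximation of Tukey depth among the candidates.
ranks = np.argsort(np.argsort(cands, axis=0), axis=0)
depth = np.min(np.minimum(ranks + 1, m - ranks), axis=1)

# Exponential mechanism: sample proportionally to exp(eps * utility / 2).
w = np.exp(eps * depth / 2.0)
choice = rng.choice(m, p=w / w.sum())
print(cands[choice])
```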
arXiv Detail & Related papers (2022-08-15T17:42:27Z)
- $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
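For reference, here is plain maximum-likelihood probit fitting in the standard case ($p = 2$); it is a baseline sketch only and omits the paper's $p$-generalized link, sketching, and coreset machinery.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_nll(beta, X, y):
    # Negative log-likelihood of the standard (p = 2) probit model.
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
beta_true = np.array([0.5, -1.0, 2.0])
y = (X @ beta_true + rng.normal(size=500) > 0).astype(float)

res = minimize(probit_nll, x0=np.zeros(3), args=(X, y), method="BFGS")
print(res.x)  # should land near beta_true
```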
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
- Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- Gaussian Process Model for Estimating Piecewise Continuous Regression Functions [2.132096006921048]
We propose a Gaussian process (GP) model for estimating piecewise continuous regression functions.
The new GP model seeks a local GP estimate of the unknown piecewise continuous regression function at each test location.
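A minimal sketch of the "local GP per test location" idea, with an assumed nearest-neighbor rule for picking the local data; the paper's model chooses local estimates in a more principled way.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2, 200))[:, None]
# Piecewise continuous target with a jump at x = 1.
y = np.where(X[:, 0] < 1.0, np.sin(3 * X[:, 0]), 2 + np.sin(3 * X[:, 0]))
y = y + rng.normal(0, 0.05, 200)

def local_gp_predict(x_star, k=30):
    # Fit a GP only on the k training points nearest the test location,
    # so the jump is not smoothed over by faraway data.
    idx = np.argsort(np.abs(X[:, 0] - x_star))[:k]
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-3)
    gp.fit(X[idx], y[idx])
    return gp.predict(np.array([[x_star]]))[0]

print(local_gp_predict(0.9), local_gp_predict(1.1))  # straddles the jump
```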
arXiv Detail & Related papers (2021-04-13T20:01:43Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression [2.0411082897313984]
We present a novel data-driven strategy to choose the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data.
We propose an easily implemented strategy based on the idea of early stopping and the minimum discrepancy principle.
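In the spirit of the minimum discrepancy principle, the sketch below increases $k$ until the in-sample residual norm first reaches an assumed known noise level; the paper's exact stopping rule and its treatment of unknown noise variance may differ.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n, sigma = 400, 0.3
X = rng.uniform(0, 1, (n, 1))
y = np.sin(6 * X[:, 0]) + rng.normal(0, sigma, n)

# Early stopping over k: residuals grow with k; stop once they first
# reach the noise level (the discrepancy principle).
for k in range(1, n):
    fit = KNeighborsRegressor(n_neighbors=k).fit(X, y).predict(X)
    if np.mean((y - fit) ** 2) >= sigma ** 2:
        break
print("selected k:", k)
```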
arXiv Detail & Related papers (2020-08-20T00:13:19Z)
- Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that fits a difference of convex (DC) functions to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
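To illustrate just the parametrization, the sketch below fits $f(x) = \max_i(a_i x + b_i) - \max_j(c_j x + d_j)$ by naive subgradient descent on squared error; the paper instead solves a principled convex formulation, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = np.abs(x) + 0.1 * rng.normal(size=300)  # |x| is convex, hence DC

k = 3  # affine pieces per max-of-affine component
a, b = 0.1 * rng.normal(size=k), 0.1 * rng.normal(size=k)
c, d = 0.1 * rng.normal(size=k), 0.1 * rng.normal(size=k)

def f(x):
    # Difference of two convex max-of-affine functions.
    return (np.max(a * x[:, None] + b, axis=1)
            - np.max(c * x[:, None] + d, axis=1))

lr = 0.05
for _ in range(3000):
    i = np.argmax(a * x[:, None] + b, axis=1)  # active pieces, convex part
    j = np.argmax(c * x[:, None] + d, axis=1)  # active pieces, concave part
    r = f(x) - y
    for m in range(k):  # subgradient step on 0.5 * mean squared error
        a[m] -= lr * np.mean(r * x * (i == m))
        b[m] -= lr * np.mean(r * (i == m))
        c[m] += lr * np.mean(r * x * (j == m))
        d[m] += lr * np.mean(r * (j == m))

print("fit MSE:", np.mean((f(x) - y) ** 2))
```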
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.