Coresets for Regressions with Panel Data
- URL: http://arxiv.org/abs/2011.00981v2
- Date: Tue, 3 Nov 2020 02:52:41 GMT
- Title: Coresets for Regressions with Panel Data
- Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi
- Abstract summary: We first define coresets for regression problems with panel data and then present efficient algorithms to construct coresets.
Our approach is based on the Feldman-Langberg framework, in which a key step is to upper bound the "total sensitivity".
Empirically, we assess our approach with real-world datasets.
- Score: 29.910677117943568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the problem of coresets for regression problems to
panel data settings. We first define coresets for several variants of
regression problems with panel data and then present efficient algorithms to
construct coresets whose size depends polynomially on $1/\varepsilon$ (where
$\varepsilon$ is the error parameter) and the number of regression parameters -
independent of the number of individuals in the panel data or the time units
each individual is observed for. Our approach is based on the Feldman-Langberg
framework in which a key step is to upper bound the "total sensitivity" that is
roughly the sum of maximum influences of all individual-time pairs taken over
all possible choices of regression parameters. Empirically, we assess our
approach with synthetic and real-world datasets; the coreset sizes constructed
using our approach are much smaller than the full dataset and coresets indeed
accelerate the running time of computing the regression objective.
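The sensitivity-sampling step at the heart of the Feldman-Langberg framework can be sketched for plain $\ell_2$ regression. This is an illustrative simplification, not the paper's panel-data construction: it uses statistical leverage scores of the augmented matrix $[X, y]$ as a standard upper bound on per-point sensitivities, samples points proportionally to those bounds, and reweights so the coreset objective is an unbiased estimate of the full objective. The synthetic data and function names are assumptions for the sketch.

```python
import numpy as np

def leverage_coreset(X, y, m, rng):
    """Sensitivity-sampling coreset sketch for least-squares regression.

    Leverage scores of the augmented matrix [X, y] upper-bound the
    sensitivity of each point (its maximum relative contribution to the
    objective over all parameter choices). Sampling with probability
    proportional to these bounds and weighting by 1/(m * p_i) keeps the
    weighted objective an unbiased estimate of the full objective.
    """
    n, d = X.shape
    A = np.column_stack([X, y])          # augment so residuals (x_i@t - y_i) are covered
    Q, _ = np.linalg.qr(A)               # thin QR: leverage l_i = ||Q_i||^2
    lev = np.sum(Q ** 2, axis=1)
    p = lev / lev.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])               # importance weights
    return X[idx], y[idx], w

rng = np.random.default_rng(0)
n, d, m = 5000, 5, 400
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = X @ beta + 0.1 * rng.normal(size=n)

Xc, yc, w = leverage_coreset(X, y, m, rng)
theta = rng.normal(size=d)               # an arbitrary query parameter
full = np.sum((X @ theta - y) ** 2)
approx = np.sum(w * (Xc @ theta - yc) ** 2)
```

Note the coreset size m is fixed independently of n, matching the spirit of the paper's size bounds; the weighted objective `approx` should track `full` up to a small relative error for any query parameter.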
Related papers
- Accurate Coresets for Latent Variable Models and Regularized Regression [1.9567015559455132]
We introduce a unified framework for constructing accurate coresets.
We present accurate coreset construction algorithms for general problems.
We substantiate our theoretical findings with extensive experimental evaluations on real datasets.
arXiv Detail & Related papers (2024-12-28T16:01:49Z) - Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \leq \sqrt{84/83} - 1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimum of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z) - Function Space Bayesian Pseudocoreset for Bayesian Neural Networks [16.952160718249292]
A Bayesian pseudocoreset is a compact synthetic dataset summarizing essential information of a large-scale dataset.
In this paper, we propose a novel Bayesian pseudocoreset construction method that operates on a function space.
By working directly on the function space, our method could bypass several challenges that may arise when working on a weight space.
arXiv Detail & Related papers (2023-10-27T02:04:31Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique with a lower worst-case time complexity, reducing the memory complexity from O(n²) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Shuffled Autoregression For Motion Interpolation [53.61556200049156]
This work aims to provide a deep-learning solution for the motion interpolation task.
We propose a novel framework, referred to as Shuffled AutoRegression, which expands autoregression to generate in arbitrary (shuffled) order.
We also propose an approach to constructing a particular kind of dependency graph, with three stages assembled into an end-to-end spatial-temporal motion Transformer.
arXiv Detail & Related papers (2023-06-10T07:14:59Z) - Easy Differentially Private Linear Regression [16.325734286930764]
We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
arXiv Detail & Related papers (2022-08-15T17:42:27Z) - Coresets for Time Series Clustering [33.801060211529354]
We study the problem of constructing coresets for clustering problems with time series data.
Our main contribution is an algorithm to construct coresets for a mixture model.
We empirically assess the performance of our coreset with synthetic data.
arXiv Detail & Related papers (2021-10-28T16:21:13Z) - High-dimensional regression with potential prior information on variable
importance [0.0]
We propose a simple scheme involving fitting a sequence of models indicated by the ordering.
We show that the computational cost for fitting all models when ridge regression is used is no more than for a single fit of ridge regression.
We describe a strategy for Lasso regression that makes use of previous fits to greatly speed up fitting the entire sequence of models.
arXiv Detail & Related papers (2021-09-23T10:34:37Z) - Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR)
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
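The experience-replay idea in the last entry can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: incoming samples from a Markovian stream are buffered, and each SGD update draws a uniformly random past sample instead of the newest one, which breaks the serial correlation of consecutive gradients. The AR(1) stream, buffer size, and step size below are assumptions for the sketch.

```python
import numpy as np

def markov_stream(beta, n, rho=0.9, rng=None):
    """Yield (x, y) pairs whose covariates follow a stationary AR(1) chain."""
    rng = rng or np.random.default_rng(1)
    x = rng.normal(size=len(beta))
    for _ in range(n):
        x = rho * x + np.sqrt(1.0 - rho ** 2) * rng.normal(size=len(beta))
        yield x.copy(), x @ beta + 0.1 * rng.normal()

def sgd_with_replay(stream, d, lr=0.01, buffer_size=500, rng=None):
    """SGD for least squares fed through a replay buffer.

    Each update uses a uniformly random buffered sample rather than the
    most recent one, decorrelating successive stochastic gradients even
    though the underlying stream is Markovian.
    """
    rng = rng or np.random.default_rng(2)
    buf, theta = [], np.zeros(d)
    for x, y in stream:
        buf.append((x, y))
        if len(buf) > buffer_size:
            buf.pop(0)                       # drop the oldest sample
        xs, ys = buf[rng.integers(len(buf))]
        theta -= lr * (xs @ theta - ys) * xs  # single-sample gradient step
    return theta

beta = np.array([1.0, -2.0, 0.5])
theta = sgd_with_replay(markov_stream(beta, 20000), d=3)
```

With enough samples the iterate should land close to the true coefficients despite the correlated covariates.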
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.