Towards Practical Robustness Auditing for Linear Regression
- URL: http://arxiv.org/abs/2307.16315v1
- Date: Sun, 30 Jul 2023 20:47:36 GMT
- Title: Towards Practical Robustness Auditing for Linear Regression
- Authors: Daniel Freund and Samuel B. Hopkins
- Abstract summary: We investigate algorithms to find or disprove the existence of small subsets of a dataset whose removal reverses the sign of a regression coefficient.
We show that these methods largely outperform the state of the art and provide a useful robustness check for regression problems in a few dimensions.
We make some headway on this challenge via a spectral algorithm using ideas drawn from recent innovations in algorithmic robust statistics.
- Score: 8.9598796481325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate practical algorithms to find or disprove the existence of
small subsets of a dataset which, when removed, reverse the sign of a
coefficient in an ordinary least squares regression involving that dataset. We
empirically study the performance of well-established algorithmic techniques
for this task -- mixed integer quadratically constrained optimization for
general linear regression problems and exact greedy methods for special cases.
We show that these methods largely outperform the state of the art and provide
a useful robustness check for regression problems in a few dimensions. However,
significant computational bottlenecks remain, especially for the important task
of disproving the existence of such small sets of influential samples for
regression problems of dimension $3$ or greater. We make some headway on this
challenge via a spectral algorithm using ideas drawn from recent innovations in
algorithmic robust statistics. We summarize the limitations of known techniques
in several challenge datasets to encourage further algorithmic innovation.
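As a concrete illustration of the task, the following is a minimal greedy sketch in Python/NumPy that repeatedly drops the single sample whose removal moves a chosen OLS coefficient furthest toward zero. This is a hedged illustration, not the paper's mixed-integer formulation: it can only exhibit a small sign-flipping subset (an upper bound), never certify that none exists, which is the harder direction the abstract highlights. All names and parameters are illustrative.

```python
import numpy as np

def greedy_sign_flip_audit(X, y, coef_idx, max_removals):
    """Greedy heuristic: repeatedly drop the single sample whose removal
    moves the chosen OLS coefficient most toward zero; report the subset
    found (if any) whose removal flips its sign. This is an upper-bound
    heuristic only -- failure to flip does not certify robustness."""
    keep = np.ones(len(y), dtype=bool)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    original_sign = np.sign(beta[coef_idx])
    removed = []
    for _ in range(max_removals):
        best_i, best_val = None, original_sign * beta[coef_idx]
        for i in np.flatnonzero(keep):
            trial = keep.copy()
            trial[i] = False
            b = np.linalg.lstsq(X[trial], y[trial], rcond=None)[0]
            val = original_sign * b[coef_idx]
            if val < best_val:
                best_i, best_val = i, val
        if best_i is None:          # no single removal helps; give up
            return None
        keep[best_i] = False
        removed.append(best_i)
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        if np.sign(beta[coef_idx]) != original_sign:
            return removed          # found a sign-flipping subset
    return None
```

In practice the subset size returned by such a heuristic would be compared against lower bounds from exact or spectral certification methods like those studied in the paper.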
Related papers
- A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we present a robust phase retrieval problem where the task is to recover an unknown signal from corrupted magnitude measurements.
Our proposed oracle avoids the need for a computationally expensive spectral initialization, using a simple gradient step that is robust to outliers.
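For intuition, a minimal sketch of a generic gradient step on the amplitude loss for real-valued phase retrieval; this is an assumed textbook-style formulation, not necessarily the oracle proposed in that paper.

```python
import numpy as np

def phase_retrieval_gd(A, y, x0, step=0.1, iters=200):
    """Gradient descent on the amplitude loss
    f(x) = mean((|Ax| - y)^2) for real-valued phase retrieval.
    A: (m, n) sensing matrix; y: (m,) observed magnitudes."""
    x = x0.copy()
    m = len(y)
    for _ in range(iters):
        z = A @ x
        # Subgradient of (|z| - y)^2 w.r.t. x, using sign(z) for d|z|/dz
        grad = A.T @ ((np.abs(z) - y) * np.sign(z)) * (2.0 / m)
        x -= step * grad
    return x
```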
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
- A Guide to Stochastic Optimisation for Large-Scale Inverse Problems [4.926711494319977]
Stochastic optimisation algorithms are the de facto standard for machine learning with large amounts of data.
We provide a comprehensive account of the state-of-the-art in optimisation from the viewpoint of inverse problems.
We focus on challenges for optimisation that are unique to inverse problems and are not commonly encountered in machine learning.
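As a baseline example of the stochastic methods such a guide covers, a minimal mini-batch SGD sketch for the prototypical linear inverse problem (illustrative only; not a method from the paper):

```python
import numpy as np

def sgd_least_squares(A, b, steps=1000, lr=1e-3, batch=32, seed=0):
    """Mini-batch SGD for min_x (1/2m)||Ax - b||^2, the prototypical
    large-scale inverse-problem objective."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(steps):
        idx = rng.integers(0, m, size=batch)   # sample a mini-batch
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch
        x -= lr * grad
    return x
```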
arXiv Detail & Related papers (2024-06-10T15:02:30Z)
- Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits [6.907555940790131]
Thompson sampling and Greedy demonstrate promising empirical performance, yet this contrasts with their pessimistic theoretical regret bounds.
We propose a new data-driven technique that tracks the geometric properties of the uncertainty ellipsoid.
We identify and "course-correct" problem instances in which the base algorithms perform poorly.
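For context, a minimal linear Thompson sampling sketch that maintains the uncertainty ellipsoid in question (the posterior covariance of a ridge estimate); this is the generic base algorithm, not the paper's course-correction rule.

```python
import numpy as np

class LinTS:
    """Linear Thompson sampling: the inverse of V defines the
    uncertainty ellipsoid whose geometry the cited paper tracks."""
    def __init__(self, dim, lam=1.0, noise=1.0):
        self.V = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)
        self.noise = noise

    def choose(self, arms, rng):
        theta_hat = np.linalg.solve(self.V, self.b)
        theta = rng.multivariate_normal(
            theta_hat, self.noise**2 * np.linalg.inv(self.V))
        return max(range(len(arms)), key=lambda i: arms[i] @ theta)

    def update(self, x, reward):
        self.V += np.outer(x, x)
        self.b += reward * x
```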
arXiv Detail & Related papers (2023-06-26T17:38:45Z)
- A Bayesian Robust Regression Method for Corrupted Data Reconstruction [5.298637115178182]
We develop an effective robust regression method that can resist adaptive adversarial attacks.
First, we propose the novel TRIP (hard Thresholding approach to Robust regression with sImple Prior) algorithm.
We then use the idea of Bayesian reweighting to construct the more robust BRHT (robust Bayesian Reweighting regression via Hard Thresholding) algorithm.
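A minimal sketch of the hard-thresholding idea these algorithms build on: alternate a least-squares fit with re-flagging the largest-residual points as corruptions. The priors and Bayesian reweighting of TRIP/BRHT are omitted; names are illustrative.

```python
import numpy as np

def robust_regression_ht(X, y, n_outliers, iters=20):
    """Alternate: (1) fit OLS on points currently deemed clean,
    (2) re-flag the n_outliers largest-residual points as corrupted."""
    clean = np.ones(len(y), dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta = np.linalg.lstsq(X[clean], y[clean], rcond=None)[0]
        resid = np.abs(y - X @ beta)
        clean = np.ones(len(y), dtype=bool)
        clean[np.argsort(resid)[-n_outliers:]] = False  # hard threshold
    return beta
```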
arXiv Detail & Related papers (2022-12-24T17:25:53Z)
- Vector-Valued Least-Squares Regression under Output Regularity Assumptions [73.99064151691597]
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite-dimensional output.
We derive learning bounds for our method, and study under which settings statistical performance is improved in comparison to the full-rank method.
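For the finite-dimensional analogue, a minimal classical reduced-rank regression sketch (truncating the OLS fit via an SVD of the fitted values); the paper's infinite-dimensional setting and bounds are not captured here.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Fit B minimizing ||Y - XB||_F subject to rank(B) <= rank,
    by projecting the OLS solution onto the top right singular
    directions of the fitted values (classical RRR)."""
    B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]    # (p, q)
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]                     # rank-r projector
    return B_ols @ P
```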
arXiv Detail & Related papers (2022-11-16T15:07:00Z)
- A Data-Driven Line Search Rule for Support Recovery in High-dimensional Data Analysis [5.180648702293017]
We propose a novel and efficient data-driven line search rule to adaptively determine the appropriate step size.
A large number of comparisons with state-of-the-art algorithms in linear and logistic regression problems show the stability, effectiveness and superiority of the proposed algorithms.
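For reference, a standard backtracking (Armijo) line search sketch; the paper's data-driven rule for adaptively choosing the step size is not reproduced here.

```python
import numpy as np

def backtracking_step(f, grad_f, x, beta=0.5, c=1e-4, t0=1.0):
    """One gradient step with Armijo backtracking: shrink t until
    f(x - t*g) <= f(x) - c*t*||g||^2."""
    g = grad_f(x)
    t, fx, gg = t0, f(x), g @ g
    while f(x - t * g) > fx - c * t * gg:
        t *= beta
    return x - t * g
```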
arXiv Detail & Related papers (2021-11-21T12:18:18Z)
- Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra [53.46106569419296]
We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression.
We argue that the previous quantum-inspired algorithms for these problems are doing leverage or ridge-leverage score sampling in disguise.
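A minimal sketch of leverage-score sampling for least squares, the primitive the authors argue these algorithms perform in disguise; exact scores via a thin QR here, whereas fast approximation is the point in practice.

```python
import numpy as np

def leverage_score_sample_solve(X, y, s, seed=0):
    """Sample s rows with probability proportional to their leverage
    scores, reweight, and solve the small least-squares problem."""
    Q, _ = np.linalg.qr(X)            # thin QR: leverage_i = ||Q_i||^2
    lev = np.sum(Q**2, axis=1)
    p = lev / lev.sum()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=s, p=p)
    w = 1.0 / np.sqrt(s * p[idx])     # importance reweighting
    return np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)[0]
```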
arXiv Detail & Related papers (2020-11-09T01:13:07Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new algorithm for linear regression that runs in linear up to quadratic time, in the absence of strong assumptions on the underlying distributions of the samples.
The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
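A hedged sketch of the spectral filtering idea common in this line of work: score samples by their projection onto the top eigenvector of the second moment of per-sample gradients and drop the most extreme. Not necessarily the exact procedure of the paper.

```python
import numpy as np

def spectral_filter_regression(X, y, drop_frac=0.05, rounds=10):
    """Iteratively refit OLS, then remove samples whose per-sample
    gradients g_i = (x_i'b - y_i) x_i project most strongly onto the
    top eigenvector of the gradient second-moment matrix."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(rounds):
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        G = (X @ beta - y)[:, None] * X           # per-sample gradients
        M = G[keep].T @ G[keep] / keep.sum()
        _, vecs = np.linalg.eigh(M)
        scores = (G @ vecs[:, -1])**2             # projection on top eigvec
        scores[~keep] = -np.inf                   # skip removed samples
        k = max(1, int(drop_frac * keep.sum()))
        keep[np.argsort(scores)[-k:]] = False
    return np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
```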
arXiv Detail & Related papers (2020-07-12T19:33:50Z)
- Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered Weighted $L_1$ (OWL) regularized regression is a new regression analysis method for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
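The proximal step underlying such solvers is the prox of the sorted-$\ell_1$ (OWL) penalty, computable by a sort plus a pool-adjacent-violators (PAV) pass; a compact sketch, assuming the weights w are nonincreasing and nonnegative:

```python
import numpy as np

def prox_owl(v, w):
    """Prox of the OWL penalty sum_i w_i |v|_(i), with w nonincreasing
    and nonnegative: sort |v| descending, subtract w, project onto the
    nonincreasing cone (PAV), clip at zero, undo sort, restore signs."""
    order = np.argsort(-np.abs(v))
    z = np.abs(v)[order] - w
    # Pool adjacent violators: merge blocks that violate monotonicity
    sums, lens = [], []
    for zi in z:
        sums.append(float(zi)); lens.append(1)
        while len(sums) > 1 and sums[-1]/lens[-1] > sums[-2]/lens[-2]:
            s, l = sums.pop(), lens.pop()
            sums[-1] += s
            lens[-1] += l
    x_sorted = np.concatenate([np.full(l, s/l) for s, l in zip(sums, lens)])
    x_sorted = np.maximum(x_sorted, 0.0)   # clip negatives to zero
    x = np.empty_like(x_sorted)
    x[order] = x_sorted                    # undo the sort
    return np.sign(v) * x
```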
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
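A minimal Gaussian sketch-and-solve example for L2-regularized least squares; the SRHT variant would replace the dense Gaussian matrix with a structured transform for speed, and the sketch dimension here is an assumption.

```python
import numpy as np

def sketched_ridge(X, y, lam, sketch_dim, seed=0):
    """Solve min_w ||S(Xw - y)||^2 + lam*||w||^2, where S is a Gaussian
    embedding with sketch_dim rows, as a fast proxy for the full problem."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    S = rng.standard_normal((sketch_dim, m)) / np.sqrt(sketch_dim)
    Xs, ys = S @ X, S @ y
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(n), Xs.T @ ys)
```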
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.