Testing the Robustness of Learned Index Structures
- URL: http://arxiv.org/abs/2207.11575v1
- Date: Sat, 23 Jul 2022 18:44:54 GMT
- Title: Testing the Robustness of Learned Index Structures
- Authors: Matthias Bachfischer, Renata Borovica-Gajic, Benjamin I. P. Rubinstein
- Abstract summary: This work evaluates the robustness of learned index structures in the presence of adversarial workloads.
To simulate adversarial workloads, we carry out a data poisoning attack on linear regression models.
We show that learned index structures can suffer from a significant performance deterioration of up to 20% when evaluated on poisoned vs. non-poisoned datasets.
- Score: 15.472214703318805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While early empirical evidence has supported the case for learned index
structures as having favourable average-case performance, little is known about
their worst-case performance. By contrast, classical structures are known to
achieve optimal worst-case behaviour. This work evaluates the robustness of
learned index structures in the presence of adversarial workloads. To simulate
adversarial workloads, we carry out a data poisoning attack on linear
regression models that manipulates the cumulative distribution function (CDF)
on which the learned index model is trained. The attack deteriorates the fit of
the underlying ML model by injecting a set of poisoning keys into the training
dataset, which leads to an increase in the prediction error of the model and
thus deteriorates the overall performance of the learned index structure. We
assess the performance of various regression methods and the learned index
implementations ALEX and PGM-Index. We show that learned index structures can
suffer from a significant performance deterioration of up to 20% when evaluated
on poisoned vs. non-poisoned datasets.
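To make the attack surface concrete: a single-stage learned index fits a model to the mapping from key to rank, i.e. to the empirical CDF of the key set, and every lookup performs a bounded "last-mile" search around the predicted position, so the model's maximum prediction error directly governs lookup cost. The Python sketch below illustrates this mechanism under simplifying assumptions: it fits a least-squares line to the key-rank pairs and injects a naive cluster of poisoning keys that bends the CDF away from the linear fit. The helper names (`fit_linear_index`, `lookup`) and the toy data are illustrative only; the paper's attack selects poisoning keys adversarially to maximise the regression error rather than by simple clustering, and the evaluated systems (ALEX, PGM-Index) use more elaborate multi-stage designs.

```python
import numpy as np

def fit_linear_index(keys):
    """Fit position ~ a*key + b on the ranks (empirical CDF) of a sorted
    key array; return the model and its maximum prediction error."""
    keys = np.sort(np.asarray(keys, dtype=float))
    ranks = np.arange(len(keys))
    a, b = np.polyfit(keys, ranks, deg=1)            # least-squares line
    preds = np.rint(a * keys + b)
    max_err = int(np.max(np.abs(preds - ranks)))     # search-bound width
    return (a, b), max_err

def lookup(model, keys, q, max_err):
    """Predict a position, then do a bounded 'last-mile' search in a
    window of width 2*max_err + 1 around the prediction."""
    a, b = model
    pos = int(np.clip(np.rint(a * q + b), 0, len(keys) - 1))
    lo, hi = max(0, pos - max_err), min(len(keys), pos + max_err + 1)
    return lo + int(np.searchsorted(keys[lo:hi], q))  # global index of q

rng = np.random.default_rng(0)
clean = np.sort(rng.uniform(0, 1_000, size=10_000))

# Naive poisoning: cluster extra keys in a narrow range so the empirical
# CDF develops a step that no single straight line can follow closely.
poison = rng.uniform(500.0, 505.0, size=500)
poisoned = np.sort(np.concatenate([clean, poison]))

model_c, err_clean = fit_linear_index(clean)
_, err_poisoned = fit_linear_index(poisoned)
print(f"max prediction error, clean set:    {err_clean}")
print(f"max prediction error, poisoned set: {err_poisoned}")
print("lookup of clean[1234]:", lookup(model_c, clean, clean[1234], err_clean))
```

On a typical run the poisoned key set reports a several-fold larger maximum error than the clean one; since the last-mile search window scales with that error, every subsequent lookup pays for the poisoning, which is the mechanism behind the up-to-20% performance deterioration reported above.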
Related papers
- Exploiting the Data Gap: Utilizing Non-ignorable Missingness to Manipulate Model Learning [13.797822374912773]
Adversarial Missingness (AM) attacks are motivated by maliciously engineering non-ignorable missingness mechanisms.
In this work we focus on associational learning in the context of AM attacks.
We formulate the learning of the adversarial missingness mechanism as a bi-level optimization problem.
arXiv Detail & Related papers (2024-09-06T17:10:28Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting non-stationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- A Statistical Learning Take on the Concordance Index for Survival Analysis [0.29005223064604074]
We provide C-index Fisher-consistency results and excess risk bounds for several commonly used cost functions in survival analysis.
We also study the general case where no model assumption is made and present a new, off-the-shelf method that is shown to be consistent with the C-index.
arXiv Detail & Related papers (2023-02-23T14:33:54Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- The Concordance Index decomposition: A measure for a deeper understanding of survival prediction models [3.186455928607442]
The Concordance Index (C-index) is a commonly used metric in Survival Analysis for evaluating the performance of a prediction model.
We propose a decomposition of the C-index into a weighted harmonic mean of two quantities: one for ranking observed events versus other observed events, and the other for ranking observed events versus censored cases.
arXiv Detail & Related papers (2022-02-28T23:50:47Z)
- Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks.
Despite being successful in practice, several problems in understanding the performance of adversarial training remain open.
We derive precise theoretical predictions for the minimization of the adversarial training objective in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z)
- CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
- The Price of Tailoring the Index to Your Data: Poisoning Attacks on Learned Index Structures [9.567119607658299]
We present the first study of poisoning attacks on learned index structures.
We formulate the first poisoning attacks on linear regression models trained on a cumulative distribution function.
We generalize our poisoning techniques to attack a more advanced two-stage design of learned index structures.
arXiv Detail & Related papers (2020-08-01T17:12:04Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.