The Price of Tailoring the Index to Your Data: Poisoning Attacks on
Learned Index Structures
- URL: http://arxiv.org/abs/2008.00297v2
- Date: Sun, 27 Feb 2022 16:47:02 GMT
- Title: The Price of Tailoring the Index to Your Data: Poisoning Attacks on
Learned Index Structures
- Authors: Evgenios M. Kornaropoulos, Silei Ren, Roberto Tamassia
- Abstract summary: We present the first study of poisoning attacks on learned index structures.
We formulate the first poisoning attacks on linear regression models trained on a cumulative distribution function.
We generalize our poisoning techniques to attack a more advanced two-stage design of learned index structures.
- Score: 9.567119607658299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The concept of learned index structures relies on the idea that the
input-output functionality of a database index can be viewed as a prediction
task and, thus, be implemented using a machine learning model instead of
traditional algorithmic techniques. This novel angle for a decades-old problem
has inspired numerous exciting results in the intersection of machine learning
and data structures. However, the main advantage of learned index structures,
i.e., the ability to adjust to the data at hand via the underlying ML model,
can become a disadvantage from a security perspective, since this adaptivity can be exploited.
In this work, we present the first study of poisoning attacks on learned
index structures. The required poisoning approach is different from all
previous works since the model under attack is trained on a cumulative
distribution function (CDF) and, thus, every injection into the training set has
a cascading impact on multiple data values. We formulate the first poisoning
attacks on linear regression models trained on the CDF, which is a basic
building block of the proposed learned index structures. We generalize our
poisoning techniques to attack a more advanced two-stage design of learned
index structures called recursive model index (RMI), which has been shown to
outperform traditional B-Trees. We evaluate our attacks on real-world and
synthetic datasets under a wide variety of parameterizations of the model and
show that the error of the RMI increases up to $300\times$ and the error of its
second-stage models increases up to $3000\times$.
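To make the mechanism concrete, the sketch below is a minimal, self-contained illustration (under simplifying assumptions, not the optimized attack from the paper): it fits a single linear regression to the empirical CDF of a sorted key set, as a one-stage learned index would, and then injects a naive cluster of poisoning keys at a single value. The dataset, the 5% poisoning budget, and the single-point placement are illustrative choices, not the paper's; the point is only that one injection shifts the true position of every larger key, which is the cascading effect described above.

```python
import numpy as np

def fit_cdf_regression(keys):
    """Fit position ~ a*key + b on the sorted keys, i.e. learn a linear
    approximation of the (scaled) empirical CDF."""
    keys = np.sort(np.asarray(keys, dtype=float))
    positions = np.arange(len(keys))
    a, b = np.polyfit(keys, positions, deg=1)   # coefficients, highest degree first
    return keys, a, b

def max_position_error(keys, a, b):
    """Maximum |predicted position - true position|, the quantity a learned
    index must bound in order to locate records after the prediction."""
    positions = np.arange(len(keys))
    return float(np.max(np.abs(a * keys + b - positions)))

rng = np.random.default_rng(0)
legit = rng.uniform(0.0, 1_000_000.0, size=10_000)   # benign keys (assumed uniform)
k, a, b = fit_cdf_regression(legit)
print("clean max error   :", round(max_position_error(k, a, b)))

# Naive illustrative poisoning: spend the whole budget on one key value.
# Every legitimate key above 500_000 is pushed 500 positions to the right,
# so a single injection point cascades through the CDF and no single line
# can fit both halves of the key space well any more.
poison = np.full(500, 500_000.0)                     # 5% poisoning budget
kp, ap, bp = fit_cdf_regression(np.concatenate([legit, poison]))
print("poisoned max error:", round(max_position_error(kp, ap, bp)))
```

The attacks in the paper choose the injected key values to maximize this error (and extend the idea to the two-stage RMI), whereas the naive placement above only demonstrates why training on a CDF makes the regression sensitive to injected keys.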
Related papers
- Distributionally robust self-supervised learning for tabular data [2.942619386779508]
Learning robust representations in the presence of error slices is challenging due to high-cardinality features and the complexity of constructing error sets.
Traditional robust representation learning methods largely focus on improving worst-group performance in supervised settings in computer vision.
Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations.
arXiv Detail & Related papers (2024-10-11T04:23:56Z)
- UpLIF: An Updatable Self-Tuning Learned Index Framework [4.077820670802213]
UpLIF is an adaptive self-tuning learned index that adjusts the model to accommodate incoming updates.
We also introduce the concept of balanced model adjustment, which determines the model's inherent properties.
arXiv Detail & Related papers (2024-08-07T22:30:43Z)
- Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks [50.87615167799367]
We certify Graph Neural Networks (GNNs) against poisoning attacks, including backdoors, targeting the node features of a given graph.
Our framework provides fundamental insights into the role of graph structure and its connectivity on the worst-case behavior of convolution-based and PageRank-based GNNs.
arXiv Detail & Related papers (2024-07-15T16:12:51Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Principled and Efficient Motif Finding for Structure Learning of Lifted Graphical Models [5.317624228510748]
Structure learning is a core problem in AI central to the fields of neuro-symbolic AI and statistical relational learning.
We present the first principled approach for mining structural motifs in lifted graphical models.
We show that we outperform state-of-the-art structure learning approaches by up to 6% in terms of accuracy and up to 80% in terms of runtime.
arXiv Detail & Related papers (2023-02-09T12:21:55Z)
- Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain.
In this work, we build models that rely on the message-passing rule of predictive coding.
We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks.
arXiv Detail & Related papers (2022-12-09T03:58:22Z)
- Testing the Robustness of Learned Index Structures [15.472214703318805]
This work evaluates the robustness of learned index structures in the presence of adversarial workloads.
To simulate adversarial workloads, we carry out a data poisoning attack on linear regression models.
We show that learned index structures can suffer from a significant performance deterioration of up to 20% when evaluated on poisoned vs. non-poisoned datasets.
arXiv Detail & Related papers (2022-07-23T18:44:54Z)
- Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called a structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- RadixSpline: A Single-Pass Learned Index [84.84747738666263]
We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data.
RS achieves competitive results on all datasets, despite the fact that it only has two parameters.
arXiv Detail & Related papers (2020-04-30T01:56:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.