Protecting Publicly Available Data With Machine Learning Shortcuts
- URL: http://arxiv.org/abs/2310.19381v1
- Date: Mon, 30 Oct 2023 09:38:03 GMT
- Title: Protecting Publicly Available Data With Machine Learning Shortcuts
- Authors: Nicolas M. Müller, Maximilian Burgert, Pascal Debus, Jennifer Williams, Philip Sperl, Konstantin Böttinger
- Abstract summary: We show that even simple shortcuts are difficult to detect by explainable AI methods.
We then exploit this fact and design an approach to defend online databases against crawlers.
We show that a deterrent can be created by deliberately adding ML shortcuts.
- Score: 3.8709855706783105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine-learning (ML) shortcuts or spurious correlations are artifacts in
datasets that lead to very good training and test performance but severely
limit the model's generalization capability. Such shortcuts are insidious
because they go unnoticed due to good in-domain test performance. In this
paper, we explore the influence of different shortcuts and show that even
simple shortcuts are difficult to detect by explainable AI methods. We then
exploit this fact and design an approach to defend online databases against
crawlers: providers such as dating platforms, clothing manufacturers, or used
car dealers have to deal with a professionalized crawling industry that grabs
and resells data points on a large scale. We show that a deterrent can be
created by deliberately adding ML shortcuts. Such augmented datasets are then
unusable for ML use cases, which deters crawlers and the unauthorized use of
data from the internet. Using real-world data from three use cases, we show
that the proposed approach renders such collected data unusable, while the
shortcut is at the same time difficult to notice in human perception. Thus, our
proposed approach can serve as a proactive protection against illegitimate data
crawling.
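To make the idea concrete, below is a minimal, hypothetical sketch of the general scheme described in the abstract: a faint, class-correlated noise pattern is superimposed on each image before publication, so that models trained on crawled copies latch onto the artifact rather than the true content. The pattern design, amplitude, and per-class seeding here are assumptions for illustration only, not the authors' exact procedure.

```python
# Hypothetical sketch: embed a faint, class-correlated shortcut pattern
# into images before publishing them online. Pattern shape, amplitude,
# and seeding are illustrative assumptions, not the paper's method.
import numpy as np

def make_class_patterns(num_classes, shape, seed=0, amplitude=2.0):
    """One fixed low-amplitude noise pattern per class (uint8 scale)."""
    rng = np.random.default_rng(seed)
    return [amplitude * rng.standard_normal(shape) for _ in range(num_classes)]

def add_shortcut(image, label, patterns):
    """Superimpose the label's pattern; the change is ~1-2 gray levels per pixel."""
    poisoned = image.astype(np.float32) + patterns[label]
    return np.clip(poisoned, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    # Protect a small batch of labeled images before putting them online.
    patterns = make_class_patterns(num_classes=10, shape=(64, 64, 3))
    images = np.random.randint(0, 256, size=(5, 64, 64, 3), dtype=np.uint8)
    labels = np.random.randint(0, 10, size=5)
    protected = [add_shortcut(img, y, patterns) for img, y in zip(images, labels)]
    print(protected[0].shape, protected[0].dtype)
```

Because the pattern is deterministic per class, a model trained on the crawled copies can fit the labels from the artifact alone, which is what makes the scraped dataset unusable for downstream ML while remaining hard to spot by eye.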
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - What Can We Learn from Unlearnable Datasets? [107.12337511216228]
Unlearnable datasets have the potential to protect data privacy by preventing deep neural networks from generalizing.
It is widely believed that neural networks trained on unlearnable datasets only learn shortcuts, simpler rules that are not useful for generalization.
In contrast, we find that networks actually can learn useful features that can be reweighed for high test performance, suggesting that image protection is not assured.
arXiv Detail & Related papers (2023-05-30T17:41:35Z) - Shortcut Detection with Variational Autoencoders [1.3174512123890016]
We present a novel approach to detect shortcuts in image and audio datasets by leveraging variational autoencoders (VAEs).
The disentanglement of features in the latent space of VAEs allows us to discover feature-target correlations in datasets and semi-automatically evaluate them for ML shortcuts.
We demonstrate the applicability of our method on several real-world datasets and identify shortcuts that have not been discovered before.
arXiv Detail & Related papers (2023-02-08T18:26:10Z) - Localized Shortcut Removal [4.511561231517167]
High performance on held-out test data does not necessarily indicate that a model generalizes or learns anything meaningful.
This is often due to the existence of machine learning shortcuts - features in the data that are predictive but unrelated to the problem at hand.
We use an adversarially trained lens to detect and eliminate highly predictive but semantically unconnected clues in images.
arXiv Detail & Related papers (2022-11-24T13:05:33Z) - Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z) - Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z) - Monitoring Shortcut Learning using Mutual Information [16.17600110257266]
Shortcut learning is evaluated on real-world data that does not contain spurious correlations.
Experiments demonstrate that mutual information (MI) can be used as a metric to monitor shortcut learning in a network (a rough illustration of this idea appears after this list).
arXiv Detail & Related papers (2022-06-27T03:55:23Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Can We Achieve More with Less? Exploring Data Augmentation for Toxic Comment Classification [0.0]
This paper tackles one of the greatest limitations in Machine Learning: Data Scarcity.
We explore whether high accuracy classifiers can be built from small datasets, utilizing a combination of data augmentation techniques and machine learning algorithms.
arXiv Detail & Related papers (2020-07-02T04:43:31Z)
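As a loose illustration of the mutual-information idea referenced in the list above, the sketch below estimates MI between a suspected artifact feature (here, the mean intensity of a fixed image corner) and the class labels using scikit-learn's mutual_info_classif; a score well above chance flags a likely shortcut. The feature choice and synthetic data are assumptions for illustration, not the cited paper's procedure.

```python
# Rough illustration of MI-based shortcut monitoring: a high mutual
# information between a suspected artifact feature and the labels
# suggests the feature is a shortcut. Not the referenced paper's method.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def corner_intensity(images, size=4):
    """Suspected shortcut feature: mean value of the top-left corner patch."""
    return images[:, :size, :size].reshape(len(images), -1).mean(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=500)
    images = rng.integers(0, 256, size=(500, 32, 32)).astype(np.float32)
    # Inject a synthetic shortcut: brighten the corner for class 1.
    images[labels == 1, :4, :4] += 20.0
    feature = corner_intensity(images)
    mi = mutual_info_classif(feature, labels, random_state=0)
    print(f"Estimated MI between corner feature and label: {mi[0]:.3f}")
```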
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.