Label Inference Attacks from Log-loss Scores
- URL: http://arxiv.org/abs/2105.08266v1
- Date: Tue, 18 May 2021 04:17:06 GMT
- Title: Label Inference Attacks from Log-loss Scores
- Authors: Abhinav Aggarwal, Shiva Prasad Kasiviswanathan, Zekun Xu, Oluwaseyi
Feyisetan, Nathanael Teissier
- Abstract summary: In this paper, we investigate the problem of inferring the labels of a dataset from single (or multiple) log-loss score(s) without any other access to the dataset.
Surprisingly, we show that for any finite number of label classes, it is possible to accurately infer the labels of the dataset from the reported log-loss score of a single carefully constructed prediction vector, provided arbitrary-precision arithmetic is allowed.
We present label inference algorithms (attacks) that succeed even under addition of noise to the log-loss scores and under limited precision arithmetic.
- Score: 11.780563744330038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The log-loss (also known as cross-entropy loss) metric is ubiquitously used
across machine learning applications to assess the performance of
classification algorithms. In this paper, we investigate the problem of
inferring the labels of a dataset from single (or multiple) log-loss score(s),
without any other access to the dataset. Surprisingly, we show that for any
finite number of label classes, it is possible to accurately infer the labels
of the dataset from the reported log-loss score of a single carefully
constructed prediction vector if we allow arbitrary precision arithmetic.
Additionally, we present label inference algorithms (attacks) that succeed even
under addition of noise to the log-loss scores and under limited precision
arithmetic. All our algorithms rely on ideas from number theory and
combinatorics and require no model training. We run experimental simulations on
some real datasets to demonstrate the ease of running these attacks in
practice.
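As a concrete illustration of the single-score attack, the sketch below shows one way a crafted prediction vector can encode all binary labels in its log-loss, using distinct primes so that the reported score decodes via unique factorization. This is an illustrative construction in the spirit of the paper's number-theoretic attacks, not necessarily its exact algorithm; the variable names and the use of Python's decimal module (as a stand-in for arbitrary-precision arithmetic) are our own assumptions.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # high-precision arithmetic stands in for "arbitrary precision"

# Hypothetical ground-truth binary labels held by the evaluation oracle.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
n = len(true_labels)

# One distinct prime per example; unique factorization lets one score encode all labels.
primes = [2, 3, 5, 7, 11, 13, 17, 19]

# The adversary's single crafted prediction vector: p_i = q_i / (1 + q_i).
preds = [Decimal(q) / (1 + q) for q in primes]

def log_loss(labels, preds):
    """Standard binary log-loss: -(1/n) * sum(y*ln(p) + (1-y)*ln(1-p))."""
    total = Decimal(0)
    for y, p in zip(labels, preds):
        total += y * p.ln() + (1 - y) * (1 - p).ln()
    return -total / n

# The oracle reports only this single number to the adversary.
score = log_loss(true_labels, preds)

# With p_i = q_i/(1+q_i), we get n*score = sum(ln(1+q_i)) - sum(y_i*ln(q_i)),
# so exp(sum(ln(1+q_i)) - n*score) equals the product of q_i over positions with y_i = 1.
offset = sum(((Decimal(1) + q).ln() for q in primes), Decimal(0))
product = round((offset - n * score).exp())

# Each label is recovered by a divisibility test against its prime.
recovered = [1 if product % q == 0 else 0 for q in primes]
print(recovered == true_labels)  # True
```

Because the primes are distinct, the rounded product factors in exactly one way, so each label is read off by a divisibility test; as the dataset grows, higher precision (or the paper's noise-tolerant, limited-precision variants) becomes necessary.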
Related papers
- AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning [5.0823084858349485]
We present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data.
The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-06-22T06:59:52Z) - Partial-Label Regression [54.74984751371617]
Partial-label learning is a weakly supervised learning setting that allows each training example to be annotated with a set of candidate labels.
Previous studies on partial-label learning focused only on the classification setting, where candidate labels are all discrete.
In this paper, we provide the first attempt to investigate partial-label regression, where each training example is annotated with a set of real-valued candidate labels.
arXiv Detail & Related papers (2023-06-15T09:02:24Z) - All Points Matter: Entropy-Regularized Distribution Alignment for
Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z) - Scalable Penalized Regression for Noise Detection in Learning with Noisy
Labels [44.79124350922491]
We propose using a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL).
Specifically, we design a penalized regression to model the linear relation between network features and one-hot labels.
To make the framework scalable to datasets with a large number of categories and a large amount of training data, we propose a split algorithm to divide the whole training set into small pieces.
arXiv Detail & Related papers (2022-03-15T11:09:58Z) - Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo, in which handwritten mathematical terms are to be classified automatically.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that trains models on unlabeled examples selected via a dynamic threshold.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Computationally Efficient Wasserstein Loss for Structured Labels [37.33134854462556]
We propose a tree-Wasserstein distance regularized label distribution learning (LDL) algorithm, focusing on hierarchical text classification tasks.
We show that the proposed method successfully considers the structure of labels during training, and it compares favorably with the Sinkhorn algorithm in terms of computation time and memory usage.
arXiv Detail & Related papers (2021-03-01T10:45:13Z) - Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z) - Label Noise Types and Their Effects on Deep Learning [0.0]
In this work, we provide a detailed analysis of the effects of different kinds of label noise on learning.
We propose a generic framework to generate feature-dependent label noise, which we show to be the most challenging case for learning.
For the ease of other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets.
arXiv Detail & Related papers (2020-03-23T18:03:39Z) - Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel classifier framework that is flexible in the choice of model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.