Label Inference Attacks from Log-loss Scores
- URL: http://arxiv.org/abs/2105.08266v1
- Date: Tue, 18 May 2021 04:17:06 GMT
- Title: Label Inference Attacks from Log-loss Scores
- Authors: Abhinav Aggarwal, Shiva Prasad Kasiviswanathan, Zekun Xu, Oluwaseyi
Feyisetan, Nathanael Teissier
- Abstract summary: In this paper, we investigate the problem of inferring the labels of a dataset from single (or multiple) log-loss score(s) without any other access to the dataset.
Surprisingly, we show that for any finite number of label classes, it is possible to accurately infer the labels of the dataset from the reported log-loss score of a single carefully constructed prediction vector, provided arbitrary-precision arithmetic is allowed.
We present label inference algorithms (attacks) that succeed even under addition of noise to the log-loss scores and under limited precision arithmetic.
- Score: 11.780563744330038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The log-loss (also known as cross-entropy loss) metric is ubiquitously used
across machine learning applications to assess the performance of
classification algorithms. In this paper, we investigate the problem of
inferring the labels of a dataset from single (or multiple) log-loss score(s),
without any other access to the dataset. Surprisingly, we show that for any
finite number of label classes, it is possible to accurately infer the labels
of the dataset from the reported log-loss score of a single carefully
constructed prediction vector if we allow arbitrary precision arithmetic.
Additionally, we present label inference algorithms (attacks) that succeed even
under addition of noise to the log-loss scores and under limited precision
arithmetic. All our algorithms rely on ideas from number theory and
combinatorics and require no model training. We run experimental simulations on
some real datasets to demonstrate the ease of running these attacks in
practice.
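As a concrete illustration of the single-score attack, the sketch below shows one way a crafted prediction vector can encode all binary labels in its log-loss, using distinct primes so that the reported score decodes via unique factorization. This is an illustrative construction in the spirit of the paper's number-theoretic attacks, not necessarily its exact algorithm; the variable names and the use of Python's decimal module (as a stand-in for arbitrary-precision arithmetic) are our own assumptions.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # high-precision arithmetic stands in for "arbitrary precision"

# Hypothetical ground-truth binary labels held by the evaluation oracle.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
n = len(true_labels)

# One distinct prime per example; unique factorization lets one score encode all labels.
primes = [2, 3, 5, 7, 11, 13, 17, 19]

# The adversary's single crafted prediction vector: p_i = q_i / (1 + q_i).
preds = [Decimal(q) / (1 + q) for q in primes]

def log_loss(labels, preds):
    """Standard binary log-loss: -(1/n) * sum(y*ln(p) + (1-y)*ln(1-p))."""
    total = Decimal(0)
    for y, p in zip(labels, preds):
        total += y * p.ln() + (1 - y) * (1 - p).ln()
    return -total / n

# The oracle reports only this single number to the adversary.
score = log_loss(true_labels, preds)

# With p_i = q_i/(1+q_i), we get n*score = sum(ln(1+q_i)) - sum(y_i*ln(q_i)),
# so exp(sum(ln(1+q_i)) - n*score) equals the product of q_i over positions with y_i = 1.
offset = sum(((Decimal(1) + q).ln() for q in primes), Decimal(0))
product = round((offset - n * score).exp())

# Each label is recovered by a divisibility test against its prime.
recovered = [1 if product % q == 0 else 0 for q in primes]
print(recovered == true_labels)  # True
```

Because the primes are distinct, the rounded product factors in exactly one way, so each label is read off by a divisibility test; as the dataset grows, higher precision (or the paper's noise-tolerant, limited-precision variants) becomes necessary.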
Related papers
- AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning [5.0823084858349485]
We present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data.
The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-06-22T06:59:52Z) - Partial-Label Regression [54.74984751371617]
Partial-label learning is a weakly supervised learning setting that allows each training example to be annotated with a set of candidate labels.
Previous studies on partial-label learning focused only on the classification setting, where candidate labels are all discrete.
In this paper, we provide the first attempt to investigate partial-label regression, where each training example is annotated with a set of real-valued candidate labels.
arXiv Detail & Related papers (2023-06-15T09:02:24Z) - All Points Matter: Entropy-Regularized Distribution Alignment for
Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z) - Scalable Penalized Regression for Noise Detection in Learning with Noisy
Labels [44.79124350922491]
We propose using a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL).
Specifically, we design a penalized regression to model the linear relation between network features and one-hot labels.
To make the framework scalable to datasets with a large number of categories and a large amount of training data, we propose a split algorithm to divide the whole training set into small pieces.
arXiv Detail & Related papers (2022-03-15T11:09:58Z) - Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo, in which handwritten mathematical terms are to be classified automatically.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that trains models on unlabeled examples selected via a dynamic threshold.
Our proposed approach, Dash, is adaptive in how it selects unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - Computationally Efficient Wasserstein Loss for Structured Labels [37.33134854462556]
We propose a tree-Wasserstein distance regularized label distribution learning (LDL) algorithm, focusing on hierarchical text classification tasks.
We show that the proposed method successfully considers the structure of labels during training, and it compares favorably with the Sinkhorn algorithm in terms of computation time and memory usage.
arXiv Detail & Related papers (2021-03-01T10:45:13Z) - Analysis of label noise in graph-based semi-supervised learning [2.4366811507669124]
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data.
It is often the case that most of our data is unlabeled.
Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution.
arXiv Detail & Related papers (2020-09-27T22:13:20Z) - Label Noise Types and Their Effects on Deep Learning [0.0]
In this work, we provide a detailed analysis of the effects of different kinds of label noise on learning.
We propose a generic framework to generate feature-dependent label noise, which we show to be the most challenging case for learning.
For the ease of other researchers to test their algorithms with noisy labels, we share corrupted labels for the most commonly used benchmark datasets.
arXiv Detail & Related papers (2020-03-23T18:03:39Z) - Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel classifier framework that is flexible in the choice of model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.