Private Multi-Winner Voting for Machine Learning
- URL: http://arxiv.org/abs/2211.15410v1
- Date: Wed, 23 Nov 2022 20:06:46 GMT
- Title: Private Multi-Winner Voting for Machine Learning
- Authors: Adam Dziedzic, Christopher A Choquette-Choo, Natalie Dullerud, Vinith
Menon Suriyakumar, Ali Shahin Shamsabadi, Muhammad Ahmad Kaleem, Somesh Jha,
Nicolas Papernot, Xiao Wang
- Abstract summary: We propose three new DP multi-winner mechanisms: Binary, $\tau$, and Powerset voting.
Binary voting operates independently per label through composition.
$\tau$ voting bounds votes optimally in their $\ell_2$ norm for tight data-independent guarantees.
Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set.
- Score: 48.0093793427039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private multi-winner voting is the task of revealing $k$-hot binary vectors
satisfying a bounded differential privacy (DP) guarantee. This task has been
understudied in machine learning literature despite its prevalence in many
domains such as healthcare. We propose three new DP multi-winner mechanisms:
Binary, $\tau$, and Powerset voting. Binary voting operates independently per
label through composition. $\tau$ voting bounds votes optimally in their
$\ell_2$ norm for tight data-independent guarantees. Powerset voting operates
over the entire binary vector by viewing the possible outcomes as a power set.
Our theoretical and empirical analysis shows that Binary voting can be a
competitive mechanism on many tasks unless there are strong correlations
between labels, in which case Powerset voting outperforms it. We use our
mechanisms to enable privacy-preserving multi-label learning in the central
setting by extending the canonical single-label technique: PATE. We find that
our techniques outperform current state-of-the-art approaches on large,
real-world healthcare data and standard multi-label benchmarks. We further
enable multi-label confidential and private collaborative (CaPC) learning and
show that model performance can be significantly improved in the multi-site
setting.
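The three mechanisms described in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian noise, and the simple majority threshold are choices made here for clarity, and the privacy accounting (composition across labels, calibrating sigma to an epsilon budget) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_voting(votes, sigma):
    # votes: (n_teachers, n_labels) binary matrix of teacher predictions.
    # Noise each label's vote count independently; the privacy cost then
    # accumulates across labels via composition.
    counts = votes.sum(axis=0).astype(float)
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    return (noisy > votes.shape[0] / 2).astype(int)

def tau_voting(votes, tau, sigma):
    # Clip each teacher's vote vector to l2 norm tau, so the aggregate's
    # sensitivity is bounded by tau regardless of how many labels a
    # teacher turns on; then add Gaussian noise once over the vector.
    norms = np.linalg.norm(votes, axis=1, keepdims=True)
    clipped = votes * np.minimum(1.0, tau / np.maximum(norms, 1e-12))
    noisy = clipped.sum(axis=0) + rng.normal(0.0, sigma, size=votes.shape[1])
    return (noisy > votes.shape[0] / 2).astype(int)

def powerset_voting(votes, sigma):
    # Treat each teacher's full binary vector as a single vote over the
    # power set of labels, then release a noisy argmax.
    n_labels = votes.shape[1]
    keys = votes.dot(1 << np.arange(n_labels))  # encode each vector as an int
    counts = np.bincount(keys, minlength=1 << n_labels).astype(float)
    winner = int(np.argmax(counts + rng.normal(0.0, sigma, size=counts.shape)))
    return np.array([(winner >> i) & 1 for i in range(n_labels)])
```

The sketch makes the trade-off visible: Binary voting pays a composition cost per label, $\tau$ voting pays once for the whole clipped vector, and Powerset voting pays once but over an outcome space that grows exponentially in the number of labels.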
Related papers
- UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework that trains the dual encoder and classifier together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z)
- Data as voters: instance selection using approval-based multi-winner voting [1.597617022056624]
We present a novel approach to the instance selection problem in machine learning (or data mining).
In our model, instances play a double role as voters and candidates.
For SVMs, we have obtained slight increases in the average accuracy by using several voting rules that satisfy EJR or PJR.
arXiv Detail & Related papers (2023-04-19T22:00:23Z)
- MultiGuard: Provably Robust Multi-label Classification against Adversarial Examples [67.0982378001551]
MultiGuard is the first provably robust defense for multi-label classification against adversarial examples.
Our major theoretical contribution is showing that a certain number of an input's ground-truth labels are provably contained in the set of labels predicted by MultiGuard.
arXiv Detail & Related papers (2022-10-03T17:50:57Z)
- Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning [80.36076044023581]
We present an efficient bi-encoder framework for named entity recognition (NER).
We frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type.
A major challenge to this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions.
arXiv Detail & Related papers (2022-08-30T23:19:04Z)
- A Reduction to Binary Approach for Debiasing Multiclass Datasets [12.885756277367443]
We prove that R2B satisfies optimality and bias guarantees and demonstrate empirically that it can lead to an improvement over two baselines.
We validate these conclusions on synthetic and real-world datasets from social science, computer vision, and healthcare.
arXiv Detail & Related papers (2022-05-31T15:11:41Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Selective-Supervised Contrastive Learning with Noisy Labels [73.81900964991092]
We propose selective-supervised contrastive learning (Sel-CL) to learn robust representations and handle noisy labels.
Specifically, Sel-CL extends supervised contrastive learning (Sup-CL), which is powerful for representation learning but degrades in the presence of noisy labels.
Sel-CL tackles the direct cause of the problem of Sup-CL: noisy pairs built by noisy labels mislead representation learning.
arXiv Detail & Related papers (2022-03-08T16:12:08Z)
- Learning to Elect [7.893831644671976]
Voting systems have a wide range of applications including recommender systems, web search, product design and elections.
We show that set-input neural network architectures such as Set Transformers, fully-connected graph networks and DeepSets are both theoretically and empirically well-suited for learning voting rules.
arXiv Detail & Related papers (2021-08-05T17:55:46Z)
- Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.