Private Multi-Winner Voting for Machine Learning
- URL: http://arxiv.org/abs/2211.15410v1
- Date: Wed, 23 Nov 2022 20:06:46 GMT
- Title: Private Multi-Winner Voting for Machine Learning
- Authors: Adam Dziedzic, Christopher A Choquette-Choo, Natalie Dullerud, Vinith
Menon Suriyakumar, Ali Shahin Shamsabadi, Muhammad Ahmad Kaleem, Somesh Jha,
Nicolas Papernot, Xiao Wang
- Abstract summary: We propose three new DP multi-winner mechanisms: Binary, $\tau$, and Powerset voting.
Binary voting operates independently per label through composition.
$\tau$ voting bounds votes optimally in their $\ell_2$ norm for tight data-independent guarantees.
Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set.
- Score: 48.0093793427039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private multi-winner voting is the task of revealing $k$-hot binary vectors
satisfying a bounded differential privacy (DP) guarantee. This task has been
understudied in machine learning literature despite its prevalence in many
domains such as healthcare. We propose three new DP multi-winner mechanisms:
Binary, $\tau$, and Powerset voting. Binary voting operates independently per
label through composition. $\tau$ voting bounds votes optimally in their
$\ell_2$ norm for tight data-independent guarantees. Powerset voting operates
over the entire binary vector by viewing the possible outcomes as a power set.
Our theoretical and empirical analysis shows that Binary voting can be a
competitive mechanism on many tasks unless there are strong correlations
between labels, in which case Powerset voting outperforms it. We use our
mechanisms to enable privacy-preserving multi-label learning in the central
setting by extending the canonical single-label technique: PATE. We find that
our techniques outperform current state-of-the-art approaches on large,
real-world healthcare data and standard multi-label benchmarks. We further
enable multi-label confidential and private collaborative (CaPC) learning and
show that model performance can be significantly improved in the multi-site
setting.
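The three mechanisms described in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian noise, and the simple majority threshold are choices made here for clarity, and the privacy accounting (composition across labels, calibrating sigma to an epsilon budget) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_voting(votes, sigma):
    # votes: (n_teachers, n_labels) binary matrix of teacher predictions.
    # Noise each label's vote count independently; the privacy cost then
    # accumulates across labels via composition.
    counts = votes.sum(axis=0).astype(float)
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    return (noisy > votes.shape[0] / 2).astype(int)

def tau_voting(votes, tau, sigma):
    # Clip each teacher's vote vector to l2 norm tau, so the aggregate's
    # sensitivity is bounded by tau regardless of how many labels a
    # teacher turns on; then add Gaussian noise once over the vector.
    norms = np.linalg.norm(votes, axis=1, keepdims=True)
    clipped = votes * np.minimum(1.0, tau / np.maximum(norms, 1e-12))
    noisy = clipped.sum(axis=0) + rng.normal(0.0, sigma, size=votes.shape[1])
    return (noisy > votes.shape[0] / 2).astype(int)

def powerset_voting(votes, sigma):
    # Treat each teacher's full binary vector as a single vote over the
    # power set of labels, then release a noisy argmax.
    n_labels = votes.shape[1]
    keys = votes.dot(1 << np.arange(n_labels))  # encode each vector as an int
    counts = np.bincount(keys, minlength=1 << n_labels).astype(float)
    winner = int(np.argmax(counts + rng.normal(0.0, sigma, size=counts.shape)))
    return np.array([(winner >> i) & 1 for i in range(n_labels)])
```

The sketch makes the trade-off visible: Binary voting pays a composition cost per label, $\tau$ voting pays once for the whole clipped vector, and Powerset voting pays once but over an outcome space that grows exponentially in the number of labels.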
Related papers
- UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework that trains the dual encoder and classifier together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z)
- Data as voters: instance selection using approval-based multi-winner voting [1.597617022056624]
We present a novel approach to the instance selection problem in machine learning (or data mining).
In our model, instances play a double role as voters and candidates.
For SVMs, we have obtained slight increases in the average accuracy by using several voting rules that satisfy EJR or PJR.
arXiv Detail & Related papers (2023-04-19T22:00:23Z)
- MultiGuard: Provably Robust Multi-label Classification against Adversarial Examples [67.0982378001551]
MultiGuard is the first provably robust defense for multi-label classification against adversarial examples.
Our major theoretical contribution is showing that a certain number of an input's ground-truth labels are provably contained in the set of labels predicted by MultiGuard.
arXiv Detail & Related papers (2022-10-03T17:50:57Z)
- Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning [80.36076044023581]
We present an efficient bi-encoder framework for named entity recognition (NER).
We frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type.
A major challenge to this bi-encoder formulation for NER lies in separating non-entity spans from entity mentions.
arXiv Detail & Related papers (2022-08-30T23:19:04Z)
- A Reduction to Binary Approach for Debiasing Multiclass Datasets [12.885756277367443]
We prove that R2B satisfies optimality and bias guarantees and demonstrate empirically that it can lead to an improvement over two baselines.
We validate these conclusions on synthetic and real-world datasets from social science, computer vision, and healthcare.
arXiv Detail & Related papers (2022-05-31T15:11:41Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Selective-Supervised Contrastive Learning with Noisy Labels [73.81900964991092]
We propose selective-supervised contrastive learning (Sel-CL) to learn robust representations and handle noisy labels.
Specifically, Sel-CL extends supervised contrastive learning (Sup-CL), which is powerful for representation learning but degrades in the presence of noisy labels.
Sel-CL tackles the direct cause of the problem of Sup-CL: noisy pairs built by noisy labels mislead representation learning.
arXiv Detail & Related papers (2022-03-08T16:12:08Z)
- Learning to Elect [7.893831644671976]
Voting systems have a wide range of applications including recommender systems, web search, product design and elections.
We show that set-input neural network architectures such as Set Transformers, fully-connected graph networks and DeepSets are both theoretically and empirically well-suited for learning voting rules.
arXiv Detail & Related papers (2021-08-05T17:55:46Z)
- Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.