Multi-class Item Mining under Local Differential Privacy
- URL: http://arxiv.org/abs/2504.13526v1
- Date: Fri, 18 Apr 2025 07:37:06 GMT
- Title: Multi-class Item Mining under Local Differential Privacy
- Authors: Yulian Mao, Qingqing Ye, Rong Du, Qi Wang, Kai Huang, Haibo Hu
- Abstract summary: We propose frameworks for multi-class item mining, along with two mechanisms: validity perturbation to reduce the impact of invalid data, and correlated perturbation to preserve the relationship between labels and items. We also apply these optimized methods to two multi-class item mining queries: frequency estimation and top-$k$ item mining.
- Score: 16.12696021148232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Item mining, a fundamental task for collecting statistical data from users, has raised increasing privacy concerns. To address these concerns, local differential privacy (LDP) was proposed as a privacy-preserving technique. Existing LDP item mining mechanisms primarily concentrate on global statistics, i.e., those from the entire dataset. Nevertheless, they fall short of user-tailored tasks such as personalized recommendations, whereas classwise statistics can improve task accuracy with fine-grained information. Meanwhile, the introduction of class labels brings new challenges. Label perturbation may result in invalid items for aggregation. To this end, we propose frameworks for multi-class item mining, along with two mechanisms: validity perturbation to reduce the impact of invalid data, and correlated perturbation to preserve the relationship between labels and items. We also apply these optimized methods to two multi-class item mining queries: frequency estimation and top-$k$ item mining. Through theoretical analysis and extensive experiments, we verify the effectiveness and superiority of these methods.
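For readers unfamiliar with LDP frequency estimation, the sketch below illustrates the global (single-class) setting that the abstract contrasts with classwise statistics. It uses generalized randomized response (GRR), a standard LDP baseline; it is not the paper's validity perturbation or correlated perturbation mechanism, and the function names `grr_perturb` and `grr_estimate` are hypothetical, chosen only for this illustration.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """User side: keep the true item with probability p,
    otherwise report a different item chosen uniformly at random."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased count estimate for every item in the domain.

    E[observed count of v] = n_v * p + (n - n_v) * q, so solving for n_v
    gives (observed - n * q) / (p - q).
    """
    d = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + d - 1)      # probability of reporting the true item
    q = 1.0 / (e + d - 1)    # probability of reporting one specific other item
    counts = Counter(reports)
    return {v: (counts[v] - n * q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c", "d"]            # assumed toy item domain
    true_items = ["a"] * 6000 + ["b"] * 3000 + ["c"] * 1000
    reports = [grr_perturb(v, domain, 1.0, rng) for v in true_items]
    for item, est in grr_estimate(reports, domain, 1.0).items():
        print(item, round(est))
```

In a multi-class setting each user would also hold a class label. If the label were perturbed independently of the item, a report could pair an item with a label it never belonged to, which is presumably the kind of invalid data the abstract refers to.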
Related papers
- TAROT: Targeted Data Selection via Optimal Transport [64.56083922130269]
TAROT is a targeted data selection framework grounded in optimal transport theory.
Previous targeted data selection methods rely on influence-based greedy heuristics to enhance domain-specific performance.
We evaluate TAROT across multiple tasks, including semantic segmentation, motion prediction, and instruction tuning.
arXiv Detail & Related papers (2024-11-30T10:19:51Z) - Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes. Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes. We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z) - Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model we train a discriminator model and use its outputs to recalibrate the sample weights.
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z) - Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning [69.81438976273866]
Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers).
We introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference.
We propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers.
arXiv Detail & Related papers (2023-03-21T09:07:15Z) - MaSS: Multi-attribute Selective Suppression [8.337285030303285]
We propose Multi-attribute Selective Suppression, or MaSS, a framework for performing precisely targeted data surgery.
MaSS learns a data modifier through adversarial games between two sets of networks, where one is aimed at suppressing selected attributes.
We carried out an extensive evaluation of our proposed method using multiple datasets from different domains.
arXiv Detail & Related papers (2022-10-18T14:44:08Z) - Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction [32.98121084823483]
We propose exploiting massive unlabeled data to reduce the risk of distribution shift between test data and training data.
In this paper, we propose a novel Multi-Grained Consistency Regularization (MGCR) method to make use of unlabeled data and design two filters specifically for TOWE to filter noisy data at different granularity.
arXiv Detail & Related papers (2022-08-17T13:19:26Z) - Semi-Supervised Cascaded Clustering for Classification of Noisy Label Data [0.3441021278275805]
The performance of supervised classification techniques often deteriorates when the data has noisy labels.
Most of the approaches addressing the noisy label data rely on deep neural networks (DNN) that require huge datasets for classification tasks.
We propose a semi-supervised cascaded clustering algorithm to extract patterns and generate a cascaded tree of classes in such datasets.
arXiv Detail & Related papers (2022-05-04T17:42:22Z) - Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z) - Correlated Differential Privacy: Feature Selection in Machine Learning [13.477069421691562]
The proposed scheme involves five steps with the goal of managing the extent of data correlation, preserving the privacy, and supporting accuracy in the prediction results.
Experiments show that the proposed scheme can produce better prediction results with machine learning tasks and fewer mean square errors for data queries compared to existing schemes.
arXiv Detail & Related papers (2020-10-07T00:33:24Z) - Proposal Learning for Semi-Supervised Object Detection [76.83284279733722]
It is non-trivial to train object detectors on unlabeled data due to the unavailability of ground truth labels.
We present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data.
arXiv Detail & Related papers (2020-01-15T00:06:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.