PosCUDA: Position based Convolution for Unlearnable Audio Datasets
- URL: http://arxiv.org/abs/2401.02135v1
- Date: Thu, 4 Jan 2024 08:39:49 GMT
- Title: PosCUDA: Position based Convolution for Unlearnable Audio Datasets
- Authors: Vignesh Gokul, Shlomo Dubnov
- Abstract summary: PosCUDA is a position based convolution for creating unlearnable audio datasets.
We empirically show that PosCUDA can achieve unlearnability while maintaining the quality of the original audio datasets.
- Score: 7.4768400786925175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models require large amounts of clean data to achieve good
performance. To avoid the cost of expensive data acquisition, researchers use
the abundant data available on the internet. This raises significant privacy
concerns about the potential misuse of personal data for model training without
authorisation. Recent works such as CUDA propose solutions to this problem by
adding class-wise blurs to make datasets unlearnable, i.e., a model can never use
the acquired dataset for learning. However, these methods often reduce the
quality of the data, making it useless for practical applications. We introduce
PosCUDA, a position-based convolution for creating unlearnable audio datasets.
PosCUDA uses class-wise convolutions on small patches of audio. The locations of
the patches are based on a private key for each class, so the model learns
the relations between positional blurs and labels while failing to generalize.
We empirically show that PosCUDA can achieve unlearnability while maintaining
the quality of the original audio datasets. Our proposed method is also robust
across audio feature representations such as MFCC and raw audio, and across
architectures such as transformers and convolutional networks.
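The abstract gives enough detail for a rough sketch: each class holds a private key that deterministically picks both a patch location and a small blur kernel. The function below is an illustrative reconstruction, not the authors' code; the patch length, kernel size, and hash-based keying scheme are assumptions.

```python
import hashlib
import numpy as np

def poscuda_poison(audio: np.ndarray, label: int, class_keys: dict,
                   patch_len: int = 512, kernel_size: int = 9) -> np.ndarray:
    """Apply a class-specific 1D blur to one keyed patch of the waveform.

    A minimal sketch of the idea described in the abstract; the exact
    kernel design, patch length, and keying may differ from the paper.
    """
    key = class_keys[label]
    # Derive a deterministic patch position and kernel from the class key,
    # so every clip of the same class is blurred the same way, in the same place.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    start = int(rng.integers(0, max(1, len(audio) - patch_len)))
    kernel = rng.random(kernel_size)
    kernel /= kernel.sum()  # normalized low-pass (blur) filter
    poisoned = audio.copy()
    patch = poisoned[start:start + patch_len]
    poisoned[start:start + patch_len] = np.convolve(patch, kernel, mode="same")
    return poisoned

# Hypothetical usage on a 1-second, 16 kHz clip:
keys = {c: f"secret-{c}".encode() for c in range(10)}
clip = np.random.randn(16000).astype(np.float32)
unlearnable_clip = poscuda_poison(clip, label=3, class_keys=keys)
```

Because the blur is both class-consistent and spatially localized, a model can fit the (position, kernel) → label shortcut instead of the audio content, which is the failure to generalize the abstract describes.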
Related papers
- Learning from Convolution-based Unlearnable Datasets [5.332412565926725]
The Convolution-based Unlearnable Dataset (CUDA) method aims to make data unlearnable by applying class-wise blurs to every image in the dataset.
In this work, we evaluate whether data remains unlearnable after image sharpening and frequency filtering.
We observe a substantial increase in test accuracy over adversarial training for models trained with unlearnable data.
arXiv Detail & Related papers (2024-11-04T01:51:50Z)
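Since this entry tests image sharpening as a countermeasure to class-wise blurs, a minimal unsharp-masking sketch clarifies what that test involves. The sigma and amount values are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image: np.ndarray, sigma: float = 1.0, amount: float = 1.5) -> np.ndarray:
    """Sharpen a grayscale image by adding back detail removed by a Gaussian blur."""
    blurred = gaussian_filter(image, sigma=sigma)
    sharp = image + amount * (image - blurred)
    return np.clip(sharp, 0.0, 1.0)

# Hypothetical usage on one poisoned image with values in [0, 1]:
img = np.random.rand(32, 32).astype(np.float32)
recovered = unsharp_mask(img)
```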
- Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic [99.3682210827572]
Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets.
Data curation strategies are typically developed agnostic of the available compute for training.
We introduce neural scaling laws that account for the non-homogeneous nature of web data.
arXiv Detail & Related papers (2024-04-10T17:27:54Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the 'forget' data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
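The PGU entry above describes projecting updates so that unlearning steps do not interfere with knowledge that should be retained. A minimal sketch of that projection step, assuming an orthonormal basis of retained-data gradient directions is available (all names and shapes here are illustrative, not the paper's API):

```python
import numpy as np

def project_out(grad: np.ndarray, retain_basis: np.ndarray) -> np.ndarray:
    """Project an unlearning gradient onto the orthogonal complement of
    the subspace spanned by retained-data gradient directions.

    grad: flattened parameter gradient, shape (d,)
    retain_basis: orthonormal basis of retained directions, shape (k, d)
    """
    # Remove the components of grad lying in the retained subspace, so the
    # update (approximately) leaves retained knowledge untouched.
    return grad - retain_basis.T @ (retain_basis @ grad)

# Hypothetical usage: build an orthonormal basis from retained gradients.
d, k = 1000, 5
retained_grads = np.random.randn(k, d)
q, _ = np.linalg.qr(retained_grads.T)    # q: (d, k), orthonormal columns
basis = q.T                              # (k, d)
g_forget = np.random.randn(d)
g_safe = project_out(g_forget, basis)
assert abs(basis @ g_safe).max() < 1e-8  # orthogonal to the retained span
```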
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
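The data-redundancy entry above scores examples by gradient norms before selecting a subset. A minimal per-example gradient-norm scoring sketch in PyTorch (the model, loss, and data names are assumptions for illustration, not the paper's code):

```python
import torch
import torch.nn as nn

def gradient_norm_scores(model: nn.Module, loss_fn, dataset) -> list:
    """Score each example by the L2 norm of the loss gradient it induces.

    Larger norms suggest a more informative (less redundant) example;
    a selection step could then cluster or rank these scores.
    """
    scores = []
    for x, y in dataset:  # one example at a time
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        sq = sum((p.grad ** 2).sum() for p in model.parameters()
                 if p.grad is not None)
        scores.append(sq.sqrt().item())
    return scores

# Hypothetical usage with a toy classifier:
model = nn.Linear(10, 3)
data = [(torch.randn(10), torch.tensor(i % 3)) for i in range(8)]
scores = gradient_norm_scores(model, nn.CrossEntropyLoss(), data)
```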
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on all three datasets on image classification in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding [69.40915115518523]
Lack of training data presents a major challenge to scaling out spoken language understanding (SLU) to low-resource languages.
Various data augmentation approaches have been proposed to synthesize training data in low-resource target languages.
In this paper, we focus on mitigating noise in the augmented data.
arXiv Detail & Related papers (2021-09-03T15:44:15Z)
- Federated Self-Training for Semi-Supervised Audio Recognition [0.23633885460047763]
In this work, we study the problem of semi-supervised learning of audio models via self-training.
We propose FedSTAR to exploit large-scale on-device unlabeled data to improve the generalization of audio recognition models.
arXiv Detail & Related papers (2021-07-14T17:40:10Z)
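The FedSTAR entry above relies on self-training over unlabeled clips; a minimal centralized sketch of the pseudo-labeling step follows (the confidence threshold and names are illustrative assumptions, and the federated aggregation is omitted):

```python
import numpy as np

def pseudo_label(probs: np.ndarray, threshold: float = 0.9):
    """Keep only unlabeled examples whose top predicted class is confident.

    probs: model softmax outputs for unlabeled data, shape (n, num_classes)
    Returns indices of kept examples and their pseudo-labels.
    """
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, labels[keep]

# Hypothetical usage: pseudo-label a batch of unlabeled audio predictions,
# then mix the kept pairs into the next round of supervised training.
probs = np.random.dirichlet(np.ones(10), size=64)
idx, y_hat = pseudo_label(probs, threshold=0.8)
```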
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.