Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification
- URL: http://arxiv.org/abs/2303.07643v1
- Date: Tue, 14 Mar 2023 06:04:19 GMT
- Title: Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification
- Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao
- Abstract summary: We propose feature-rich audio model inversion (FRAMI), a data-free knowledge distillation framework for general sound classification tasks.
Experimental results on the UrbanSound8K, ESC-50, and AudioMNIST datasets demonstrate that FRAMI can generate feature-rich samples.
- Score: 23.35582432472955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-Free Knowledge Distillation (DFKD) has recently attracted growing
attention in the academic community, especially with major breakthroughs in
computer vision. Despite promising results, the technique has not been well
applied to audio and signal processing. Because audio signals vary in
duration, they call for their own modeling approach. In this work, we propose
feature-rich audio model inversion (FRAMI), a data-free knowledge distillation
framework for general sound classification tasks. It first generates
high-quality and feature-rich Mel-spectrograms through a feature-invariant
contrastive loss. Then, the hidden states before and after the statistics
pooling layer are reused when knowledge distillation is performed on these
feature-rich samples. Experimental results on the UrbanSound8K, ESC-50, and
AudioMNIST datasets demonstrate that FRAMI can generate feature-rich samples.
Moreover, reusing the hidden states further improves the accuracy of the
student model, which significantly outperforms the baseline method.
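
To make the distillation step concrete, below is a minimal PyTorch sketch of the hidden-state-reuse idea from the abstract: the student matches the teacher's hidden states both before and after a statistics pooling layer, on top of the usual temperature-scaled soft-label loss. All function and parameter names are illustrative assumptions, not the authors' released code; the generation side (the feature-invariant contrastive loss) is not shown here.

```python
# A minimal sketch (assumed names, not the authors' code) of reusing hidden
# states before and after a statistics pooling layer during distillation.
import torch
import torch.nn.functional as F


def statistics_pooling(frames: torch.Tensor) -> torch.Tensor:
    """Pool frame-level features (B, T, D) to utterance level (B, 2D) via mean and std."""
    return torch.cat([frames.mean(dim=1), frames.std(dim=1)], dim=-1)


def frami_style_kd_loss(t_frames, s_frames, t_logits, s_logits, tau=2.0, alpha=0.5):
    """Soft-label KD plus matching of pre- and post-pooling hidden states."""
    # Hidden states before the statistics pooling layer (frame level).
    pre_loss = F.mse_loss(s_frames, t_frames)
    # Hidden states after the statistics pooling layer (utterance level).
    post_loss = F.mse_loss(statistics_pooling(s_frames), statistics_pooling(t_frames))
    # Standard temperature-scaled KD on the classifier logits.
    kd = F.kl_div(
        F.log_softmax(s_logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return alpha * kd + (1 - alpha) * (pre_loss + post_loss)
```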
Related papers
- A Novel Score-CAM based Denoiser for Spectrographic Signature Extraction without Ground Truth [0.0]
This paper develops a novel Score-CAM based denoiser to extract an object's signature from noisy spectrographic data.
It also proposes a generative adversarial network architecture for learning and producing spectrographic training data.
arXiv Detail & Related papers (2024-10-28T21:40:46Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
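
As a rough illustration of what transforming the feature space might look like, here is a hedged sketch: a lightweight learnable affine map over frozen pre-trained features, ahead of the task head. This only illustrates the general idea; it is not the NMTune method itself, and all names are assumptions.

```python
# Hedged sketch only: a learnable affine map over frozen features.
import torch
import torch.nn as nn


class AffineFeatureHead(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.affine = nn.Linear(feat_dim, feat_dim)   # learnable affine map
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frozen_features: torch.Tensor) -> torch.Tensor:
        # The pre-trained encoder stays frozen; only these layers are tuned.
        return self.classifier(self.affine(frozen_features))
```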
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study [33.10311742703679]
We make the first attempt to investigate the benefits of pre-training on sound generation with AudioLDM.
Our study demonstrates the advantages of the pre-trained AudioLDM, especially in data-scarcity scenarios.
We benchmark the sound generation task on various frequently-used datasets.
arXiv Detail & Related papers (2023-03-07T12:49:45Z)
- Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that classifies six different hand gestures from a limited number of samples and generalizes well to a wider audience.
We rely on a set of more elementary methods, such as random bounds on a signal, and aim to show the power these methods can carry in an online setting.
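
A minimal sketch of the augmentation the title describes, under the assumption that it means adding zero-mean Gaussian noise whose scale is drawn at random for each copy; the function name and parameter ranges are illustrative, not the paper's settings.

```python
# Illustrative sketch: expand a small EMG dataset with Gaussian noise copies.
import numpy as np


def augment_with_random_variance_noise(signals, copies=5, sigma_range=(0.01, 0.1), seed=0):
    """signals: array of shape (N, T); returns (N * copies, T) noisy copies."""
    rng = np.random.default_rng(seed)
    augmented = []
    for sig in signals:
        for _ in range(copies):
            sigma = rng.uniform(*sigma_range)  # random noise scale per copy
            augmented.append(sig + rng.normal(0.0, sigma, size=sig.shape))
    return np.stack(augmented)
```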
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
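
To illustrate how data diversity can serve as an optimizable objective, as in the CMI summary above, here is a hedged InfoNCE-style instance-discrimination sketch computed over embeddings of the synthesized batch; it captures the principle, not the authors' exact CMI objective.

```python
# Hedged sketch: diversity via instance discrimination on generated samples.
import torch
import torch.nn.functional as F


def instance_discrimination_loss(emb_a, emb_b, temperature=0.1):
    """emb_a, emb_b: (B, D) embeddings of two views of the same generated batch."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature                    # (B, B) similarities
    targets = torch.arange(a.size(0), device=a.device)  # each sample matches itself
    # Minimizing this pushes distinct generated samples apart in feature
    # space, i.e. higher diversity means stronger instance discrimination.
    return F.cross_entropy(logits, targets)
```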
- SoundCLR: Contrastive Learning of Representations For Improved Environmental Sound Classification [0.6767885381740952]
SoundCLR is a supervised contrastive learning method for effective environment sound classification with state-of-the-art performance.
Due to the comparatively small sizes of the available environmental sound datasets, we propose and exploit a transfer learning and strong data augmentation pipeline.
Our experiments show that our masking based augmentation technique on the log-mel spectrograms can significantly improve the recognition performance.
arXiv Detail & Related papers (2021-03-02T18:42:45Z)
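
The masking-based augmentation credited above can be sketched as zeroing random time and frequency stripes of a log-mel spectrogram, in the spirit of SpecAugment; the mask counts and widths below are illustrative assumptions, not SoundCLR's published settings.

```python
# Illustrative sketch of time/frequency masking on a log-mel spectrogram.
import numpy as np


def mask_log_mel(spec, num_masks=2, max_f=8, max_t=20, seed=None):
    """spec: (n_mels, n_frames) log-mel spectrogram; returns a masked copy."""
    rng = np.random.default_rng(seed)
    out = spec.copy()
    n_mels, n_frames = out.shape
    for _ in range(num_masks):
        f = int(rng.integers(0, max_f + 1))             # frequency-mask height
        f0 = int(rng.integers(0, max(1, n_mels - f)))
        out[f0:f0 + f, :] = 0.0
        t = int(rng.integers(0, max_t + 1))             # time-mask width
        t0 = int(rng.integers(0, max(1, n_frames - t)))
        out[:, t0:t0 + t] = 0.0
    return out
```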
- High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder [2.6770746621108654]
We propose a new autoencoder-based model named "Guided Adversarial Autoencoder" (GAAE).
Our proposed model can generate audio with superior quality, which is indistinguishable from the real audio samples.
arXiv Detail & Related papers (2020-06-01T12:19:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.