CURE Dataset: Ladder Networks for Audio Event Classification
- URL: http://arxiv.org/abs/2001.03896v1
- Date: Sun, 12 Jan 2020 09:35:30 GMT
- Title: CURE Dataset: Ladder Networks for Audio Event Classification
- Authors: Harishchandra Dubey, Dimitra Emmanouilidou, Ivan J. Tashev
- Abstract summary: There are approximately 3M people with hearing loss who can't perceive events happening around them.
This paper establishes the CURE dataset, which contains a curated set of specific audio events most relevant for people with hearing loss.
- Score: 15.850545634216484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio event classification is an important task for several
applications such as surveillance and audio, video, and multimedia retrieval.
There are approximately 3M people with hearing loss who can't perceive events
happening around them. This paper establishes the CURE dataset, which contains
a curated set of specific audio events most relevant for people with hearing
loss. We propose a ladder-network-based audio event classifier that utilizes
5s sound recordings derived from the Freesound project. We adopt
state-of-the-art convolutional neural network (CNN) embeddings as audio
features for this task. We also investigate an extreme learning machine (ELM)
for event classification. In this study, the proposed classifiers are compared
with a support vector machine (SVM) baseline. We propose signal and feature
normalization that aims to reduce the mismatch between different recording
scenarios. First, a CNN is trained on weakly labeled AudioSet data. Next, the
pre-trained model is adopted as a feature extractor for the proposed CURE
corpus. We incorporate the ESC-50 dataset as a second evaluation set. Results
and discussions validate the superiority of the ladder network over the ELM
and SVM classifiers in terms of robustness and increased classification
accuracy. While the ladder network is robust to data mismatches, the simpler
SVM and ELM classifiers are sensitive to such mismatches, where the proposed
normalization techniques can play an important role. Experimental studies with
the ESC-50 and CURE corpora elucidate the differences in dataset complexity
and the robustness offered by the proposed approaches.
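As a concrete illustration of the two normalization stages described in the
abstract, the sketch below applies RMS normalization to each recording before
feature extraction and z-score normalization to the CNN embedding dimensions.
The specific choices (RMS target level, z-scoring with training-set statistics)
are assumptions for illustration; the paper's exact normalization schemes are
not reproduced here.

```python
import numpy as np

def normalize_signal(waveform, target_rms=0.1, eps=1e-8):
    """Scale a 1-D waveform to a fixed RMS level (assumed scheme)."""
    rms = np.sqrt(np.mean(waveform ** 2))
    return waveform * (target_rms / (rms + eps))

def fit_feature_normalizer(train_embeddings):
    """Estimate per-dimension mean/std on training embeddings only,
    so evaluation sets (e.g., ESC-50) are normalized consistently."""
    mu = train_embeddings.mean(axis=0)
    sigma = train_embeddings.std(axis=0) + 1e-8
    return mu, sigma

def normalize_features(embeddings, mu, sigma):
    """Z-score the CNN embeddings to reduce recording-condition mismatch."""
    return (embeddings - mu) / sigma
```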
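The extreme learning machine investigated as a classifier admits a very compact
implementation: a random, untrained hidden layer followed by a closed-form
ridge-regression solve for the output weights. A minimal NumPy sketch follows;
the hidden size and regularization strength are illustrative assumptions, not
values from the paper.

```python
import numpy as np

class ELMClassifier:
    """Minimal extreme learning machine (ELM) for multi-class problems."""

    def __init__(self, n_hidden=512, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Random, fixed input weights define the hidden feature map.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)   # hidden activations
        T = np.eye(n_classes)[y]           # one-hot targets
        # Output weights via regularized least squares (closed form).
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)
```

Because only the output layer is solved for, training is fast, but the random
hidden projection leaves the classifier sensitive to feature scaling, which is
one reason the normalization above matters for the ELM and SVM classifiers.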
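The ladder network itself combines a supervised cost on a noise-corrupted
encoder with layer-wise denoising reconstruction costs against a clean encoder
pass, which is the property the abstract credits for its robustness to data
mismatch. Below is a minimal PyTorch sketch of that idea, assuming a simple
fixed combinator and omitting the batch-normalization machinery of the full
ladder architecture; layer sizes, noise level, and cost weight are illustrative.

```python
import torch
import torch.nn as nn

class Ladder(nn.Module):
    def __init__(self, dims=(128, 64, 32, 10), noise=0.2):
        super().__init__()
        self.noise = noise
        self.enc = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1))
        self.dec = nn.ModuleList(
            nn.Linear(dims[i + 1], dims[i])
            for i in reversed(range(len(dims) - 1)))

    def encode(self, x, noisy):
        z = x + self.noise * torch.randn_like(x) if noisy else x
        zs, h = [z], z
        for i, layer in enumerate(self.enc):
            z = layer(h)
            if noisy:
                z = z + self.noise * torch.randn_like(z)
            zs.append(z)
            # ReLU between layers; the top layer outputs raw logits.
            h = torch.relu(z) if i < len(self.enc) - 1 else z
        return zs, h

    def forward(self, x):
        zs_noisy, logits = self.encode(x, noisy=True)
        with torch.no_grad():  # clean pass provides fixed targets
            zs_clean, _ = self.encode(x, noisy=False)
        recon, u = x.new_zeros(()), zs_noisy[-1]
        for i, layer in enumerate(self.dec):
            u = layer(u)                  # top-down decoder signal
            lateral = zs_noisy[-(i + 2)]  # lateral (skip) connection
            z_hat = 0.5 * (u + lateral)   # fixed combinator (assumption)
            recon = recon + ((z_hat - zs_clean[-(i + 2)]) ** 2).mean()
            u = z_hat
        return logits, recon

# Hypothetical training step: cross-entropy on labeled embeddings plus the
# unsupervised denoising cost on both labeled and unlabeled batches.
model = Ladder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_lab, y_lab = torch.randn(16, 128), torch.randint(0, 10, (16,))
x_unl = torch.randn(64, 128)
logits, recon_lab = model(x_lab)
_, recon_unl = model(x_unl)
loss = nn.functional.cross_entropy(logits, y_lab) + 0.1 * (recon_lab + recon_unl)
opt.zero_grad()
loss.backward()
opt.step()
```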
Related papers
- HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues [19.800985243540797]
Challenges such as occlusion, intra-class variance, and lighting conditions might arise while training neural networks using unimodal remote-sensing (RS) visual input.
We propose a novel few-shot generative framework, Hallucinated Audio-Visual Embeddings-Network (HAVE-Net), to meta-train cross-modal features from limited unimodal data.
arXiv Detail & Related papers (2023-09-23T20:05:00Z)
- LEAN: Light and Efficient Audio Classification Network [1.5070398746522742]
We propose a lightweight on-device deep learning-based model for audio classification, LEAN.
LEAN consists of a raw waveform-based temporal feature extractor called Wave realignment and a logmel-based pretrained YAMNet.
We show that combining a trainable wave encoder and pretrained YAMNet with cross-attention-based temporal realignment yields competitive performance on downstream audio classification tasks with a smaller memory footprint.
arXiv Detail & Related papers (2023-05-22T04:45:04Z)
- Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities.
Our experiments show that using audio and visual modalities allows the model to better recognize speech in the presence of environmental noise and significantly accelerates training, reaching a lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z)
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (F-measure 34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification [54.57150493905063]
Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded.
We propose a robust feature learning (RFL) framework to train the CNN.
arXiv Detail & Related papers (2021-08-11T03:33:05Z)
- A Study of Few-Shot Audio Classification [2.1989764549743476]
Few-shot learning is a type of machine learning designed to enable the model to generalize to new classes with very few examples.
We evaluate our model for speaker identification on the VoxCeleb dataset and ICSI Meeting Corpus, obtaining 5-shot 5-way accuracies of 93.5% and 54.0%, respectively.
We also evaluate activity classification from audio using few-shot subsets of the Kinetics600 dataset and AudioSet, both drawn from YouTube videos, obtaining 51.5% and 35.2% accuracy, respectively.
arXiv Detail & Related papers (2020-12-02T22:19:16Z)
- An Ensemble of Convolutional Neural Networks for Audio Classification [9.174145063580882]
Ensembles of CNNs for audio classification are presented and tested on three freely available audio classification datasets.
To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification.
arXiv Detail & Related papers (2020-07-15T19:41:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.