Ontology-aware Learning and Evaluation for Audio Tagging
- URL: http://arxiv.org/abs/2211.12195v1
- Date: Tue, 22 Nov 2022 11:35:14 GMT
- Title: Ontology-aware Learning and Evaluation for Audio Tagging
- Authors: Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley
- Abstract summary: The mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
- Score: 56.59107110017436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study defines a new evaluation metric for audio tagging tasks to
overcome the limitation of the conventional mean average precision (mAP)
metric, which treats different kinds of sound as independent classes without
considering their relations. Also, due to the ambiguities in sound labeling,
the labels in the training and evaluation set are not guaranteed to be accurate
and exhaustive, which poses challenges for robust evaluation with mAP. The
proposed metric, ontology-aware mean average precision (OmAP), addresses the
weaknesses of mAP by utilizing the AudioSet ontology information during the
evaluation. Specifically, we reweight the false positive events in the model
prediction based on the ontology graph distance to the target classes. The OmAP
measure also provides more insight into model performance by evaluating it at
different coarse-grained levels of the ontology graph. We conduct human
evaluations and demonstrate that OmAP is more consistent with human perception
than mAP. To further verify the importance of utilizing the ontology
information, we also propose a novel loss function (OBCE) that reweights binary
cross entropy (BCE) loss based on the ontology distance. Our experiment shows
that OBCE can improve both mAP and OmAP metrics on the AudioSet tagging task.
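The abstract says OmAP reweights false positives by their ontology graph distance to the target classes but does not give the weighting function. The Python sketch below illustrates the idea under stated assumptions: a false positive for class `target` is weighted by a hypothetical linear, capped function of the hop distance from `target` to the clip's closest true label, and AP is computed from these weighted counts. `ontology_weighted_ap`, `ontology_weighted_map`, `fp_weight` and `d_max` are illustrative names, not the authors' code, and the paper's exact formula may differ.

```python
import math
from collections import deque


def graph_distances(edges, source):
    """Hop distances from `source` to every reachable node, ignoring edge direction."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def fp_weight(distance, d_max):
    """Hypothetical weighting: near-miss false positives cost less than 1,
    anything at or beyond d_max hops costs a full false positive."""
    return min(distance / d_max, 1.0)


def ontology_weighted_ap(scores, clip_labels, target, edges, d_max=4):
    """AP for `target` with false positives down-weighted by ontology distance."""
    dist = graph_distances(edges, target)
    n_pos = sum(1 for labels in clip_labels if target in labels)
    if n_pos == 0:
        return float("nan")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, fp, total = 0, 0.0, 0.0
    for i in order:
        if target in clip_labels[i]:
            tp += 1
            total += tp / (tp + fp)  # weighted precision at this recall point
        else:
            # Closest true label of this clip; unreachable or missing labels -> inf.
            d = min((dist.get(lab, math.inf) for lab in clip_labels[i]), default=math.inf)
            fp += fp_weight(d, d_max)
    return total / n_pos


def ontology_weighted_map(score_columns, clip_labels, classes, edges, d_max=4):
    """Mean of per-class weighted APs, skipping classes with no positives."""
    aps = [ontology_weighted_ap(score_columns[c], clip_labels, c, edges, d_max) for c in classes]
    aps = [a for a in aps if not math.isnan(a)]
    return sum(aps) / len(aps) if aps else float("nan")


# Toy ontology and four clips (hypothetical data).
edges = [("Root", "Animal"), ("Root", "Music"), ("Animal", "Dog"),
         ("Animal", "Cat"), ("Music", "Guitar")]
clip_labels = [{"Dog"}, {"Cat"}, {"Guitar"}, {"Dog"}]
dog_scores = [0.9, 0.8, 0.7, 0.6]
print(round(ontology_weighted_ap(dog_scores, clip_labels, "Dog", edges, d_max=4), 4))  # 0.7857
print(round(ontology_weighted_ap(dog_scores, clip_labels, "Dog", edges, d_max=1), 4))  # 0.75
```

Every false positive is at least one hop from the target, so `d_max=1` gives each one full weight and the sketch reduces to ordinary AP (0.75 here). A larger `d_max` forgives near misses, such as a Cat clip ranked above a Dog clip, more than distant confusions like Guitar.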
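OBCE is described only as BCE reweighted by ontology distance. The Python sketch below assumes one plausible reading: each negative class's BCE term is multiplied by a weight that grows with its tree distance to the clip's nearest positive label, so predicting Cat for a Dog clip costs less than predicting Guitar. It simplifies the ontology to a tree with one parent per class. `obce_loss`, `ontology_neg_weights`, `tree_distance` and `d_max` are hypothetical names, and the direction and shape of the weighting are assumptions rather than the authors' definition.

```python
import math


def ancestors(node, parent):
    """Path from `node` up to the root, assuming one parent per class."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Hop distance between two classes via their lowest common ancestor."""
    up_a, up_b = ancestors(a, parent), ancestors(b, parent)
    hops_from_b = {n: i for i, n in enumerate(up_b)}
    for i, n in enumerate(up_a):
        if n in hops_from_b:          # first shared node is the lowest common ancestor
            return i + hops_from_b[n]
    return math.inf                   # disconnected classes


def ontology_neg_weights(classes, positives, parent, d_max=4):
    """Per-class weight for the negative BCE term (hypothetical scheme):
    classes close to one of the clip's positive labels are penalised less."""
    weights = []
    for c in classes:
        d = min((tree_distance(c, p, parent) for p in positives), default=math.inf)
        weights.append(min(d / d_max, 1.0))
    return weights


def obce_loss(probs, targets, neg_weights, eps=1e-7):
    """Ontology-reweighted binary cross entropy for one clip, averaged over classes."""
    total = 0.0
    for p, y, w in zip(probs, targets, neg_weights):
        p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
        if y == 1:
            total += -math.log(p)              # positive terms keep full weight
        else:
            total += -w * math.log(1.0 - p)    # negative terms scaled by ontology weight
    return total / len(probs)


parent = {"Dog": "Animal", "Cat": "Animal", "Guitar": "Music",
          "Animal": "Root", "Music": "Root"}
classes = ["Dog", "Cat", "Guitar"]
# Dog's weight (0.0) is unused because Dog is the positive label here.
weights = ontology_neg_weights(classes, ["Dog"], parent)  # [0.0, 0.5, 1.0]
probs, targets = [0.8, 0.6, 0.6], [1, 0, 0]
print(obce_loss(probs, targets, weights) < obce_loss(probs, targets, [1.0] * 3))  # True
```

Passing all-ones weights recovers plain BCE, which makes it easy to compare the two losses on the same batch.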
Related papers
- Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature [1.1455937444848385]
We propose a robust set of features derived from thorough research into contemporary practices in voice pathology detection.
We combine this feature set, containing data from the publicly available Saarbrücken Voice Database (SVD), with preprocessing using the K-Means Synthetic Minority Over-Sampling Technique algorithm.
Our approach has achieved the state-of-the-art performance, measured by unweighted average recall in voice pathology detection.
(2024-10-14T14:17:52Z)
- A Comprehensive Rubric for Annotating Pathological Speech [0.0]
We introduce a comprehensive rubric based on various dimensions of speech quality, including phonetics, fluency, and prosody.
The objective is to establish standardized criteria for identifying errors within the speech of individuals with Down syndrome.
(2024-04-29T16:44:27Z)
- Continual Evidential Deep Learning for Out-of-Distribution Detection [20.846788009755183]
Uncertainty-based deep learning models have attracted a great deal of interest for their ability to provide accurate and reliable predictions.
Evidential deep learning stands out, achieving remarkable performance in detecting out-of-distribution (OOD) data with a single deterministic neural network.
We propose the integration of an evidential deep learning method into a continual learning framework in order to perform simultaneously incremental object classification and OOD detection.
(2023-09-06T13:36:59Z)
- Learning with Noisy Labels through Learnable Weighting and Centroid Similarity [5.187216033152917]
Noisy labels are prevalent in domains such as medical diagnosis and autonomous driving.
We introduce a novel method for training machine learning models in the presence of noisy labels.
Our results show that our method consistently outperforms the existing state-of-the-art techniques.
(2023-03-16T16:43:24Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
(2023-03-14T16:03:28Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
(2023-03-03T16:12:59Z)
- Evaluating generative audio systems and their metrics [80.97828572629093]
This paper investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
(2022-08-31T21:48:34Z)
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
(2021-10-28T17:46:14Z)
- Attention-based Neural Bag-of-Features Learning for Sequence Data [143.62294358378128]
2D-Attention (2DA) is a generic attention formulation for sequence data.
The proposed attention module is incorporated into the recently proposed Neural Bag of Feature (NBoF) model to enhance its learning capacity.
Our empirical analysis shows that the proposed attention formulations can not only improve performances of NBoF models but also make them resilient to noisy data.
(2020-05-25T17:51:54Z)
- Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former mimics the human auditory ability to learn information from a recording, and the latter further discriminates interference from desired signals by highlighting target-related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
(2020-05-16T17:54:07Z)