Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy
- URL: http://arxiv.org/abs/2405.18153v1
- Date: Tue, 28 May 2024 13:14:26 GMT
- Title: Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy
- Authors: Javier Naranjo-Alcazar, Jordi Grau-Haro, Ruben Ribes-Serrano, Pedro Zuccarello,
- Abstract summary: The paper emphasizes the importance of Active Learning (AL) using expert labelers over crowdsourcing.
AL is an iterative process combining human labelers and AI models to optimize the labeling budget.
The framework successfully labeled 6540 ten-second audio samples over five months with a small team.
- Score: 0.42855555838080833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Listening focuses on developing technologies to extract relevant information from audio signals. A critical aspect of these projects is the acquisition and labeling of contextualized data, which is inherently complex and requires specific resources and strategies. Despite the availability of some audio datasets, many are unsuitable for commercial applications. The paper emphasizes the importance of Active Learning (AL) using expert labelers over crowdsourcing, which often lacks detailed insights into dataset structures. AL is an iterative process combining human labelers and AI models to optimize the labeling budget by intelligently selecting samples for human review. This approach addresses the challenge of handling large, constantly growing datasets that exceed available computational resources and memory. The paper presents a comprehensive data-centric framework for Machine Listening projects, detailing the configuration of recording nodes, database structure, and labeling budget optimization in resource-constrained scenarios. Applied to an industrial port in Valencia, Spain, the framework successfully labeled 6540 ten-second audio samples over five months with a small team, demonstrating its effectiveness and adaptability to various resource availability situations.
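The abstract describes Active Learning as a loop in which the model picks the samples that most deserve human review. The abstract does not state the selection criterion, so the following is only a minimal sketch that assumes entropy-based uncertainty sampling, a common AL strategy; the function names, clip ids, and probability values are hypothetical and not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_review(predictions, budget):
    """Return the ids of the `budget` clips the model is least sure about.

    predictions: dict mapping a clip id to the class probabilities
    predicted by the current model (hypothetical structure).
    """
    ranked = sorted(
        predictions,
        key=lambda clip_id: entropy(predictions[clip_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Illustrative model outputs for four unlabeled ten-second clips (made-up values).
predictions = {
    "clip_001": [0.98, 0.01, 0.01],
    "clip_002": [0.40, 0.35, 0.25],
    "clip_003": [0.70, 0.20, 0.10],
    "clip_004": [0.34, 0.33, 0.33],
}

# With a labeling budget of two clips per round, only the two most
# ambiguous clips are sent to the expert labelers.
chosen = select_for_review(predictions, budget=2)
print(chosen)
```

In a full loop, the expert labels for the chosen clips would be added to the training set, the model retrained, and the selection repeated until the labeling budget runs out.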
Related papers
- Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective [42.24248330317496]
This feature article introduces advances in learning from noisy crowdsourced labels.
The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches.
In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization.
arXiv Detail & Related papers (2024-07-09T14:34:40Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [86.4326416303723]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - AQUALLM: Audio Question Answering Data Generation Using Large Language Models [2.2232550112727267]
We introduce a scalable AQA data generation pipeline, which relies on Large Language Models (LLMs).
We present three extensive and high-quality benchmark datasets for AQA.
Models trained on our datasets demonstrate enhanced generalizability when compared to models trained using human-annotated AQA data.
arXiv Detail & Related papers (2023-12-28T20:01:27Z) - A Large-scale Dataset for Audio-Language Representation Learning [54.933479346870506]
We present an innovative and automatic audio caption generation pipeline based on a series of public tools or APIs.
We construct a large-scale, high-quality audio-language dataset, named Auto-ACD, comprising over 1.9M audio-text pairs.
arXiv Detail & Related papers (2023-09-20T17:59:32Z) - Deep Active Audio Feature Learning in Resource-Constrained Environments [3.789219860006095]
The scarcity of labelled data makes training Deep Neural Network (DNN) models in bioacoustic applications challenging.
Active Learning (AL) is an approach that can help with this learning while requiring little labelling effort.
We describe an AL framework that addresses this issue by incorporating feature extraction into the AL loop and refining the feature extractor after each round of manual annotation.
arXiv Detail & Related papers (2023-08-25T06:45:02Z) - Extreme Multi-Label Skill Extraction Training using Large Language Models [19.095612333241288]
We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction.
Our results show a consistent increase of between 15 and 25 percentage points in R-Precision@5 compared to previously published results.
arXiv Detail & Related papers (2023-07-20T11:29:15Z) - AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets [56.052803235932686]
We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
arXiv Detail & Related papers (2023-06-16T05:27:14Z) - infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z) - Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from a rich-resource language to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z) - Active Learning for Noisy Data Streams Using Weak and Strong Labelers [3.9370369973510746]
We consider a novel weak and strong labeler problem inspired by humans' natural ability for labeling.
We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection.
We derive a decision function that measures the information gain by combining the informativeness of individual samples and model confidence.
arXiv Detail & Related papers (2020-10-27T09:18:35Z) - Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)