A Demand-Driven Perspective on Generative Audio AI
- URL: http://arxiv.org/abs/2307.04292v1
- Date: Mon, 10 Jul 2023 00:58:28 GMT
- Title: A Demand-Driven Perspective on Generative Audio AI
- Authors: Sangshin Oh, Minsung Kang, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon
- Abstract summary: In this paper, we present the results of a survey conducted with professional audio engineers.
We summarize the current challenges in audio quality and controllability based on the survey.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To achieve successful deployment of AI research, it is crucial to understand
the demands of the industry. In this paper, we present the results of a survey
conducted with professional audio engineers, in order to determine research
priorities and define various research tasks. We also summarize the current
challenges in audio quality and controllability based on the survey. Our
analysis emphasizes that the availability of datasets is currently the main
bottleneck for achieving high-quality audio generation. Finally, we suggest
potential solutions for some revealed issues with empirical evidence.
Related papers
- Audio Anti-Spoofing Detection: A Survey [7.3348524333159]
Deep learning has given rise to sophisticated algorithms capable of manipulating or creating multimedia fake content, known as Deepfake.
Audio anti-spoofing detection challenges have been organized to foster the development of anti-spoofing countermeasures.
This survey paper presents a comprehensive review of every component within the detection pipeline, including algorithm architectures, optimization techniques, application generalizability, evaluation metrics, performance comparisons, available datasets, and open-source availability.
arXiv Detail & Related papers (2024-04-22T06:52:12Z)
- SurveyAgent: A Conversational System for Personalized and Efficient Research Survey [50.04283471107001]
This paper introduces SurveyAgent, a novel conversational system designed to provide personalized and efficient research survey assistance to researchers.
SurveyAgent integrates three key modules: Knowledge Management for organizing papers, Recommendation for discovering relevant literature, and Query Answering for engaging with content on a deeper level.
Our evaluation demonstrates SurveyAgent's effectiveness in streamlining research activities, showcasing its capability to facilitate how researchers interact with scientific literature.
arXiv Detail & Related papers (2024-04-09T15:01:51Z)
- An experiment on an automated literature survey of data-driven speech enhancement methods [5.931978628000179]
This work explores the use of a generative pre-trained transformer (GPT) model to automate a literature survey of 116 articles on data-driven speech enhancement methods.
arXiv Detail & Related papers (2023-10-10T02:07:24Z)
- A Survey on Interpretable Cross-modal Reasoning [64.37362731950843]
Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into the realm of interpretable cross-modal reasoning (I-CMR).
This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR.
arXiv Detail & Related papers (2023-09-05T05:06:48Z)
- Improving the State of the Art for Training Human-AI Teams: Technical Report #2 -- Results of Researcher Knowledge Elicitation Survey [0.0]
Sonalysts has begun an internal initiative to explore the training of Human-AI teams.
The first step in this effort is to develop a Synthetic Task Environment (STE) that is capable of facilitating research on Human-AI teams.
arXiv Detail & Related papers (2023-08-29T13:54:32Z)
- Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study [51.42020333199243]
We make the first attempt to investigate the benefits of pre-training on sound generation with AudioLDM.
Our study demonstrates the advantages of the pre-trained AudioLDM, especially in data-scarcity scenarios.
We benchmark the sound generation task on various frequently-used datasets.
arXiv Detail & Related papers (2023-03-07T12:49:45Z)
- Learning to Answer Questions in Dynamic Audio-Visual Scenarios [81.19017026999218]
We focus on the Audio-Visual Questioning (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Our dataset contains more than 45K question-answer pairs spanning over different modalities and question types.
Our results demonstrate that AVQA benefits from multisensory perception and our model outperforms recent A-SIC, V-SIC, and AVQA approaches.
arXiv Detail & Related papers (2022-03-26T13:03:42Z)
- Scaling up Search Engine Audits: Practical Insights for Algorithm Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising venue for monitoring the performance of algorithms across long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z)
- Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper [50.25428141435537]
Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising in the intersection between machine learning, big data, streaming analytics, and the management of IT operations.
The main aim of the AIOPS workshop is to bring together researchers from both academia and industry to present their experiences, results, and work in progress in this field.
arXiv Detail & Related papers (2021-01-15T10:43:10Z)
- Questionnaire analysis to define the most suitable survey for port-noise investigation [0.0]
The paper analyses a sample of questions suitable for the specific research, chosen as part of the wide database of questionnaires internationally proposed for subjective investigations.
The questionnaire will be optimized to be distributed in the TRIPLO project (TRansports and Innovative sustainable connections between Ports and LOgistic platforms).
arXiv Detail & Related papers (2020-07-14T08:52:55Z)
- Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models [37.60722440434528]
In this paper, a novel model for audio quality assessment is proposed by jointly using bidirectional long short-term memory and an attention mechanism.
The former mimics the human auditory ability to learn information from a recording, and the latter further discriminates interference from desired signals by highlighting target-related features.
To evaluate our proposed approach, the TIMIT dataset is used and augmented by mixing with various natural sounds.
arXiv Detail & Related papers (2020-05-16T17:54:07Z)