A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
- URL: http://arxiv.org/abs/2412.00049v1
- Date: Sun, 24 Nov 2024 03:26:34 GMT
- Title: A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
- Authors: Luis Vilaca, Yi Yu, Paula Viana
- Abstract summary: Audio-visual correlation learning aims to capture and understand natural phenomena between audio and visual data.
The rapid growth of Deep Learning propelled the development of proposals that process audio-visual data.
We summarize the recent progress of Audio-Visual Correlation Learning and discuss future research directions.
- Abstract: Audio-visual correlation learning aims to capture and understand natural phenomena between audio and visual data. The rapid growth of Deep Learning has propelled the development of methods that process audio-visual data, as can be observed in the number of proposals in recent years, which encourages the development of a comprehensive survey. Besides analyzing the models used in this context, we also discuss task definitions and paradigms applied in AI multimedia. In addition, we investigate frequently used objective functions and discuss how audio-visual data is exploited in the optimization process, i.e., the different methodologies for representing knowledge in the audio-visual domain. In particular, we focus on how human-understandable mechanisms, i.e., structured knowledge that reflects comprehensible knowledge, can guide the learning process. Most importantly, we summarize the recent progress of Audio-Visual Correlation Learning (AVCL) and discuss future research directions.
Related papers
- Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review [0.0]
We present a systematic review of meta-learning methodologies in audio processing.
This includes audio-specific discussions on data augmentation, feature extraction, preprocessing techniques, meta-learners, task selection strategies.
We aim to provide valuable insights and identify future research directions in the intersection of meta-learning and audio processing.
arXiv Detail & Related papers (2024-08-19T18:11:59Z)
- Learning in Audio-visual Context: A Review, Analysis, and New Perspective [88.40519011197144]
This survey aims to systematically organize and analyze studies of the audio-visual field.
We introduce several key findings that have inspired our computational studies.
We propose a new perspective on audio-visual scene understanding, then discuss and analyze the feasible future direction of the audio-visual learning area.
arXiv Detail & Related papers (2022-08-20T02:15:44Z)
- Deep Learning for Visual Speech Analysis: A Survey [54.53032361204449]
This paper presents a review of recent progress in deep learning methods on visual speech analysis.
We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance.
arXiv Detail & Related papers (2022-05-22T14:44:53Z)
- Audio Self-supervised Learning: A Survey [60.41768569891083]
Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations.
Its success in the fields of computer vision and natural language processing has prompted its recent adoption in the field of audio and speech processing.
arXiv Detail & Related papers (2022-03-02T15:58:29Z)
- Recent Advances and Challenges in Deep Audio-Visual Correlation Learning [7.273353828127817]
This paper focuses on state-of-the-art (SOTA) models used to learn correlations between audio and video.
We also discuss task definitions and paradigms applied in AI multimedia.
arXiv Detail & Related papers (2022-02-28T10:43:01Z)
- Distilling Audio-Visual Knowledge by Compositional Contrastive Learning [51.20935362463473]
We learn a compositional embedding that closes the cross-modal semantic gap.
We establish a new, comprehensive multi-modal distillation benchmark on three video datasets.
arXiv Detail & Related papers (2021-04-22T09:31:20Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Deep Audio-Visual Learning: A Survey [53.487938108404244]
We divide the current audio-visual learning tasks into four different subfields.
We discuss state-of-the-art methods as well as the remaining challenges of each subfield.
We summarize the commonly used datasets and performance metrics.
arXiv Detail & Related papers (2020-01-14T13:11:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.