Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
- URL: http://arxiv.org/abs/2408.06954v2
- Date: Tue, 07 Jan 2025 04:11:55 GMT
- Title: Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
- Authors: Minje Kim, Jan Skoglund
- Abstract summary: The paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems.
It introduces a neural network-based signal enhancer designed to post-process existing codecs' output.
The paper examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs.
- Score: 19.437080345021105
- Abstract: This paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems. It highlights the challenges posed by the subjective evaluation processes of speech and audio codecs and discusses the limitations of purely data-driven approaches, which often require inefficiently large architectures to match the performance of model-based methods. The study presents hybrid systems as a viable solution, offering significant improvements to the performance of conventional codecs through meticulously chosen design enhancements. Specifically, it introduces a neural network-based signal enhancer designed to post-process existing codecs' output, along with autoencoder-based end-to-end models and LPCNet: hybrid systems that combine linear predictive coding (LPC) with neural networks. Furthermore, the paper delves into predictive models operating within custom feature spaces (TF-Codec) or predefined transform domains (MDCTNet) and examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs. Through these investigations, the paper demonstrates the potential of hybrid systems to advance the field of speech and audio coding by bridging the gap between traditional model-based approaches and modern data-driven techniques.
Related papers
- Efficient Evaluation of Quantization-Effects in Neural Codecs [4.897318643396687]
Training neural codecs requires techniques to allow a non-zero gradient across the quantizer.
This paper proposes an efficient evaluation framework for neural codecs using simulated data.
We validate our findings against an internal neural audio codec and against the state-of-the-art descript-audio-codec.
arXiv Detail & Related papers (2025-02-07T09:11:19Z)
- Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection [3.2927352068925444]
Deepfake technologies pose challenges to privacy, security, and information integrity.
This paper introduces a Quantum-Trained Convolutional Neural Network framework designed to enhance the detection of deepfake audio.
arXiv Detail & Related papers (2024-10-11T20:52:10Z)
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing [4.599180419117645]
Interpretability helps us understand a model's ability to generalize and reveal its limitations.
We introduce a causal interpretable deep structure for modeling dynamic systems.
Our proposed model makes use of the harmonic analysis by modeling the system in a time-frequency domain.
arXiv Detail & Related papers (2023-10-10T21:32:15Z)
- Channelformer: Attention based Neural Solution for Wireless Channel Estimation and Effective Online Training [1.0499453838486013]
We propose an encoder-decoder neural architecture (called Channelformer) to achieve improved channel estimation.
We employ multi-head attention in the encoder and a residual convolutional neural architecture as the decoder.
We also propose an effective online training method based on the fifth generation (5G) new radio (NR) configuration for modern communication systems.
arXiv Detail & Related papers (2023-02-08T23:18:23Z)
- Ultrasound Signal Processing: From Models to Deep Learning [64.56774869055826]
Medical ultrasound imaging relies heavily on high-quality signal processing to provide reliable and interpretable image reconstructions.
Deep learning based methods, which are optimized in a data-driven fashion, have gained popularity.
A relatively new paradigm combines the power of the two: leveraging data-driven deep learning, as well as exploiting domain knowledge.
arXiv Detail & Related papers (2022-04-09T13:04:36Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- Learn to Communicate with Neural Calibration: Scalability and Generalization [10.775558382613077]
We propose a scalable and generalizable neural calibration framework for future wireless system design.
The proposed neural calibration framework is applied to solve challenging resource management problems in massive multiple-input multiple-output (MIMO) systems.
arXiv Detail & Related papers (2021-10-01T09:00:25Z)
- Supervised DKRC with Images for Offline System Identification [77.34726150561087]
Modern dynamical systems are becoming increasingly non-linear and complex.
There is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control.
Our approach learns these basis functions using a supervised learning approach.
arXiv Detail & Related papers (2021-09-06T04:39:06Z)
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 back-bones, while enjoying lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z)