Deep Neural Networks and End-to-End Learning for Audio Compression
- URL: http://arxiv.org/abs/2105.11681v1
- Date: Tue, 25 May 2021 05:36:30 GMT
- Title: Deep Neural Networks and End-to-End Learning for Audio Compression
- Authors: Daniela N. Rim, Inseon Jang, Heeyoul Choi
- Abstract summary: We present an end-to-end deep learning approach that combines recurrent neural networks (RNNs) within the training strategy of variational autoencoders (VAEs) with a binary representation of the latent space.
This is the first end-to-end learning for a single audio compression model with RNNs, and our model achieves a Signal to Distortion Ratio (SDR) of 20.54.
- Score: 2.084078990567849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent achievements in end-to-end deep learning have encouraged the
exploration of tasks dealing with highly structured data with unified deep
network models. Having such models for compressing audio signals has been
challenging since it requires discrete representations that are not easy to
train with end-to-end backpropagation. In this paper, we present an end-to-end
deep learning approach that combines recurrent neural networks (RNNs) within
the training strategy of variational autoencoders (VAEs) with a binary
representation of the latent space. We apply a reparametrization trick for the
Bernoulli distribution for the discrete representations, which allows smooth
backpropagation. In addition, our approach allows the separation of the encoder
and decoder, which is necessary for compression tasks. To the best of our knowledge,
this is the first end-to-end learning for a single audio compression model with
RNNs, and our model achieves a Signal to Distortion Ratio (SDR) of 20.54.
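As a concrete illustration of the Bernoulli reparametrization described in the abstract, here is a minimal PyTorch sketch using a binary-concrete (Gumbel-sigmoid) relaxation with a straight-through pass; the abstract does not spell out the exact formulation, so the function names, tensor sizes, and temperature value below are illustrative assumptions.

```python
import torch

def relaxed_bernoulli_sample(logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Differentiable surrogate for sampling binary codes from Bernoulli(sigmoid(logits)).

    Binary-concrete (Gumbel-sigmoid) relaxation: logistic noise is added to the
    logits and squashed by a temperature-controlled sigmoid, so the sample is
    continuous but concentrates near {0, 1} as the temperature decreases.
    """
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)    # uniform noise
    logistic_noise = torch.log(u) - torch.log1p(-u)      # Logistic(0, 1) sample
    return torch.sigmoid((logits + logistic_noise) / temperature)

def binarize_straight_through(logits: torch.Tensor) -> torch.Tensor:
    """Exact binary code in the forward pass, relaxed gradient in the backward pass."""
    soft = relaxed_bernoulli_sample(logits)
    hard = (soft > 0.5).float()
    return hard + (soft - soft.detach())                 # straight-through trick

# Toy usage: binarize the latent produced by a (hypothetical) RNN encoder step.
latent_logits = torch.randn(8, 64, requires_grad=True)  # (batch, code_dim)
binary_code = binarize_straight_through(latent_logits)
binary_code.sum().backward()                             # gradients reach the encoder logits
```

Lowering the temperature pushes the relaxed samples toward {0, 1} while gradients still reach the encoder logits, which is what allows smooth end-to-end backpropagation through a binary latent code.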
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Dynamic Encoding and Decoding of Information for Split Learning in Mobile-Edge Computing: Leveraging Information Bottleneck Theory [1.1151919978983582]
Split learning is a privacy-preserving distributed learning paradigm in which an ML model is split into two parts (i.e., an encoder and a decoder).
In mobile-edge computing, network functions can be trained via split learning where an encoder resides in a user equipment (UE) and a decoder resides in the edge network.
We present a new framework and training mechanism to enable a dynamic balancing of the transmission resource consumption with the informativeness of the shared latent representations.
arXiv Detail & Related papers (2023-09-06T07:04:37Z)
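A minimal, hypothetical sketch of the encoder/decoder split described in the entry above: the encoder half would run on the UE, the decoder half in the edge network, and a rate-like penalty on the transmitted latent is balanced against the task loss. Module names, layer sizes, and the balancing weight are illustrative, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UEEncoder(nn.Module):
    """Encoder half of the split model; would run on the user equipment (UE)."""
    def __init__(self, in_dim: int = 128, latent_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, x):
        return self.net(x)

class EdgeDecoder(nn.Module):
    """Decoder half of the split model; would run in the edge network."""
    def __init__(self, latent_dim: int = 16, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, num_classes))

    def forward(self, z):
        return self.net(z)

encoder, decoder = UEEncoder(), EdgeDecoder()
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))

z = encoder(x)                     # latent representation sent from the UE to the edge
logits = decoder(z)

# Trade informativeness (task loss) against transmission cost (here a crude L2 rate proxy).
beta = 1e-3                        # illustrative balancing weight
loss = F.cross_entropy(logits, y) + beta * z.pow(2).mean()
loss.backward()
```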
- Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
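A minimal sketch of the kind of quantized latent bottleneck such a streaming encoder-decoder could use: a single vector-quantization layer with a straight-through gradient and the usual codebook/commitment losses. The actual codec relies on residual vector quantization, convolutional streaming blocks, and adversarial training, none of which are reproduced here; the codebook size and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, codebook_size: int = 256, dim: int = 64):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(codebook_size, dim))

    def forward(self, z):
        # z: (batch, frames, dim) continuous encoder output
        dists = torch.cdist(z, self.codebook.unsqueeze(0).expand(z.size(0), -1, -1))
        indices = dists.argmin(dim=-1)                     # one discrete code per frame
        z_q = self.codebook[indices]                       # quantized latent
        # codebook loss (moves the codebook) + weighted commitment loss (moves the encoder)
        vq_loss = (z_q - z.detach()).pow(2).mean() + 0.25 * (z_q.detach() - z).pow(2).mean()
        z_q = z + (z_q - z).detach()                       # straight-through estimator
        return z_q, indices, vq_loss

vq = VectorQuantizer()
z = torch.randn(4, 100, 64)                                # 100 latent frames per item
z_q, codes, vq_loss = vq(z)                                # `codes` is what would be transmitted
```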
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Wideband and Entropy-Aware Deep Soft Bit Quantization [1.7259824817932292]
We introduce a novel deep learning solution for soft bit quantization across wideband channels.
Our method is trained end-to-end with quantization- and entropy-aware augmentations to the loss function.
Our method achieves a compression gain of up to 10% in the high-SNR regime versus previous state-of-the-art methods.
arXiv Detail & Related papers (2021-10-18T18:00:05Z)
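A hedged sketch of what "quantization- and entropy-aware augmentations to the loss function" could look like in practice; the quantization grid, penalty terms, and weights below are illustrative assumptions rather than the paper's actual formulation.

```python
import torch

def augmented_loss(recon, target, levels, lambda_q=0.1, lambda_h=0.01):
    """Reconstruction loss on soft bits, augmented with (i) a quantization-aware
    penalty pulling values toward the nearest grid level and (ii) an entropy
    estimate of how the grid levels are used, to encourage compressible outputs."""
    recon_loss = torch.mean((recon - target) ** 2)

    # Squared distance of every reconstructed value to each quantization level.
    dists = (recon.reshape(-1, 1) - levels.reshape(1, -1)) ** 2
    quant_penalty = dists.min(dim=1).values.mean()            # quantization-aware term

    # Soft assignment to levels -> estimated level probabilities -> entropy term.
    probs = torch.softmax(-dists, dim=1).mean(dim=0).clamp_min(1e-8)
    entropy = -(probs * probs.log()).sum()

    return recon_loss + lambda_q * quant_penalty + lambda_h * entropy

# Toy usage with stand-in tensors.
target = torch.randn(256)                                     # soft bits to be compressed
recon = (target + 0.05 * torch.randn(256)).requires_grad_(True)
levels = torch.linspace(-2.0, 2.0, 8)                         # hypothetical 3-bit grid
augmented_loss(recon, target, levels).backward()
```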
- Self-supervised Neural Networks for Spectral Snapshot Compressive Imaging [15.616674529295366]
We consider using untrained neural networks to solve the reconstruction problem of snapshot compressive imaging (SCI).
In this paper, inspired by the untrained neural networks such as deep image priors (DIP) and deep decoders, we develop a framework by integrating DIP into the plug-and-play regime, leading to a self-supervised network for spectral SCI reconstruction.
arXiv Detail & Related papers (2021-08-28T14:17:38Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Unfolding Neural Networks for Compressive Multichannel Blind Deconvolution [71.29848468762789]
We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind-deconvolution.
In this problem, each channel's measurements are given by the convolution of a common source signal with a sparse filter.
We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.
arXiv Detail & Related papers (2020-10-22T02:34:33Z)
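A small sketch of the measurement model described in the entry above, in which each channel observes the convolution of a shared source signal with its own sparse filter; the signal length, filter length, and sparsity level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

source_len, filter_len, num_channels, sparsity = 256, 32, 4, 3
source = rng.standard_normal(source_len)                      # common source signal

measurements = []
for _ in range(num_channels):
    h = np.zeros(filter_len)
    support = rng.choice(filter_len, size=sparsity, replace=False)
    h[support] = rng.standard_normal(sparsity)                # sparse per-channel filter
    measurements.append(np.convolve(source, h))               # channel measurement
measurements = np.stack(measurements)                          # (num_channels, source_len + filter_len - 1)
```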
- Bayesian Sparsification Methods for Deep Complex-valued Networks [18.00411355850543]
We extend Sparse Variational Dropout to complex-valued neural networks.
We conduct a large numerical study of the performance-compression trade-off of C-valued networks on two tasks.
We replicate the state-of-the-art result by Trabelsi et al. [2018] on MusicNet with a complex-valued network compressed by 50-100x at a small performance penalty.
arXiv Detail & Related papers (2020-03-25T13:57:16Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
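A minimal sketch of one left-to-right sweep of max-product belief propagation on a 1-D chain, written in the log domain (min-sum form). The BP-Layer in the paper handles 2-D labeling problems, is truncated, and is trained inside a CNN; none of that is reproduced in this toy example, and the cost tensors are random placeholders.

```python
import torch

def min_sum_chain_sweep(unary_costs: torch.Tensor, pairwise_costs: torch.Tensor) -> torch.Tensor:
    """One left-to-right message pass over a chain of N nodes with L labels.

    unary_costs:    (N, L) per-node label costs (negative log potentials)
    pairwise_costs: (L, L) transition costs between neighbouring labels
    Returns per-node beliefs (N, L); note they only aggregate messages from the
    left, whereas a full BP pass would also sweep right-to-left.
    """
    num_nodes, num_labels = unary_costs.shape
    message = torch.zeros(num_labels)
    beliefs = torch.empty_like(unary_costs)
    for i in range(num_nodes):
        beliefs[i] = unary_costs[i] + message
        # message to the next node: minimize over this node's label
        message = (beliefs[i].unsqueeze(1) + pairwise_costs).min(dim=0).values
    return beliefs

# Toy usage: random costs for 10 nodes and 5 labels; argmin gives a labeling.
labels = min_sum_chain_sweep(torch.rand(10, 5), torch.rand(5, 5)).argmin(dim=1)
```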