HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable
Neural Audio Coding
- URL: http://arxiv.org/abs/2107.10843v2
- Date: Fri, 23 Jul 2021 14:33:04 GMT
- Title: HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable
Neural Audio Coding
- Authors: Darius Petermann, Seungkwon Beack, Minje Kim
- Abstract summary: An autoencoder-based codec employs quantization to turn its bottleneck layer activation into bitstrings.
To circumvent this issue, we employ additional skip connections between the corresponding pair of encoder-decoder layers.
We empirically verify that the proposed hyper-autoencoded architecture improves audio quality compared to an ordinary autoencoder baseline.
- Score: 25.51661602383911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An autoencoder-based codec employs quantization to turn its bottleneck layer
activation into bitstrings, a process that hinders information flow between the
encoder and decoder parts. To circumvent this issue, we employ additional skip
connections between the corresponding pair of encoder-decoder layers. The
assumption is that, in a mirrored autoencoder topology, a decoder layer
reconstructs the intermediate feature representation of its corresponding
encoder layer. Hence, any additional information directly propagated from the
corresponding encoder layer helps the reconstruction. We implement these skip
connections as additional autoencoders, each of which is a small codec that
compresses the massive data transfer between the paired encoder-decoder
layers. We empirically verify that the proposed
hyper-autoencoded architecture improves perceptual audio quality compared to an
ordinary autoencoder baseline.
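The skip-autoencoder idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the layer sizes, the `tanh` nonlinearity, the uniform quantizer, and the additive fusion of the skip feature are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w):
    """Simple affine layer with tanh, standing in for the codec's conv blocks."""
    return np.tanh(x @ w)

def quantize(h, n_levels=16):
    """Toy uniform quantizer: maps activations in (-1, 1) to n_levels codes."""
    codes = np.round((h + 1.0) / 2.0 * (n_levels - 1))
    return codes / (n_levels - 1) * 2.0 - 1.0   # dequantized values

# Mirrored main autoencoder: 64 -> 32 -> 8 (bottleneck) -> 32 -> 64
w_enc1, w_enc2 = rng.standard_normal((64, 32)), rng.standard_normal((32, 8))
w_dec1, w_dec2 = rng.standard_normal((8, 32)), rng.standard_normal((32, 64))

# Skip "hyper-autoencoder": a tiny codec (32 -> 4 -> 32) that compresses the
# intermediate encoder feature before propagating it to the paired decoder layer.
w_skip_enc = rng.standard_normal((32, 4))
w_skip_dec = rng.standard_normal((4, 32))

x = rng.standard_normal(64)
h1 = dense(x, w_enc1)                     # encoder layer 1 feature
code = quantize(dense(h1, w_enc2))        # main bottleneck bitstream
skip = dense(quantize(dense(h1, w_skip_enc)), w_skip_dec)  # compressed skip

d1 = dense(code, w_dec1) + skip           # decoder layer 1 plus propagated feature
y = dense(d1, w_dec2)                     # reconstruction
```

Both the main bottleneck and the skip path pass through a quantizer, so every transmitted feature remains a (small) bitstream rather than an uncompressed tensor.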
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
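The shift from single-step reconstruction to iterative refinement can be illustrated with a toy loop. The `refine` function, the blend schedule, and the step count below are placeholders for a learned diffusion model, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine(x_t, latent, t, n_steps):
    """One toy denoising step: blend the current estimate toward a
    latent-conditioned target. Stands in for a learned network of
    (x_t, latent, t)."""
    target = latent                  # illustrative: the latent itself as target
    alpha = (t + 1) / n_steps        # stronger correction at later steps
    return (1 - alpha) * x_t + alpha * target

latent = rng.standard_normal(16)     # encoder output for one image
x = rng.standard_normal(16)          # decoding starts from pure noise
for t in range(8):                   # iterative refinement replaces the
    x = refine(x, latent, t, 8)      # single feed-forward decoder pass
```

The key structural point is that the decoder is a conditional sampler run for several steps, guided throughout by the encoder's latent, rather than a single deterministic mapping.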
We evaluate our approach by assessing both reconstruction (rFID) and generation quality (
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- More complex encoder is not all you need [0.882348769487259]
We introduce neU-Net (i.e., not complex encoder U-Net), which incorporates a novel Sub-pixel Convolution for upsampling to construct a powerful decoder.
Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse and ACDC datasets.
arXiv Detail & Related papers (2023-09-20T08:34:38Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, the seq2seq task is resolved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, several new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Rethinking Skip Connections in Encoder-decoder Networks for Monocular Depth Estimation [4.364863910305258]
We propose a full skip connection network (FSCN) for monocular depth estimation task.
In addition, to fuse features within skip connections more closely, we present an adaptive concatenation module (ACM).
arXiv Detail & Related papers (2022-08-29T09:20:53Z)
- SoftPool++: An Encoder-Decoder Network for Point Cloud Completion [93.54286830844134]
We propose a novel convolutional operator for the task of point cloud completion.
The proposed operator does not require any max-pooling or voxelization operation.
We show that our approach achieves state-of-the-art performance in shape completion at low and high resolutions.
arXiv Detail & Related papers (2022-05-08T15:31:36Z)
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
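The idea of rendering each local label patch with a per-location generated network can be sketched as follows. The grid sizes, the linear hypernetwork, and the patch-rendering map are all illustrative assumptions, not the NRD architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 8             # low-resolution encoder output grid
patch, n_classes = 2, 2       # each location decodes a 2x2 label patch

enc_out = rng.standard_normal((H, W, C))

# A hypernetwork maps each encoder feature to the weights of a tiny
# per-location network that renders its local patch of label logits.
w_hyper = rng.standard_normal((C, C * patch * patch * n_classes)) * 0.1

logits = np.zeros((H, W, patch, patch, n_classes))
for i in range(H):
    for j in range(W):
        f = enc_out[i, j]                                   # (C,)
        w = (f @ w_hyper).reshape(C, patch, patch, n_classes)
        logits[i, j] = np.einsum('c,cpqk->pqk', f, w)       # patch logits
labels = logits.argmax(-1)    # (H, W, patch, patch) upsampled label map
```

Because the patch renderer is compact and generated per location, the decoder exploits the local smoothness of semantic labels instead of carrying a large shared upsampling head.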
arXiv Detail & Related papers (2021-07-30T04:50:56Z)
- Cascade Decoders-Based Autoencoders for Image Reconstruction [2.924868086534434]
This paper addresses image reconstruction with autoencoders, employing cascade decoders-based autoencoders.
The proposed serial decoders-based autoencoders comprise multi-level decoder architectures and the related optimization algorithms.
Experimental results show that the proposed autoencoders outperform classical autoencoders in image reconstruction performance.
arXiv Detail & Related papers (2021-06-29T23:40:54Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in segmentation quality and in efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
To further improve the architecture, we introduce a weight function that re-balances classes to increase the networks' attention to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
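A minimal sketch of combining the last encoder layer (global view) with auxiliary views from earlier layers, as the summary describes. The averaging fusion and the mixing weight are illustrative assumptions; the paper's fusion may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8
n_enc_layers = 4

# Encoder layer outputs; the last layer is the usual "global view".
enc_states = [rng.standard_normal((seq_len, d)) for _ in range(n_enc_layers)]

def decoder_memory(enc_states, aux_weight=0.5):
    """Each decoder layer attends to the last encoder layer plus, as
    auxiliary views, the earlier encoder layers (here simply averaged in;
    aux_weight is an illustrative fusion coefficient)."""
    global_view = enc_states[-1]
    aux_views = np.mean(enc_states[:-1], axis=0)
    return global_view + aux_weight * aux_views

memory = decoder_memory(enc_states)   # what one decoder layer would attend to
```

The structural point is that every decoder layer sees a "stereoscopic" source representation built from multiple encoder depths, not only the topmost one.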
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
- Balancing Cost and Benefit with Tied-Multi Transformers [24.70761584719857]
In sequence-to-sequence modeling, the output of the last layer of the N-layer encoder is fed to the M-layer decoder, and the output of the last decoder layer is used to compute loss.
Our method computes a single loss consisting of NxM losses, where each loss is computed from the output of one of the M decoder layers connected to one of the N encoder layers.
Such a model subsumes NxM models with different numbers of encoder and decoder layers, and can be used for decoding with fewer than the maximum number of encoder and decoder layers.
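The NxM loss computation described above can be sketched as follows. The toy dense layers, depths, and mean-squared loss stand in for Transformer layers and the task loss; they are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 2                            # encoder and decoder depths
src = rng.standard_normal(8)
target = rng.standard_normal(8)

enc_ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(N)]
dec_ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(M)]

def run(x, ws, depth):
    """Apply the first `depth` layers of a toy stack."""
    for w in ws[:depth]:
        x = np.tanh(x @ w)
    return x

# One training loss = the sum of N x M losses: decoder prefix of depth m
# reading the encoder prefix of depth n, for every (n, m) pair.
losses = []
for n in range(1, N + 1):
    enc_out = run(src, enc_ws, n)
    for m in range(1, M + 1):
        y = run(enc_out, dec_ws, m)
        losses.append(np.mean((y - target) ** 2))

total_loss = sum(losses)
```

Because every (n, m) prefix pair is trained, any shallower encoder-decoder combination can later be used for cheaper decoding without retraining.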
arXiv Detail & Related papers (2020-02-20T08:20:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.