Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization
- URL: http://arxiv.org/abs/2510.23530v1
- Date: Mon, 27 Oct 2025 17:08:27 GMT
- Title: Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization
- Authors: Bernardo Torres, Manuel Moussallam, Gabriel Meseguer-Brocal
- Abstract summary: We introduce a simple training methodology to induce linearity in a high-compression Consistency Autoencoder. The CAE exhibits linear behavior in both the encoder and decoder while preserving reconstruction fidelity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio autoencoders learn useful, compressed audio representations, but their non-linear latent spaces prevent intuitive algebraic manipulation such as mixing or scaling. We introduce a simple training methodology to induce linearity in a high-compression Consistency Autoencoder (CAE) by using data augmentation, thereby inducing homogeneity (equivariance to scalar gain) and additivity (the decoder preserves addition) without altering the model's architecture or loss function. When trained with our method, the CAE exhibits linear behavior in both the encoder and decoder while preserving reconstruction fidelity. We test the practical utility of our learned space on music source composition and separation via simple latent arithmetic. This work presents a straightforward technique for constructing structured latent spaces, enabling more intuitive and efficient audio processing.
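The latent arithmetic described in the abstract can be sketched with a toy linear codec. The `encode`/`decode` functions below are hypothetical stand-ins for the paper's CAE (here just a random linear projection and its pseudo-inverse); the trained CAE is a neural network that only approximately satisfies these properties, whereas a truly linear map satisfies them exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the CAE encoder/decoder: a random linear
# projection (64 audio samples -> 16-dim latent) and its pseudo-inverse.
W = rng.standard_normal((16, 64))
W_pinv = np.linalg.pinv(W)

def encode(x):
    return W @ x

def decode(z):
    return W_pinv @ z

# Two toy "stems" standing in for audio sources
vocals = rng.standard_normal(64)
drums = rng.standard_normal(64)

# Additivity: decoding the sum of latents yields the sum of decoded stems
mix_hat = decode(encode(vocals) + encode(drums))

# Homogeneity: scaling a latent scales the decoded audio (gain control)
quiet_vocals = decode(0.5 * encode(vocals))

# Separation via latent subtraction: mix latent minus drums latent
vocals_hat = decode(encode(vocals + drums) - encode(drums))
```

For a linear map these identities hold exactly; the paper's contribution is a data-augmentation recipe that makes a non-linear CAE approximate them without changing its architecture or loss.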
Related papers
- Representation-Regularized Convolutional Audio Transformer for Audio Understanding [53.092757178419355]
Bootstrapping representations from scratch is computationally expensive, often requiring extensive training to converge.
We propose the Convolutional Audio Transformer (CAT), a unified framework designed to address these challenges.
arXiv Detail & Related papers (2026-01-29T12:16:19Z) - Learning to Upsample and Upmix Audio in the Latent Domain [14.777092647088756]
Neural audio autoencoders create compact latent representations that preserve perceptually important information.
We propose a framework that performs audio processing operations entirely within an autoencoder's latent space.
We demonstrate computational efficiency gains of up to 100x while maintaining quality comparable to post-processing on raw audio.
arXiv Detail & Related papers (2025-05-31T19:27:22Z) - Epsilon-VAE: Denoising as Visual Decoding [61.29255979767292]
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement.
Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image.
By adopting iterative reconstruction through diffusion, our autoencoder, namely Epsilon-VAE, achieves high reconstruction quality.
arXiv Detail & Related papers (2024-10-05T08:27:53Z) - Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement [1.4037575966075835]
1-D filters on raw audio are hard to train and often suffer from instabilities.
We address these problems with hybrid solutions, combining theory-driven and data-driven approaches.
arXiv Detail & Related papers (2024-08-30T15:49:31Z) - Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z) - The dynamics of representation learning in shallow, non-linear autoencoders [3.1219977244201056]
We study the dynamics of feature learning in non-linear, shallow autoencoders.
An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights.
We show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets.
arXiv Detail & Related papers (2022-01-06T15:57:31Z) - Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements over state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z) - Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z) - A New Modal Autoencoder for Functionally Independent Feature Extraction [6.690183908967779]
A new modal autoencoder (MAE) is proposed by orthogonalising the columns of the readout weight matrix.
The results were validated on the MNIST variations and USPS classification benchmark suite.
The new MAE introduces a very simple training principle for autoencoders and could be promising for the pre-training of deep neural networks.
arXiv Detail & Related papers (2020-06-25T13:25:10Z) - MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space.
We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z) - Improving auditory attention decoding performance of linear and non-linear methods using state-space model [21.40315235087551]
Recent advances in electroencephalography have shown that it is possible to identify the target speaker from single-trial EEG recordings.
AAD methods reconstruct the attended speech envelope from EEG recordings, based on a linear least-squares cost function or non-linear neural networks.
We investigate a state-space model using correlation coefficients obtained with a small correlation window to improve the decoding performance.
arXiv Detail & Related papers (2020-04-02T09:56:06Z) - Learning Autoencoders with Relational Regularization [89.53065887608088]
A new framework is proposed for learning autoencoders of data distributions.
We minimize the discrepancy between the model and target distributions with a relational regularization.
We implement the framework with two scalable algorithms, making it applicable for both probabilistic and deterministic autoencoders.
arXiv Detail & Related papers (2020-02-07T17:27:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.