Adversarial Audio Synthesis with Complex-valued Polynomial Networks
- URL: http://arxiv.org/abs/2206.06811v1
- Date: Tue, 14 Jun 2022 12:58:59 GMT
- Title: Adversarial Audio Synthesis with Complex-valued Polynomial Networks
- Authors: Yongtao Wu, Grigorios G Chrysos, Volkan Cevher
- Abstract summary: Time-frequency (TF) representations in audio have been increasingly modeled with real-valued networks.
We introduce complex-valued polynomial networks, called APOLLO, that integrate such complex-valued representations in a natural way.
APOLLO yields a $17.5\%$ improvement over adversarial methods and $8.2\%$ over state-of-the-art diffusion models on SC09 in audio generation.
- Score: 60.231877895663956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time-frequency (TF) representations in audio synthesis have been increasingly
modeled with real-valued networks. However, overlooking the complex-valued
nature of TF representations can result in suboptimal performance and require
additional modules (e.g., for modeling the phase). To this end, we introduce
complex-valued polynomial networks, called APOLLO, that integrate such
complex-valued representations in a natural way. Concretely, APOLLO captures
high-order correlations of the input elements using high-order tensors as
scaling parameters. By leveraging standard tensor decompositions, we derive
different architectures and enable modeling richer correlations. We outline
such architectures and showcase their performance in audio generation across
four benchmarks. As a highlight, APOLLO results in $17.5\%$ improvement over
adversarial methods and $8.2\%$ over the state-of-the-art diffusion models on
SC09 dataset in audio generation. Our models can encourage the systematic
design of other efficient architectures on the complex field.
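The abstract describes polynomial networks that capture high-order correlations of the input through tensor decompositions, applied over complex-valued representations. As a rough illustration only (this is not APOLLO's actual architecture; the factor matrices, the Hermitian transpose, and the skip connection are assumptions borrowed from the CCP-style polynomial-network literature), a degree-N polynomial expansion over a complex input could be sketched as:

```python
import numpy as np

def ccp_polynomial(z, factors, C, beta):
    """Degree-N polynomial expansion of a complex input z (CCP-style sketch).

    x_1 = A_1^H z
    x_n = (A_n^H z) * x_{n-1} + x_{n-1}   (Hadamard product plus skip connection)
    output = C x_N + beta
    """
    x = factors[0].conj().T @ z           # (k,) hidden representation
    for A in factors[1:]:
        x = (A.conj().T @ z) * x + x      # raise polynomial degree by one
    return C @ x + beta

# Illustrative dimensions: input d, rank k, output o (all hypothetical).
d, k, o = 8, 4, 2
rng = np.random.default_rng(0)
cplx = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
factors = [cplx(d, k) for _ in range(3)]  # degree-3 expansion
C, beta, z = cplx(o, k), cplx(o), cplx(d)
y = ccp_polynomial(z, factors, C, beta)
```

Each loop iteration multiplies in one more linear projection of the input, so the output is a degree-3 polynomial of the entries of `z` without any explicit activation function; the complex dtype propagates through every step.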
Related papers
- Tensor Polynomial Additive Model [40.30621617188693]
The TPAM preserves the inherent interpretability of additive models, enabling transparent decision-making and the extraction of meaningful feature values.
It can improve accuracy by up to 30% and compression rate by up to 5 times, while maintaining good interpretability.
arXiv Detail & Related papers (2024-06-05T06:23:11Z)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
- Adaptive re-calibration of channel-wise features for Adversarial Audio Classification [0.0]
We propose a recalibration of features using attention feature fusion for synthetic speech detection.
We compare its performance against different detection methods including End2End models and Resnet-based models.
We also demonstrate that the combination of Linear frequency cepstral coefficients (LFCC) and Mel Frequency cepstral coefficients (MFCC) using the attentional feature fusion technique creates better input features representations.
arXiv Detail & Related papers (2022-10-21T04:21:56Z)
- Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [99.349598600887]
Conformer is the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture.
We propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.
arXiv Detail & Related papers (2022-06-02T06:06:29Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- MTCRNN: A multi-scale RNN for directed audio texture synthesis [0.0]
We introduce a novel modelling approach for textures, combining recurrent neural networks trained at different levels of abstraction with a conditioning strategy that allows for user-directed synthesis.
We demonstrate the model's performance on a variety of datasets, examine its performance on various metrics, and discuss some potential applications.
arXiv Detail & Related papers (2020-11-25T09:13:53Z)
- High-Capacity Complex Convolutional Neural Networks For I/Q Modulation Classification [0.0]
We claim state-of-the-art performance by enabling high-capacity architectures containing residual and/or dense connections to compute complex-valued convolutions.
We show statistically significant improvements in all networks with complex convolutions for I/Q modulation classification.
arXiv Detail & Related papers (2020-10-21T02:26:24Z)
- Revealing the Invisible with Model and Data Shrinking for Composite-database Micro-expression Recognition [49.463864096615254]
We analyze the influence of learning complexity, including the input complexity and model complexity.
We propose a recurrent convolutional network (RCN) to explore the shallower-architecture and lower-resolution input data.
We develop three parameter-free modules to integrate with RCN without increasing any learnable parameters.
arXiv Detail & Related papers (2020-06-17T06:19:24Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
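Several of the papers above (APOLLO, the I/Q modulation work) hinge on complex-valued layers. A standard way to realize a complex convolution with real-valued primitives is four real convolutions, following $(a+ib)*(c+id) = (ac-bd) + i(ad+bc)$. The following is a minimal numpy sketch of that identity for 1-D signals, not code from any listed paper:

```python
import numpy as np

def complex_conv1d(x, w):
    """Complex 1-D convolution built from four real convolutions.

    For complex signal x = a + ib and kernel w = c + id:
      real part = conv(a, c) - conv(b, d)
      imag part = conv(a, d) + conv(b, c)
    """
    real = np.convolve(x.real, w.real) - np.convolve(x.imag, w.imag)
    imag = np.convolve(x.real, w.imag) + np.convolve(x.imag, w.real)
    return real + 1j * imag

# Hypothetical example: a complex signal of length 16 and a kernel of length 5.
rng = np.random.default_rng(1)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
w = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = complex_conv1d(x, w)
```

The result matches numpy's native complex convolution; in a deep-learning framework the same trick lets real-valued convolution kernels emulate a complex layer, at the cost of four real convolutions per complex one.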
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.