A Language Model With Million Sample Context For Raw Audio Using
Transformer Architectures
- URL: http://arxiv.org/abs/2206.08297v2
- Date: Tue, 16 May 2023 20:50:56 GMT
- Title: A Language Model With Million Sample Context For Raw Audio Using
Transformer Architectures
- Authors: Prateek Verma
- Abstract summary: We propose a generative auto-regressive architecture that can model audio waveforms over a large context.
Our work learns time dependencies by first learning a latent representation with a CNN front-end and then modeling dependencies over these representations using Transformer encoders.
We achieve state-of-the-art performance compared to other approaches such as WaveNet, SaShiMi, and SampleRNN.
- Score: 2.8935588665357077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling long-term dependencies for audio signals is a particularly
challenging problem, as even small time scales yield on the order of a hundred
thousand samples. With the recent advent of Transformers, neural architectures
became good at modeling dependencies over longer time scales, but the quadratic
cost of attention limited how far they could scale. We propose a generative
auto-regressive architecture that can model audio waveforms over quite a large
context, greater than 500,000 samples. Our work learns time dependencies by
first learning a latent representation with a CNN front-end and then modeling
dependencies over these representations using Transformer encoders, trained
fully end-to-end, thereby allowing the model to learn whatever representations
it deems fit for predicting the next sample. Unlike previous works that compared
different time scales to show improvement, we use a standard dataset, with the
same number of parameters and the same context, to show improvements. We achieve
state-of-the-art performance compared to other approaches such as WaveNet,
SaShiMi, and SampleRNN on a standard dataset for modeling long-term structure.
This work points to an exciting direction for the field, given improvements in
context modeling that can be scaled with more data, as well as potentially
better results from using billions or trillions of parameters.
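A minimal PyTorch sketch of the design the abstract describes: a strided CNN front-end compresses raw waveform samples into latent frames, a causally masked Transformer encoder models dependencies over those frames, and a linear head outputs a distribution over the next quantized sample. The layer sizes, strides, and 256-way (8-bit) output below are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class RawAudioLM(nn.Module):
    """Illustrative CNN front-end + Transformer encoder over raw audio."""
    def __init__(self, latent_dim=512, n_layers=6, n_heads=8):
        super().__init__()
        # CNN front-end: strided 1-D convolutions, ~256x downsampling overall.
        self.frontend = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=16, stride=4, padding=6), nn.GELU(),
            nn.Conv1d(128, 256, kernel_size=16, stride=4, padding=6), nn.GELU(),
            nn.Conv1d(256, latent_dim, kernel_size=16, stride=16),
        )
        layer = nn.TransformerEncoderLayer(latent_dim, n_heads,
                                           dim_feedforward=4 * latent_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(latent_dim, 256)    # logits over 256 mu-law levels

    def forward(self, wav):                       # wav: (batch, n_samples)
        z = self.frontend(wav.unsqueeze(1))       # (batch, latent_dim, n_frames)
        z = z.transpose(1, 2)                     # (batch, n_frames, latent_dim)
        n = z.size(1)                             # causal mask: attend to past only
        mask = torch.triu(torch.full((n, n), float("-inf"), device=z.device), 1)
        h = self.encoder(z, mask=mask)
        return self.head(h)                       # per-frame next-step logits

# Small usage example: 65,536 samples -> 256 latent frames -> next-step logits.
# The paper's point is that this kind of model scales to >500,000-sample contexts.
logits = RawAudioLM()(torch.randn(1, 65536))
```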
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
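As a rough, hedged illustration of the idea (not the paper's implementation), a small value network can predict per-token logit adjustments that are added to a frozen pre-trained model's logits at inference time; the class and function names below are hypothetical.

```python
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Hypothetical small network mapping token ids to logit-level deltas."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, token_ids):                  # (batch, seq)
        return self.proj(self.embed(token_ids))    # (batch, seq, vocab)

@torch.no_grad()
def combined_logits(base_model, value_net, token_ids):
    # The value network, trained against a small base model, is reused here
    # with a (possibly larger) frozen pre-trained model at inference time.
    return base_model(token_ids) + value_net(token_ids)
```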
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Exploring the design space of deep-learning-based weather forecasting systems [56.129148006412855]
This paper systematically analyzes the impact of different design choices on deep-learning-based weather forecasting systems.
We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models.
We propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures.
arXiv Detail & Related papers (2024-10-09T22:25:50Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experimental results show that the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis.
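A minimal sketch of a conditional flow-matching training step with masked conditioning, in the spirit of the summary above; the feature shapes, mask ratio, and model interface are assumptions rather than the SpeechFlow recipe.

```python
import torch

def flow_matching_step(model, x1, mask_ratio=0.7):
    """x1: (batch, frames, dim) clean speech features, e.g. mel frames."""
    x0 = torch.randn_like(x1)                        # noise endpoint of the path
    t = torch.rand(x1.size(0), 1, 1, device=x1.device)
    xt = (1 - t) * x0 + t * x1                       # linear interpolation path
    target_velocity = x1 - x0                        # d(xt)/dt along that path
    # Masked conditioning: the model sees the clean features with most frames hidden.
    keep = (torch.rand(x1.shape[:2], device=x1.device) > mask_ratio).float()
    cond = x1 * keep.unsqueeze(-1)
    pred = model(xt, t.view(-1), cond)               # assumed model signature
    return ((pred - target_velocity) ** 2).mean()    # regress onto the velocity
```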
arXiv Detail & Related papers (2023-10-25T03:40:50Z)
- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies [0.0]
We compare existing solutions to long-sequence modeling in terms of their pure mathematical formulation.
We then demonstrate that long context length does yield better performance, albeit application-dependent.
Inspired by emerging sparse models of huge capacity, we propose a machine learning system for handling million-scale dependencies.
arXiv Detail & Related papers (2023-02-13T09:47:31Z)
- Generative time series models using Neural ODE in Variational Autoencoders [0.0]
We implement Neural Ordinary Differential Equations in a Variational Autoencoder setting for generative time series modeling.
An object-oriented approach to the code was taken to allow for easier development and research.
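A compact sketch (not the paper's code) of the core idea: a VAE encoder yields an initial latent state, a learned vector field is integrated over time, and each latent state along the trajectory is decoded into an observation. Plain Euler integration keeps the sketch self-contained; a real implementation would typically use an adaptive ODE solver.

```python
import torch
import torch.nn as nn

class LatentODEDecoder(nn.Module):
    """Decodes a VAE latent z0 into a time series by integrating a learned ODE."""
    def __init__(self, latent_dim=8, obs_dim=1, hidden=64):
        super().__init__()
        self.field = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, latent_dim))
        self.readout = nn.Linear(latent_dim, obs_dim)

    def forward(self, z0, n_steps, dt=0.1):
        z, outputs = z0, []
        for _ in range(n_steps):                 # Euler steps: dz/dt = field(z)
            z = z + dt * self.field(z)
            outputs.append(self.readout(z))
        return torch.stack(outputs, dim=1)       # (batch, n_steps, obs_dim)

# z0 would come from the VAE encoder as mu + sigma * eps (reparameterization).
series = LatentODEDecoder()(torch.randn(4, 8), n_steps=50)
```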
arXiv Detail & Related papers (2022-01-12T14:38:11Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Long-Span Dependencies in Transformer-based Summarization Systems [38.672160430296536]
Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks including document summarization.
One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows.
In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization.
arXiv Detail & Related papers (2021-05-08T23:53:03Z)
- Audio Transformers: Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions [6.370905925442655]
We propose applying Transformer-based architectures without convolutional layers to raw audio signals.
Our model outperforms convolutional models, producing state-of-the-art results.
We further improve the performance of Transformer architectures by using techniques such as pooling inspired by convolutional networks.
arXiv Detail & Related papers (2021-05-01T19:38:30Z)
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech [63.03318307254081]
TERA stands for Transformer Encoder Representations from Alteration.
We use alteration along three axes to pre-train Transformers on a large amount of unlabeled speech.
TERA can be used for speech representation extraction or fine-tuning with downstream models.
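A rough sketch of what "alteration along three axes" can look like on a log-mel spectrogram: time masking, frequency (channel) masking, and magnitude noise. The mask widths and noise scale below are assumptions; the model would be trained to reconstruct the original frames from the altered input.

```python
import torch

def alter(spec, time_width=7, freq_width=8, noise_std=0.2):
    """spec: (batch, frames, mel_bins); returns an altered copy for pre-training."""
    x = spec.clone()
    batch, frames, bins = x.shape
    t0 = torch.randint(0, max(frames - time_width, 1), (batch,))
    f0 = torch.randint(0, max(bins - freq_width, 1), (batch,))
    for i in range(batch):
        x[i, t0[i]:t0[i] + time_width, :] = 0.0     # time-axis masking
        x[i, :, f0[i]:f0[i] + freq_width] = 0.0     # channel (frequency) masking
    return x + noise_std * torch.randn_like(x)      # magnitude alteration
```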
arXiv Detail & Related papers (2020-07-12T16:19:00Z)