Related papers: AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling

URL: http://arxiv.org/abs/2205.05862v1
Date: Thu, 12 May 2022 03:22:07 GMT
Title: AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling
Authors: Haoqin Tu, Zhongliang Yang, Jinshuai Yang, Siyu Zhang, Yongfeng Huang
Abstract summary: Variational Auto-Encoder (VAE) has become the de-facto learning paradigm in achieving both representation learning and generation for natural language. Existing VAE-based language models either employ elementary RNNs or fine-tune two pre-trained language models (PLMs) for any downstream task, which requires huge energy consumption. In this paper, we introduce the first VAE framework empowered with adaptive GPT-2s (AdaVAE).
Score: 33.18577107062907
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Variational Auto-Encoder (VAE) has become the de-facto learning paradigm in achieving both representation learning and generation for natural language. However, existing VAE-based language models either employ elementary RNNs, which are not powerful enough to handle multiple tasks, or fine-tune two pre-trained language models (PLMs) for any downstream task, which requires huge energy consumption. In this paper, we introduce the first VAE framework empowered with adaptive GPT-2s (AdaVAE). Unlike these systems, we unify both the encoder and decoder of the VAE model using GPT-2s with adaptive parameter-efficient components. Experiments from multiple dimensions validate that AdaVAE is competent to better organize language in generation and representation modeling, even with less than 15% additionally activated parameters during training. Our code is available at https://github.com/ImKeTT/adavae.
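To give a feel for what "additionally activated parameters" means in parameter-efficient tuning, here is a minimal sketch that counts the weights added by generic bottleneck adapters relative to a frozen base model. It is an illustration under stated assumptions, not AdaVAE's actual design: the abstract does not specify the adapter architecture, so the down-projection/up-projection form, the function names `adapter_params` and `added_fraction`, and every number in the usage example (hidden size 768, 12 layers, roughly 124M base parameters, as in the smallest GPT-2, and the bottleneck width) are hypothetical choices.

```python
def adapter_params(hidden_size: int, bottleneck: int) -> int:
    """Weights and biases in one bottleneck adapter (assumed form, not from the paper).

    The adapter is assumed to be a down-projection (hidden -> bottleneck)
    followed by an up-projection (bottleneck -> hidden), each with a bias.
    """
    down = hidden_size * bottleneck + bottleneck
    up = bottleneck * hidden_size + hidden_size
    return down + up


def added_fraction(
    base_params: int,
    hidden_size: int,
    num_layers: int,
    adapters_per_layer: int,
    bottleneck: int,
) -> float:
    """Ratio of newly added (trainable) adapter parameters to frozen base parameters."""
    added = num_layers * adapters_per_layer * adapter_params(hidden_size, bottleneck)
    return added / base_params


if __name__ == "__main__":
    # Hypothetical configuration: one GPT-2-small-sized model with two
    # adapters per Transformer layer and a bottleneck width of 128.
    frac = added_fraction(
        base_params=124_000_000,
        hidden_size=768,
        num_layers=12,
        adapters_per_layer=2,
        bottleneck=128,
    )
    print(f"added parameters relative to base: {frac:.2%}")
```

Varying `bottleneck` or `adapters_per_layer` shows how the added share scales; the real figure for AdaVAE depends on its own components and on it using GPT-2 for both encoder and decoder.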

Related papers

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario [72.02391485962127]
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR). In low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages. We extend a conventional efficient fine-tuning scheme based on the adapter to handle these issues.
arXiv Detail & Related papers (2024-11-27T10:51:00Z)
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models. We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
On Robustness of Finetuned Transformer-based NLP Models [11.063628128069736]
We characterize changes between pretrained and finetuned language model representations across layers using two metrics: CKA and STIR. GPT-2 representations are more robust than BERT and T5 across multiple types of input perturbations. This study provides valuable insights into perturbation-specific weaknesses of popular Transformer-based models.
arXiv Detail & Related papers (2023-05-23T18:25:18Z)
CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data. We propose an efficient tuning method specifically designed for SSL speech models, by applying CNN adapters at the feature extractor. We empirically found that adding CNN adapters to the feature extractor can help the adaptation on emotion and speaker tasks.
arXiv Detail & Related papers (2022-12-01T08:50:12Z)
Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition [0.1909808926064466]
Transformer-based models such as wav2vec 2.0 and HuBERT are leading the field in the speech domain. We propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks.
arXiv Detail & Related papers (2022-02-07T14:20:54Z)
CoreLM: Coreference-aware Language Model Fine-Tuning [0.0]
We propose a fine-tuning framework, named CoreLM, that extends the architecture of current Pretrained Language Models. We make available information outside the contextual space of the model, which results in a better language model for a fraction of the computational cost. Our proposed model achieves lower perplexity on the GUMBY and LAMBADA datasets when compared to GPT2 and a fine-tuned version of GPT2 without any changes.
arXiv Detail & Related papers (2021-11-04T08:44:31Z)
Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction. It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition. We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
DiscreTalk: Text-to-Speech as a Machine Translation Problem [52.33785857500754]
This paper proposes a new end-to-end text-to-speech (E2E-TTS) model based on neural machine translation (NMT). The proposed model consists of two components: a non-autoregressive vector quantized variational autoencoder (VQ-VAE) model and an autoregressive Transformer-NMT model.
arXiv Detail & Related papers (2020-05-12T02:45:09Z)
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space [109.79957125584252]
Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus.
arXiv Detail & Related papers (2020-04-05T06:20:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.