Conformer LLMs -- Convolution Augmented Large Language Models
- URL: http://arxiv.org/abs/2307.00461v1
- Date: Sun, 2 Jul 2023 03:05:41 GMT
- Title: Conformer LLMs -- Convolution Augmented Large Language Models
- Authors: Prateek Verma
- Abstract summary: This work brings together two popular neural architecture blocks, convolutional layers and Transformers, for large language models (LLMs).
Transformer decoders effectively capture long-range dependencies over several modalities and form a core backbone of modern advancements in machine learning.
This work showcases a robust speech architecture that can be integrated and adapted in a causal setup beyond speech applications for large-scale language modeling.
- Score: 2.8935588665357077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work brings together two popular neural architecture blocks, convolutional layers and Transformers, for large language models (LLMs).
Non-causal conformers are used ubiquitously in automatic speech recognition, and this work adapts these architectures to a causal setup for training LLMs.
Transformer decoders effectively capture long-range dependencies over several modalities and form a core backbone of modern advancements in machine learning.
Convolutional architectures have been popular for extracting features from domains such as raw 1-D signals, speech, and images.
In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformers, we achieve significant gains in performance.
This work showcases a robust speech architecture that can be integrated and adapted in a causal setup, beyond speech applications, for large-scale language modeling.
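To make the causal adaptation concrete, here is a minimal sketch (assuming PyTorch; the module choices, block ordering, and hyperparameters are illustrative, not the paper's exact recipe) of a conformer-style decoder block: masked self-attention supplies the global, long-range dependencies, while a depthwise convolution with left-only padding supplies the local ones without leaking future tokens.

```python
# Minimal sketch, assuming PyTorch: a causal conformer-style decoder block that
# pairs masked self-attention (global context) with a causal depthwise
# convolution (local context). Dimensions and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, conv_kernel=7, ff_mult=4, dropout=0.1):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.conv_norm = nn.LayerNorm(d_model)
        # Depthwise 1-D convolution; padding is applied manually on the left only,
        # so each position sees current and past frames but never future ones.
        self.conv = nn.Conv1d(d_model, d_model, conv_kernel, groups=d_model)
        self.left_pad = conv_kernel - 1
        self.ff_norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, ff_mult * d_model),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(ff_mult * d_model, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Boolean upper-triangular mask blocks attention to future positions.
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + self.dropout(attn_out)                 # global dependencies

        h = self.conv_norm(x).transpose(1, 2)          # (batch, d_model, seq_len)
        h = F.pad(h, (self.left_pad, 0))               # causal: pad the left only
        x = x + self.dropout(self.conv(h).transpose(1, 2))  # local dependencies

        return x + self.dropout(self.ff(self.ff_norm(x)))   # position-wise FFN


# Usage: stack blocks behind a token embedding and project to the vocabulary.
blocks = nn.Sequential(*[CausalConformerBlock() for _ in range(4)])
tokens = torch.randn(2, 16, 512)                       # (batch, seq_len, d_model)
out = blocks(tokens)                                   # (2, 16, 512)
```

Stacking such blocks behind a token embedding and an output projection yields a decoder-only language model in which the convolution models local structure and the attention models global structure, the combination the abstract describes; details such as macaron-style feed-forward pairs in speech conformers are design choices the paper may handle differently.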
Related papers
- Scalable, Tokenization-Free Diffusion Model Architectures with Efficient Initial Convolution and Fixed-Size Reusable Structures for On-Device Image Generation [0.0]
Vision Transformers and U-Net architectures have been widely adopted in the implementation of Diffusion Models.
We propose an architecture that utilizes a fixed-size, reusable transformer block as a core structure.
Our architecture is characterized by low complexity, token-free design, absence of positional embeddings, uniformity, and scalability.
arXiv Detail & Related papers (2024-11-09T08:58:57Z)
- ELICIT: LLM Augmentation via External In-Context Capability [16.237679215248196]
ELICIT is a framework consisting of two modules designed to effectively store and reuse task vectors.
ELICIT serves as a plug-and-play performance booster that enables adaptive elicitation of model capabilities.
arXiv Detail & Related papers (2024-10-12T03:19:06Z)
- IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities [4.269326314400742]
We introduce the Inner-Adaptor Architecture for multimodal large language models (MLLMs).
The architecture incorporates multiple multimodal adaptors at varying depths within the large language model to facilitate direct interaction with the inherently text-oriented transformer layers.
Unlike previous frozen-language-model approaches, which require large-scale aligned data, the proposed architecture achieves superior performance on small-scale datasets.
arXiv Detail & Related papers (2024-08-23T08:10:13Z)
- GiT: Towards Generalist Vision Transformer through Universal Language Interface [94.33443158125186]
This paper proposes a simple, yet effective framework, called GiT, simultaneously applicable for various vision tasks only with a vanilla ViT.
GiT is a multi-task visual model, jointly trained across five representative benchmarks without task-specific fine-tuning.
arXiv Detail & Related papers (2024-03-14T13:47:41Z)
- On decoder-only architecture for speech-to-text and large language model integration [59.49886892602309]
Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models.
We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
arXiv Detail & Related papers (2023-07-08T06:47:58Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding [41.928263518867816]
Conformer has proven to be effective in many speech processing tasks.
Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer.
arXiv Detail & Related papers (2022-07-06T21:08:10Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities such as vision and language.
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.