WaveletGPT: Wavelets Meet Large Language Models
- URL: http://arxiv.org/abs/2409.12924v2
- Date: Thu, 3 Oct 2024 09:21:57 GMT
- Title: WaveletGPT: Wavelets Meet Large Language Models
- Authors: Prateek Verma
- Abstract summary: Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements.
This paper infuses LLMs with traditional signal processing ideas, namely wavelets, during pre-training to take advantage of the structure.
We achieve the same pre-training performance almost twice as fast in text, raw audio, and symbolic music.
- Score: 1.2328446298523066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. They are trained on a simple objective: to predict the next token given the previous context. We live in a world where most of the data around us, e.g., text, audio, and music, has a multi-scale structure associated with it. This paper infuses LLMs with traditional signal processing ideas, namely wavelets, during pre-training to take advantage of the structure. Without adding **any extra parameters** to a GPT-style LLM architecture, we achieve the same pre-training performance almost twice as fast in text, raw audio, and symbolic music. This is achieved by imposing a structure on intermediate embeddings. When trained for the same number of training steps, we achieve significant gains in performance, which are comparable to pre-training a larger neural architecture. Our architecture allows every next-token prediction access to intermediate embeddings at different temporal resolutions in every Transformer decoder block. This work will hopefully pave the way for incorporating multi-rate signal processing ideas into traditional LLM pre-training. Further, we showcase pushing model performance by improving internal structure instead of just going after scale.
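To make the "structure on intermediate embeddings" concrete, here is a minimal sketch of the kind of operation described: Haar-style causal moving averages at a few scales, assigned coordinate-wise to a decoder block's output, adding no parameters and leaking no future information. The function names and the coordinate-to-scale assignment are illustrative assumptions; the paper's exact filtering scheme may differ.

```python
import torch
import torch.nn.functional as F

def causal_moving_average(x: torch.Tensor, window: int) -> torch.Tensor:
    """Average each position with its `window - 1` predecessors (zero-padded at
    the start), so no future information leaks into the representation."""
    b, t, c = x.shape
    kernel = torch.full((c, 1, window), 1.0 / window, dtype=x.dtype, device=x.device)
    xt = F.pad(x.transpose(1, 2), (window - 1, 0))      # left-pad the time axis
    return F.conv1d(xt, kernel, groups=c).transpose(1, 2)

def impose_multiscale_structure(h: torch.Tensor, n_scales: int = 4) -> torch.Tensor:
    """Give different embedding coordinates different temporal resolutions.

    h: (batch, seq_len, dim) output of one decoder block.  Coordinate i is
    smoothed with a causal window of length 2 ** (i % n_scales), so the next
    block sees the same sequence at several resolutions at once, with zero
    extra parameters.
    """
    out = h.clone()
    dim = h.shape[-1]
    for s in range(1, n_scales):                        # scale 0 keeps the raw signal
        coords = list(range(s, dim, n_scales))
        out[:, :, coords] = causal_moving_average(h[:, :, coords], 2 ** s)
    return out

h = torch.randn(2, 128, 768)                            # (batch, seq_len, dim)
print(impose_multiscale_structure(h).shape)             # torch.Size([2, 128, 768])
```

Because every average only looks backwards, the next-token objective is untouched; only the representation passed to later blocks is given a multi-resolution structure.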
Related papers
- FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
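As a rough, generic sketch of drafting several tokens per step (not FIRP's exact mechanism, which predicts future intermediate representations inside the model and verifies them), the snippet below guesses k future hidden states with one extra linear map and decodes them with a shared LM head; all names and sizes are placeholders.

```python
import torch
import torch.nn as nn

class FutureStatePredictor(nn.Module):
    """Rough sketch of drafting several tokens per decoding step: a linear map
    guesses hidden states for the next k positions from the current
    intermediate state, and the (shared) LM head turns each guess into a
    token.  FIRP's actual training and verification procedure is more involved."""
    def __init__(self, d_model: int, vocab: int, k: int = 3):
        super().__init__()
        self.k = k
        self.project = nn.Linear(d_model, k * d_model)   # one guessed state per future offset
        self.lm_head = nn.Linear(d_model, vocab)         # shared with the base model in practice

    def forward(self, h_last: torch.Tensor) -> torch.Tensor:
        # h_last: (batch, d_model) intermediate hidden state at the current position
        guesses = self.project(h_last).view(-1, self.k, self.project.in_features)
        return self.lm_head(guesses).argmax(dim=-1)      # (batch, k) draft token ids

draft = FutureStatePredictor(d_model=768, vocab=50257)(torch.randn(2, 768))
print(draft.shape)                                       # torch.Size([2, 3])
```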
- Music Genre Classification using Large Language Models [50.750620612351284]
This paper exploits the zero-shot capabilities of pre-trained large language models (LLMs) for music genre classification.
The proposed approach splits audio signals into 20 ms chunks and processes them through convolutional feature encoders.
During inference, predictions on individual chunks are aggregated for a final genre classification.
arXiv Detail & Related papers (2024-10-10T19:17:56Z)
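A minimal sketch of the chunk-and-aggregate inference described above, assuming a 16 kHz mono signal; `encoder` and `classifier` stand in for the convolutional feature encoder and the LLM-based genre mapper and are not the paper's actual components.

```python
import torch

def classify_genre(audio: torch.Tensor, encoder, classifier, sr: int = 16000) -> int:
    """Split a waveform into 20 ms chunks, score each chunk, and aggregate."""
    chunk = int(0.020 * sr)                              # 20 ms worth of samples
    n = audio.shape[-1] // chunk
    chunks = audio[: n * chunk].view(n, chunk)           # (n_chunks, samples_per_chunk)
    logits = classifier(encoder(chunks))                 # (n_chunks, n_genres)
    return logits.mean(dim=0).argmax().item()            # aggregate chunk predictions
```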
- Adaptive Large Language Models By Layerwise Attention Shortcuts [46.76681147411957]
LLM-like setups allow the final layer to attend to all of the intermediate layers as it deems fit through the attention mechanism.
We showcase results on four datasets spanning acoustic tokens, natural language, and symbolic music, achieving superior performance for a GPT-like architecture.
arXiv Detail & Related papers (2024-09-17T03:46:01Z)
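One way the layerwise-shortcut idea could look in code: each position's final-layer state attends over that position's stack of per-layer outputs, so the effective depth is chosen per token. The single attention head and the class name are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LayerwiseShortcutAttention(nn.Module):
    """Sketch: the top of the stack attends over every intermediate layer's
    output at the current position, mixing depths adaptively per token."""
    def __init__(self, d_model: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)

    def forward(self, layer_outputs: list[torch.Tensor]) -> torch.Tensor:
        # layer_outputs: list of (batch, seq, d_model), one per decoder block
        stack = torch.stack(layer_outputs, dim=2)        # (batch, seq, n_layers, d)
        b, t, n_layers, d = stack.shape
        kv = stack.reshape(b * t, n_layers, d)           # attend across layers per position
        q = layer_outputs[-1].reshape(b * t, 1, d)       # query = final layer's state
        mixed, _ = self.attn(q, kv, kv)
        return mixed.reshape(b, t, d)

layers = [torch.randn(2, 16, 64) for _ in range(6)]
print(LayerwiseShortcutAttention(64)(layers).shape)      # torch.Size([2, 16, 64])
```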
- Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers [16.253898272659242]
State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive.
Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFNs).
We show that wide and structured networks can utilize training FLOPs more efficiently, with fewer parameters and lower loss than dense models at their optimal trade-off.
arXiv Detail & Related papers (2024-06-24T08:43:21Z)
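The summary above concerns structured feed-forward layers in general; the sketch below shows one concrete instance, a low-rank factorization, only to make the parameter-count argument tangible. It is not the specific parameterization studied in the paper.

```python
import torch.nn as nn

def lowrank_ffn(d_model: int, d_ff: int, rank: int) -> nn.Sequential:
    """One structured-FFN option: factor each projection into two thin matrices."""
    return nn.Sequential(
        nn.Linear(d_model, rank, bias=False), nn.Linear(rank, d_ff),   # W_in factored
        nn.GELU(),
        nn.Linear(d_ff, rank, bias=False), nn.Linear(rank, d_model),   # W_out factored
    )

dense = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
structured = lowrank_ffn(1024, 4096, rank=256)
print(sum(p.numel() for p in dense.parameters()),
      sum(p.numel() for p in structured.parameters()))   # factored version is roughly 3x smaller
```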
- Towards Signal Processing In Large Language Models [46.76681147411957]
This paper introduces the idea of applying signal processing inside a Large Language Model (LLM).
We draw parallels between classical Fourier-Transforms and Fourier Transform-like learnable time-frequency representations.
We show that for GPT-like architectures, our work achieves faster convergence and significantly increases performance.
arXiv Detail & Related papers (2024-06-10T13:51:52Z)
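As a hedged illustration of a "Fourier-transform-like learnable time-frequency representation", the sketch below applies a causal depthwise convolution to intermediate embeddings, with kernels initialized to a cosine basis and left trainable. The class and its initialization are assumptions made for illustration, not the paper's construction.

```python
import torch
import torch.nn as nn

class LearnableTimeFrequency(nn.Module):
    """Causal depthwise convolution over intermediate embeddings whose kernels
    start as a cosine (Fourier-like) basis and are then learned."""
    def __init__(self, channels: int, kernel: int = 16):
        super().__init__()
        self.kernel = kernel
        self.conv = nn.Conv1d(channels, channels, kernel, groups=channels, bias=False)
        t = torch.arange(kernel, dtype=torch.float32)
        freqs = torch.linspace(0.0, torch.pi, channels).unsqueeze(-1)
        self.conv.weight.data.copy_(torch.cos(freqs * t).unsqueeze(1))  # cosine-basis init

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, channels); left-pad so the filter never sees the future
        x = nn.functional.pad(h.transpose(1, 2), (self.kernel - 1, 0))
        return self.conv(x).transpose(1, 2)

h = torch.randn(2, 128, 64)
print(LearnableTimeFrequency(64)(h).shape)               # torch.Size([2, 128, 64])
```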
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
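A minimal sketch of the recipe summarized above: the language model stays frozen, and only a single linear projection into the token-embedding space plus one soft token are trained. Where eP-ALM actually injects the projected features may differ; dimensions and names are placeholders.

```python
import torch
import torch.nn as nn

class PerceptualPrefix(nn.Module):
    """Trainable pieces only: one linear projection and one soft prompt token.
    The frozen language model consumes the returned embedding sequence."""
    def __init__(self, d_vis: int, d_model: int):
        super().__init__()
        self.project = nn.Linear(d_vis, d_model)                     # the only trained layer...
        self.soft_token = nn.Parameter(torch.zeros(1, 1, d_model))   # ...plus one trainable token

    def forward(self, vis_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # vis_feats: (batch, n_patches, d_vis); text_embeds: (batch, seq, d_model)
        prefix = self.soft_token.expand(text_embeds.size(0), -1, -1)
        vis = self.project(vis_feats)
        # Concatenated sequence is handed to the frozen LM as its input embeddings.
        return torch.cat([prefix, vis, text_embeds], dim=1)
```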
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
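To make "casting objectives as one another" concrete, here is a toy sampler that assigns each pre-training example one of several denoising configurations, so a single run interpolates between objectives; the config names and rates are invented for illustration and are not the paper's mixture.

```python
import random

# Hypothetical denoiser configs: one model, several corruption styles.
DENOISERS = [
    {"name": "short-span denoising", "mean_span": 3,  "corrupt_rate": 0.15},
    {"name": "long-span denoising",  "mean_span": 32, "corrupt_rate": 0.50},
    {"name": "prefix-LM",            "mean_span": None, "corrupt_rate": None},
]

def sample_denoiser() -> dict:
    """Pick a corruption config per example, so pre-training interpolates
    between objectives instead of committing to a single one."""
    return random.choice(DENOISERS)
```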
- Simpler is Better: off-the-shelf Continual Learning Through Pretrained Backbones [0.0]
We propose a baseline (off-the-shelf) for Continual Learning of Computer Vision problems.
We exploit the power of pretrained models to compute a class prototype and fill a memory bank.
We compare our pipeline with common CNN models and show the superiority of Vision Transformers.
arXiv Detail & Related papers (2022-05-03T16:03:46Z)
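A minimal sketch of the prototype-plus-memory-bank baseline summarized above, assuming a frozen pretrained backbone has already produced the embeddings; the class name and the cosine metric are illustrative choices.

```python
import torch

class PrototypeMemory:
    """Each class stores the mean of its backbone embeddings; prediction is
    nearest prototype by cosine similarity, so no weights are ever retrained."""
    def __init__(self):
        self.prototypes: dict[int, torch.Tensor] = {}

    def add_class(self, label: int, embeddings: torch.Tensor) -> None:
        # embeddings: (n_examples, d) features from the frozen backbone
        self.prototypes[label] = embeddings.mean(dim=0)

    def predict(self, query: torch.Tensor) -> int:
        # query: (d,) feature of one test image
        labels = list(self.prototypes)
        protos = torch.stack([self.prototypes[l] for l in labels])
        sims = torch.nn.functional.cosine_similarity(protos, query.unsqueeze(0))
        return labels[int(sims.argmax())]
```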
- Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or .... [4.594159253008448]
This paper presents a way of doing large-scale audio understanding without traditional state-of-the-art neural architectures.
Our approach does not have any convolutions, recurrence, attention, transformers or other approaches such as BERT.
A classification head (a feed-forward layer), similar to the approach in SimCLR, is trained on a learned representation.
arXiv Detail & Related papers (2021-10-07T05:00:26Z)
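The classification stage described above reduces to a small feed-forward head on precomputed clip embeddings, trained like SimCLR's linear-evaluation head; the sketch shows that stage only, with placeholder sizes, and says nothing about how the attention-free representation itself is learned.

```python
import torch
import torch.nn as nn

n_classes, d_embed = 50, 512                      # placeholder sizes
head = nn.Sequential(nn.Linear(d_embed, d_embed), nn.ReLU(), nn.Linear(d_embed, n_classes))

features = torch.randn(32, d_embed)               # a batch of learned clip embeddings
labels = torch.randint(0, n_classes, (32,))
loss = nn.functional.cross_entropy(head(features), labels)
loss.backward()
```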
- Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z)
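As a structural sketch only: a plain GRU stands in for the paper's recurrent graph neural network, the inverse-kinematics controller is omitted, and all names and sizes are assumptions. It shows how the primitives and the high-level planner from the summary above could be composed.

```python
import torch
import torch.nn as nn

class PrimitiveDynamics(nn.Module):
    """One elemental movement primitive, parameterized by a recurrent model
    over recent object/gripper states (a GRU stands in for the paper's RGNN)."""
    def __init__(self, d_state: int, d_hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(d_state, d_hidden, batch_first=True)
        self.to_target = nn.Linear(d_hidden, d_state)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(states)                 # states: (batch, time, d_state)
        return self.to_target(out[:, -1])         # predicted sub-goal for the controller

class HighLevelPlanner(nn.Module):
    """Chooses which primitive to run next from the current state."""
    def __init__(self, d_state: int, n_primitives: int):
        super().__init__()
        self.score = nn.Linear(d_state, n_primitives)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.score(state).argmax(dim=-1)   # index of the next primitive to execute
```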