Quantum linear algebra is all you need for Transformer architectures
- URL: http://arxiv.org/abs/2402.16714v2
- Date: Fri, 31 May 2024 03:34:57 GMT
- Title: Quantum linear algebra is all you need for Transformer architectures
- Authors: Naixu Guo, Zhan Yu, Matthew Choi, Aman Agrawal, Kouhei Nakaji, Alán Aspuru-Guzik, Patrick Rebentrost
- Abstract summary: We investigate transformer architectures under the lens of fault-tolerant quantum computing.
We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function.
Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction.
- Score: 1.660288273261283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative machine learning methods such as large-language models are revolutionizing the creation of text and images. While these models are powerful, they also consume a large amount of computational resources. The transformer is a key component in large language models that aims to generate a suitable completion of a given partial sequence. In this work, we investigate transformer architectures under the lens of fault-tolerant quantum computing. The input model is one where trained weight matrices are given as block encodings, and we construct the query, key, and value matrices for the transformer. We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function. In addition, we combine quantum subroutines to construct important building blocks in the transformer: the residual connection and layer normalization, and the feed-forward neural network. Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction. Based on common open-source large-language models, we provide insights into the behavior of important parameters determining the run time of the quantum algorithm. We discuss the potential and challenges for obtaining a quantum advantage.
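For reference, the classical computation that these quantum subroutines block-encode is ordinary single-head self-attention with a row-wise softmax. The snippet below is a minimal NumPy sketch of that classical reference (not the paper's quantum algorithm); the matrix sizes and random weights are placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Classical single-head self-attention with a row-wise softmax."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # query, key, value matrices
    scores = Q @ K.T / np.sqrt(K.shape[1])       # scaled attention logits
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the exponentials
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # softmax applied row by row
    return A @ V                                 # output the quantum algorithm amplitude-encodes

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                      # 5 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```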
Related papers
- HQViT: Hybrid Quantum Vision Transformer for Image Classification [48.72766405978677]
We propose a Hybrid Quantum Vision Transformer (HQViT) to accelerate model training while enhancing model performance.
HQViT introduces whole-image processing with amplitude encoding to better preserve global image information without additional positional encoding.
Experiments across various computer vision datasets demonstrate that HQViT outperforms existing models, achieving an improvement of up to 10.9% (on the MNIST 10-classification task) over the state of the art.
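As a rough illustration of the whole-image amplitude encoding HQViT relies on, the sketch below (an assumption-laden toy example, not HQViT's implementation) flattens an image, pads it to a power-of-two length, and normalizes it so the pixel values become the amplitudes of a quantum state.

```python
import numpy as np

# Hedged sketch: a 28x28 image (784 pixels) is flattened, zero-padded to the
# next power of two (1024 = 2**10), and normalized, so it can be stored as the
# amplitudes of a 10-qubit state.
image = np.random.default_rng(1).random((28, 28))   # placeholder image
vec = image.flatten()
dim = 1 << int(np.ceil(np.log2(vec.size)))          # next power of two
state = np.zeros(dim)
state[:vec.size] = vec
state /= np.linalg.norm(state)                      # valid quantum amplitudes
n_qubits = int(np.log2(dim))                        # 10 qubits for a 28x28 image
```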
arXiv Detail & Related papers (2025-04-03T16:13:34Z) - A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation [0.0]
We propose a hybrid quantum-classical self-attention mechanism as part of a transformer decoder.
We show that the time complexity of the query-key dot product is reduced from $\mathcal{O}(n^2 d)$ in a classical model to $\mathcal{O}(n^2 \log d)$ in our quantum model.
This work provides a promising avenue for quantum-enhanced natural language processing (NLP).
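The claimed gain comes from encoding each d-dimensional query/key vector into the amplitudes of roughly log2(d) qubits, so the dot product appears as a state overlap that a swap-test-style routine can estimate. The NumPy sketch below only illustrates the relationship between that overlap and the classical dot product; it is not the paper's quantum circuit.

```python
import numpy as np

d = 8                                    # vector dimension -> log2(d) = 3 qubits
rng = np.random.default_rng(2)
q, k = rng.normal(size=d), rng.normal(size=d)

q_state = q / np.linalg.norm(q)          # amplitude-encoded |q>
k_state = k / np.linalg.norm(k)          # amplitude-encoded |k>

overlap = q_state @ k_state              # <q|k>, the quantity a swap test estimates
dot = overlap * np.linalg.norm(q) * np.linalg.norm(k)
assert np.isclose(dot, q @ k)            # recovers the classical dot product
```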
arXiv Detail & Related papers (2025-02-26T15:15:01Z) - Transformers are Efficient Compilers, Provably [11.459397066286822]
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks.
In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective.
We introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages.
arXiv Detail & Related papers (2024-10-07T20:31:13Z) - Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
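To make the setup concrete, the PyTorch sketch below (a minimal assumed configuration, not the authors' code) freezes every parameter of a small transformer except the input and output embedding layers, so only those receive gradient updates while the rest stay at their random initialization.

```python
import torch.nn as nn

model = nn.ModuleDict({
    "embed": nn.Embedding(1000, 64),                 # trainable input embedding
    "encoder": nn.TransformerEncoder(                # left at random initialization
        nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
        num_layers=2,
    ),
    "head": nn.Linear(64, 1000),                     # trainable output embedding (assumption)
})
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("embed", "head"))   # optimize embeddings only

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```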
arXiv Detail & Related papers (2024-10-06T06:04:23Z) - Transformers meet Neural Algorithmic Reasoners [16.5785372289558]
We propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs).
We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning.
arXiv Detail & Related papers (2024-06-13T16:42:06Z) - Quixer: A Quantum Transformer Model [3.140679149492808]
We present Quixer: a novel quantum transformer model.
Quixer operates by preparing a superposition of tokens and applying a trainable non-linear transformation to this mix.
We show that its parameterised components can be substituted with fixed structures to yield new classes of quantum transformers.
arXiv Detail & Related papers (2024-06-06T17:52:05Z) - Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention [0.464982780843177]
We show that quantum circuits can efficiently express a self-attention mechanism through the perspective of kernel-based operator learning.
In this work, we are able to represent deep layers of a vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms.
We analyze our novel variational quantum circuit, which we call the Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), and demonstrate its utility on simplified classification problems.
arXiv Detail & Related papers (2024-03-21T18:00:04Z) - AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures [80.28359222380733]
We design a novel transformer framework, dubbed AlgoFormer, to empower transformers with algorithmic capabilities.
In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing.
Some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning.
arXiv Detail & Related papers (2024-02-21T07:07:54Z) - On the Convergence of Encoder-only Shallow Transformers [62.639819460956176]
We build the global convergence theory of encoder-only shallow Transformers under a realistic setting.
Our results can pave the way for a better understanding of modern Transformers, particularly on training dynamics.
arXiv Detail & Related papers (2023-11-02T20:03:05Z) - Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z) - Ground state preparation and energy estimation on early fault-tolerant quantum computers via quantum eigenvalue transformation of unitary matrices [3.1952399274829775]
We develop a tool called quantum eigenvalue transformation of unitary matrices with real polynomials (QET-U).
This leads to a simple quantum algorithm that outperforms all previous algorithms with a comparable circuit structure for estimating the ground state energy.
We demonstrate the performance of the algorithm using IBM Qiskit for the transverse field Ising model.
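For context, the transverse-field Ising Hamiltonian used in that demonstration has the form H = -J Σ Z_i Z_{i+1} - g Σ X_i. The Qiskit snippet below only constructs such a Hamiltonian as a SparsePauliOp (system size and couplings are illustrative); it does not implement the QET-U algorithm itself.

```python
from qiskit.quantum_info import SparsePauliOp

n, J, g = 4, 1.0, 0.5                       # illustrative chain length and couplings
terms = []
for i in range(n - 1):                      # -J * Z_i Z_{i+1} nearest-neighbor couplings
    terms.append(("I" * i + "ZZ" + "I" * (n - i - 2), -J))
for i in range(n):                          # -g * X_i transverse field
    terms.append(("I" * i + "X" + "I" * (n - i - 1), -g))
H = SparsePauliOp.from_list(terms)          # Hamiltonian whose ground-state energy is estimated
```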
arXiv Detail & Related papers (2022-04-12T17:11:40Z) - Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z) - Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language, RASP.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
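As an illustration of the kind of program RASP expresses, the following snippet emulates (in plain Python, not actual RASP syntax) the histogram task with a select-then-aggregate pattern: for each position, count how many tokens in the sequence match the token at that position.

```python
def histogram(tokens):
    # "select": attention-like boolean matrix, True where key token == query token
    select = [[k == q for k in tokens] for q in tokens]
    # "aggregate": per-query count of selected keys (the selector width)
    return [sum(row) for row in select]

print(histogram(list("hello")))   # -> [1, 1, 2, 2, 1]
```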
arXiv Detail & Related papers (2021-06-13T13:04:46Z) - Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)