A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation
- URL: http://arxiv.org/abs/2502.19214v1
- Date: Wed, 26 Feb 2025 15:15:01 GMT
- Title: A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation
- Authors: Anthony M. Smaldone, Yu Shee, Gregory W. Kyro, Marwa H. Farag, Zohim Chandani, Elica Kyoseva, Victor S. Batista
- Abstract summary: We propose a hybrid quantum-classical self-attention mechanism as part of a transformer decoder. We show that the time complexity of the query-key dot product is reduced from $\mathcal{O}(n^2 d)$ in a classical model to $\mathcal{O}(n^2 \log d)$ in our quantum model. This work provides a promising avenue for quantum-enhanced natural language processing (NLP).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The success of the self-attention mechanism in classical machine learning models has inspired the development of quantum analogs aimed at reducing computational overhead. Self-attention integrates learnable query and key matrices to calculate attention scores between all pairs of tokens in a sequence. These scores are then multiplied by a learnable value matrix to obtain the output self-attention matrix, enabling the model to effectively capture long-range dependencies within the input sequence. Here, we propose a hybrid quantum-classical self-attention mechanism as part of a transformer decoder, the architecture underlying large language models (LLMs). To demonstrate its utility in chemistry, we train this model on the QM9 dataset for conditional generation, using SMILES strings as input, each labeled with a set of physicochemical properties that serve as conditions during inference. Our theoretical analysis shows that the time complexity of the query-key dot product is reduced from $\mathcal{O}(n^2 d)$ in a classical model to $\mathcal{O}(n^2\log d)$ in our quantum model, where $n$ and $d$ represent the sequence length and embedding dimension, respectively. We perform simulations using NVIDIA's CUDA-Q platform, which is designed for efficient GPU scalability. This work provides a promising avenue for quantum-enhanced natural language processing (NLP).
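To make the complexity claim concrete, below is a minimal NumPy sketch of the classical decoder self-attention baseline described in the abstract (learnable query, key, and value projections, a causal mask, and the $\mathcal{O}(n^2 d)$ query-key dot product). The function and variable names are illustrative assumptions, and the hybrid quantum circuit that the paper substitutes for the dot product is not reproduced here.

```python
# Minimal sketch of the *classical* self-attention baseline the abstract describes.
# Names (n, d, W_q, W_k, W_v) are illustrative assumptions; the paper's quantum
# replacement of the query-key product is not shown.
import numpy as np

def classical_self_attention(X, W_q, W_k, W_v):
    """X: (n, d) token embeddings; W_q, W_k, W_v: (d, d) learnable projections."""
    n, d = X.shape
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # each (n, d)

    # Query-key dot product: n^2 token pairs, each costing O(d) -> O(n^2 d) overall.
    # This is the step the paper claims to reduce to O(n^2 log d) with a quantum circuit.
    scores = (Q @ K.T) / np.sqrt(d)               # (n, n)

    # Causal mask for a decoder: token i may only attend to tokens j <= i.
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    # Row-wise softmax, then weighting of the value matrix.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n, d) self-attention output

# Example: n = 8 SMILES tokens with embedding dimension d = 16.
rng = np.random.default_rng(0)
n, d = 8, 16
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
print(classical_self_attention(X, W_q, W_k, W_v).shape)  # (8, 16)
```

As a rough illustration of the scaling, at $n = 64$ tokens and $d = 256$ the classical query-key product costs on the order of $64^2 \cdot 256 \approx 10^6$ multiply-adds, whereas the claimed quantum scaling grows only with $\log d \approx 8$ per token pair.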
Related papers
- An Efficient Quantum Classifier Based on Hamiltonian Representations [50.467930253994155]
Quantum machine learning (QML) is a discipline that seeks to transfer the advantages of quantum computing to data-driven tasks.
We propose an efficient approach that circumvents the costs associated with data encoding by mapping inputs to a finite set of Pauli strings.
We evaluate our approach on text and image classification tasks, against well-established classical and quantum models.
arXiv Detail & Related papers (2025-04-13T11:49:53Z) - Single-Qudit Quantum Neural Networks for Multiclass Classification [0.0]
This paper proposes a single-qudit quantum neural network for multiclass classification.
Our design employs a $d$-dimensional unitary operator, where $d$ corresponds to the number of classes.
We evaluate our model on the MNIST and EMNIST datasets, demonstrating competitive accuracy.
arXiv Detail & Related papers (2025-03-12T11:12:05Z) - Kolmogorov GAM Networks are all you need! [0.6906005491572398]
Kolmogorov GAM networks are shown to be an efficient architecture for training and inference. They are an additive model with an embedding that is independent of the function of interest.
arXiv Detail & Related papers (2025-01-01T02:46:00Z) - Memory-Augmented Hybrid Quantum Reservoir Computing [0.0]
We present a hybrid quantum-classical approach that implements memory through classical post-processing of quantum measurements.
We tested our model on two physical platforms: a fully connected Ising model and a Rydberg atom array.
arXiv Detail & Related papers (2024-09-15T22:44:09Z) - Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
We study how a two-attention-layer transformer is trained to perform in-context learning (ICL) on $n$-gram Markov chain data.
We prove that the gradient flow with respect to a cross-entropy ICL loss converges to a limiting model.
arXiv Detail & Related papers (2024-09-09T18:10:26Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps address the lack of long-range dependency modeling.
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - Quantum linear algebra is all you need for Transformer architectures [1.660288273261283]
We investigate transformer architectures under the lens of fault-tolerant quantum computing.
We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function.
Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction.
arXiv Detail & Related papers (2024-02-26T16:31:28Z) - Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z) - QNet: A Quantum-native Sequence Encoder Architecture [2.8099769011264586]
This work proposes QNet, a novel sequence encoder model that runs inference entirely on a quantum computer using a minimum number of qubits.
In addition, we introduce ResQNet, a quantum-classical hybrid model composed of several QNet blocks linked by residual connections.
arXiv Detail & Related papers (2022-10-31T12:36:37Z) - Automatic and effective discovery of quantum kernels [41.61572387137452]
Quantum computing can empower machine learning models by enabling kernel machines to leverage quantum kernels for representing similarity measures between data. We present an approach to this problem which employs optimization techniques, similar to those used in neural architecture search and AutoML. The results obtained by testing our approach on a high-energy physics problem demonstrate that, in the best-case scenario, we can either match or improve testing accuracy with respect to the manual design approach.
arXiv Detail & Related papers (2022-09-22T16:42:14Z) - Towards Quantum Graph Neural Networks: An Ego-Graph Learning Approach [47.19265172105025]
We propose a novel hybrid quantum-classical algorithm for graph-structured data, which we refer to as the Ego-graph based Quantum Graph Neural Network (egoQGNN).
egoQGNN implements the GNN theoretical framework using the tensor product and unity matrix representation, which greatly reduces the number of model parameters required.
The architecture is based on a novel mapping from real-world data to Hilbert space.
arXiv Detail & Related papers (2022-01-13T16:35:45Z) - Simulating nonnative cubic interactions on noisy quantum machines [65.38483184536494]
We show that quantum processors can be programmed to efficiently simulate dynamics that are not native to the hardware.
On noisy devices without error correction, we show that simulation results are significantly improved when the quantum program is compiled using modular gates.
arXiv Detail & Related papers (2020-04-15T05:16:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.