Related papers: Quantum Doubly Stochastic Transformers

URL: http://arxiv.org/abs/2504.16275v1
Date: Tue, 22 Apr 2025 21:15:45 GMT
Title: Quantum Doubly Stochastic Transformers
Authors: Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk,
Abstract summary: We demonstrate a hybrid classical-quantum doubly stochastic Transformer (QDSFormer) that replaces the Softmax in the self-attention layer with a variational quantum circuit. We find that our QDSFormer consistently surpasses both a standard Vision Transformer and other doubly stochastic Transformers. The QDSFormer also shows improved training stability and lower performance variation, suggesting that it may mitigate the notoriously unstable training of ViTs on small-scale data.
Score: 39.944142646344645
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: At the core of the Transformer, the Softmax normalizes the attention matrix to be right stochastic. Previous research has shown that this often destabilizes training and that enforcing the attention matrix to be doubly stochastic (through Sinkhorn's algorithm) consistently improves performance across different tasks, domains and Transformer flavors. However, Sinkhorn's algorithm is iterative, approximative, non-parametric and thus inflexible w.r.t. the obtained doubly stochastic matrix (DSM). Recently, it has been proven that DSMs can be obtained with a parametric quantum circuit, yielding a novel quantum inductive bias for DSMs with no known classical analogue. Motivated by this, we demonstrate the feasibility of a hybrid classical-quantum doubly stochastic Transformer (QDSFormer) that replaces the Softmax in the self-attention layer with a variational quantum circuit. We study the expressive power of the circuit and find that it yields more diverse DSMs that better preserve information than classical operators. Across multiple small-scale object recognition tasks, we find that our QDSFormer consistently surpasses both a standard Vision Transformer and other doubly stochastic Transformers. Beyond the established Sinkformer, this comparison includes a novel quantum-inspired doubly stochastic Transformer (based on QR decomposition) that can be of independent interest. The QDSFormer also shows improved training stability and lower performance variation suggesting that it may mitigate the notoriously unstable training of ViTs on small-scale data.
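
The difference between right stochastic and doubly stochastic attention described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the example scores, and the iteration count are assumptions chosen for illustration, and the quantum circuit itself is not modeled. The sketch compares a row-wise Softmax with a plain Sinkhorn normalization (alternating row and column rescaling) on a small score matrix.

```python
import math


def softmax_rows(scores):
    """Row-wise Softmax: every row sums to 1 (right stochastic)."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sinkhorn(scores, n_iters=100):
    """Alternately normalize rows and columns of exp(scores).

    After finitely many iterations the result is only approximately
    doubly stochastic, which is the "iterative, approximative" property
    the abstract refers to. n_iters is an illustrative choice.
    """
    m = max(max(row) for row in scores)
    mat = [[math.exp(x - m) for x in row] for row in scores]
    n = len(mat)
    for _ in range(n_iters):
        # Row step: rescale each row to sum to 1.
        mat = [[v / sum(row) for v in row] for row in mat]
        # Column step: rescale each column to sum to 1.
        sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / sums[j] for j in range(n)] for i in range(n)]
    return mat


def row_sums(mat):
    return [sum(row) for row in mat]


def col_sums(mat):
    return [sum(col) for col in zip(*mat)]


# Hypothetical 3x3 attention scores (e.g. scaled query-key products).
scores = [
    [2.0, 0.5, -1.0],
    [0.1, 0.3, 1.5],
    [1.0, 1.0, 0.2],
]

attn_softmax = softmax_rows(scores)
attn_sinkhorn = sinkhorn(scores)

print("Softmax  row sums:", row_sums(attn_softmax))
print("Softmax  col sums:", col_sums(attn_softmax))
print("Sinkhorn row sums:", row_sums(attn_sinkhorn))
print("Sinkhorn col sums:", col_sums(attn_sinkhorn))
```

With Softmax only the row sums equal 1, while the Sinkhorn output has row and column sums that both approach 1 as the number of iterations grows. The procedure has no trainable parameters of its own, which is the inflexibility the abstract contrasts with a parametric quantum circuit.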

Related papers

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence of the gradient flow on the regression loss despite the non-convexity of the loss landscape. This is the first theoretical analysis for multi-layer Transformers in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z)
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention [0.464982780843177]
We present a variational quantum circuit architecture named Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh). Our approach leverages recent insights from kernel-based operator learning in the context of predicting vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms. To validate our approach, we consider image classification tasks in simulation and with hardware, where with only 9 qubits and a handful of parameters we are able to simultaneously embed and classify a grayscale image of handwritten digits with high accuracy.
arXiv Detail & Related papers (2024-03-21T18:00:04Z)
A sharp phase transition in linear cross-entropy benchmarking [1.4841630983274847]
A key question in the theory of XEB is whether it approximates the fidelity of the quantum state preparation. Previous works have shown that the XEB generically approximates the fidelity in a regime where the noise rate per qudit $\varepsilon$ satisfies $\varepsilon N \ll 1$. Here, we show that the breakdown of XEB as a fidelity proxy occurs as a sharp phase transition at a critical value of $\varepsilon N$.
arXiv Detail & Related papers (2023-05-08T18:00:05Z)
Convergence and Quantum Advantage of Trotterized MERA for Strongly-Correlated Systems [0.0]
Trotterized MERA VQE is a promising route for the efficient investigation of strongly-correlated quantum many-body systems on quantum computers. We show how the convergence can be substantially improved by building up the MERA layer by layer in the initialization stage and by scanning through the phase diagram during optimization.
arXiv Detail & Related papers (2023-03-15T20:09:45Z)
Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks. Existing methods are either theoretically flawed or empirically ineffective for visual recognition. We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
Verifying quantum information scrambling dynamics in a fully controllable superconducting quantum simulator [0.0]
We study the verified scrambling in a 1D spin chain by an analogue superconducting quantum simulator with the signs and values of individual driving and coupling terms fully controllable. Our work demonstrates the superconducting system as a powerful quantum simulator.
arXiv Detail & Related papers (2021-12-21T13:41:47Z)
SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention. Various attempts on approximating the self-attention with linear complexity have been made in Natural Language Processing. We identify that their limitations are rooted in keeping the softmax self-attention during approximations. For the first time, a softmax-free transformer or SOFT is proposed.
arXiv Detail & Related papers (2021-10-22T17:57:29Z)
Sinkformers: Transformers with Doubly Stochastic Attention [22.32840998053339]
We use Sinkhorn's algorithm to make attention matrices doubly stochastic. We call the resulting model a Sinkformer. On the experimental side, we show Sinkformers enhance model accuracy in vision and natural language processing tasks. Importantly, on 3D shapes classification, Sinkformers lead to a significant improvement.
arXiv Detail & Related papers (2021-10-22T13:25:01Z)
Quantum algorithms for quantum dynamics: A performance study on the spin-boson model [68.8204255655161]
Quantum algorithms for quantum dynamics simulations are traditionally based on implementing a Trotter-approximation of the time-evolution operator. Variational quantum algorithms have become an indispensable alternative, enabling small-scale simulations on present-day hardware. We show that, despite providing a clear reduction of quantum gate cost, the variational method in its current implementation is unlikely to lead to a quantum advantage.
arXiv Detail & Related papers (2021-08-09T18:00:05Z)
Transmon platform for quantum computing challenged by chaotic fluctuations [55.41644538483948]
We investigate the stability of a variant of a many-body localized (MBL) phase for system parameters relevant to current quantum processors. We find that these computing platforms are dangerously close to a phase of uncontrollable chaotic fluctuations.
arXiv Detail & Related papers (2020-12-10T19:00:03Z)
Adaptive Variational Quantum Dynamics Simulations [3.629716738568079]
We propose a general-purpose, self-adaptive approach to construct variational wavefunction ansätze for highly accurate quantum dynamics simulations. We apply this approach to the integrable Lieb-Schultz-Mattis spin chain and the nonintegrable mixed-field Ising model. We envision that a wide range of dynamical simulations of quantum many-body systems on near-term quantum computing devices will be made possible through the AVQDS framework.
arXiv Detail & Related papers (2020-11-01T20:21:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.