Quantum Doubly Stochastic Transformers
- URL: http://arxiv.org/abs/2504.16275v1
- Date: Tue, 22 Apr 2025 21:15:45 GMT
- Title: Quantum Doubly Stochastic Transformers
- Authors: Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk,
- Abstract summary: We demonstrate the feasibility of a hybrid classical-quantum doubly stochastic Transformer (QDSFormer) that replaces the Softmax in the self-attention layer with a variational quantum circuit. Across multiple small-scale object recognition tasks, the QDSFormer consistently surpasses both a standard Vision Transformer and other doubly stochastic Transformers. The QDSFormer also shows improved training stability and lower performance variation, suggesting that it may mitigate the notoriously unstable training of ViTs on small-scale data.
- Score: 39.944142646344645
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: At the core of the Transformer, the Softmax normalizes the attention matrix to be right stochastic. Previous research has shown that this often destabilizes training and that enforcing the attention matrix to be doubly stochastic (through Sinkhorn's algorithm) consistently improves performance across different tasks, domains and Transformer flavors. However, Sinkhorn's algorithm is iterative, approximative, non-parametric and thus inflexible w.r.t. the obtained doubly stochastic matrix (DSM). Recently, it has been proven that DSMs can be obtained with a parametric quantum circuit, yielding a novel quantum inductive bias for DSMs with no known classical analogue. Motivated by this, we demonstrate the feasibility of a hybrid classical-quantum doubly stochastic Transformer (QDSFormer) that replaces the Softmax in the self-attention layer with a variational quantum circuit. We study the expressive power of the circuit and find that it yields more diverse DSMs that better preserve information than classical operators. Across multiple small-scale object recognition tasks, we find that our QDSFormer consistently surpasses both a standard Vision Transformer and other doubly stochastic Transformers. Beyond the established Sinkformer, this comparison includes a novel quantum-inspired doubly stochastic Transformer (based on QR decomposition) that can be of independent interest. The QDSFormer also shows improved training stability and lower performance variation suggesting that it may mitigate the notoriously unstable training of ViTs on small-scale data.
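The difference between a right stochastic and a doubly stochastic attention matrix can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it is a minimal pure-Python illustration that assumes a square score matrix, and the function names (`softmax_rows`, `sinkhorn`) and the example `scores` are hypothetical. It contrasts the row-wise Softmax with the classical Sinkhorn iteration mentioned in the abstract; the quantum circuit itself is not modeled here.

```python
import math


def softmax_rows(scores):
    """Row-wise Softmax: every row sums to 1 (right stochastic)."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sinkhorn(scores, n_iters=100):
    """Alternately normalize rows and columns of exp(scores).

    The result is only approximately doubly stochastic; the residual
    shrinks as n_iters grows (the iteration is approximative).
    """
    n = len(scores)
    m = max(max(row) for row in scores)
    mat = [[math.exp(x - m) for x in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization step
        mat = [[x / sum(row) for x in row] for row in mat]
        # Column normalization step
        totals = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / totals[j] for j in range(n)] for i in range(n)]
    return mat


def row_sums(mat):
    return [sum(row) for row in mat]


def col_sums(mat):
    return [sum(col) for col in zip(*mat)]


# Hypothetical 3x3 attention scores (e.g. Q K^T / sqrt(d) for three tokens)
scores = [
    [2.0, 0.5, -1.0],
    [0.1, 1.5, 0.3],
    [-0.7, 0.2, 1.0],
]

A_soft = softmax_rows(scores)
A_sink = sinkhorn(scores)

print("Softmax  row sums:", row_sums(A_soft))
print("Softmax  col sums:", col_sums(A_soft))  # generally not all equal to 1
print("Sinkhorn row sums:", row_sums(A_sink))
print("Sinkhorn col sums:", col_sums(A_sink))
```

In this sketch the Softmax output has unit row sums but unconstrained column sums, while the Sinkhorn output has both row and column sums close to 1. The number of iterations is the only knob, which reflects the non-parametric nature of the procedure noted in the abstract.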
Related papers
- Clustering in Deep Stochastic Transformers [10.988655177671255]
Existing theories of deep Transformers with layer normalization typically predict that tokens cluster to a single point. We analyze deep Transformers where noise arises from the random value of value. For two tokens, we prove a phase transition governed by the interaction strength and the token dimension.
arXiv Detail & Related papers (2026-01-29T16:28:13Z)
- Quantum-Inspired Algorithms beyond Unitary Circuits: the Laplace Transform [0.0]
Quantum-inspired algorithms can deliver substantial speedups over classical state-of-the-art methods. We introduce a tensor-network approach to compute the discrete Laplace transform, a non-unitary, aperiodic transform. We demonstrate simulations up to $N=230$ input data points, with up to $260$ output data points, and quantify how bond dimension controls runtime and accuracy.
arXiv Detail & Related papers (2026-01-25T07:19:56Z)
- Continual Quantum Architecture Search with Tensor-Train Encoding: Theory and Applications to Signal Processing [68.35481158940401]
CL-QAS is a continual quantum architecture search framework. It mitigates the challenges of costly amplitude encoding and forgetting in variational quantum circuits. It achieves controllable robustness expressivity, sample-efficient generalization, and smooth convergence without barren plateaus.
arXiv Detail & Related papers (2026-01-10T02:36:03Z)
- Towards Quantum Enhanced Adversarial Robustness with Rydberg Reservoir Learning [45.92935470813908]
Quantum reservoir computing (QRC) leverages the high-dimensional, nonlinear dynamics inherent in quantum many-body systems. Recent studies indicate that quantum classifiers based on variational circuits remain susceptible to adversarial perturbations. We present the first systematic evaluation of adversarial robustness in a QRC-based learning model.
arXiv Detail & Related papers (2025-10-15T12:17:23Z)
- Digital quantum simulation of many-body localization crossover in a disordered kicked Ising model [0.0]
We propose simulating the many-body localization crossover as a nonequilibrium problem in disordered Floquet many-body systems. We compute out-of-time-ordered correlators as an indicator of the many-body localization crossover. The validity of the results is confirmed by comparing two independent error mitigation methods.
arXiv Detail & Related papers (2025-10-02T12:57:19Z)
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence of the gradient flow on the regression loss despite the non-convexity of the loss landscape.
This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z)
- Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention [0.464982780843177]
We present a variational quantum circuit architecture named Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh). Our approach leverages recent insights from kernel-based operator learning in the context of predicting vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms. To validate our approach, we consider image classification tasks in simulation and with hardware, where with only 9 qubits and a handful of parameters we are able to simultaneously embed and classify a grayscale image of handwritten digits with high accuracy.
arXiv Detail & Related papers (2024-03-21T18:00:04Z)
- A sharp phase transition in linear cross-entropy benchmarking [1.4841630983274847]
A key question in the theory of XEB is whether it approximates the fidelity of the quantum state preparation.
Previous works have shown that the XEB generically approximates the fidelity in a regime where the noise rate per qudit $\varepsilon$ satisfies $\varepsilon N \ll 1$.
Here, we show that the breakdown of XEB as a fidelity proxy occurs as a sharp phase transition at a critical value of $\varepsilon N$.
arXiv Detail & Related papers (2023-05-08T18:00:05Z)
- Convergence and Quantum Advantage of Trotterized MERA for Strongly-Correlated Systems [0.0]
Trotterized MERA VQE is a promising route for the efficient investigation of strongly-correlated quantum many-body systems on quantum computers. We show how the convergence can be substantially improved by building up the MERA layer by layer in the initialization stage and by scanning through the phase diagram during optimization.
arXiv Detail & Related papers (2023-03-15T20:09:45Z)
- NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements.
The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively.
We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z)
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
- Verifying quantum information scrambling dynamics in a fully controllable superconducting quantum simulator [0.0]
We study the verified scrambling in a 1D spin chain by an analogue superconducting quantum simulator with the signs and values of individual driving and coupling terms fully controllable.
Our work demonstrates the superconducting system as a powerful quantum simulator.
arXiv Detail & Related papers (2021-12-21T13:41:47Z)
- SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.
Various attempts on approximating the self-attention with linear complexity have been made in Natural Language Processing.
We identify that their limitations are rooted in keeping the softmax self-attention during approximations.
For the first time, a softmax-free transformer or SOFT is proposed.
arXiv Detail & Related papers (2021-10-22T17:57:29Z)
- Sinkformers: Transformers with Doubly Stochastic Attention [22.32840998053339]
We use Sinkhorn's algorithm to make attention matrices doubly stochastic. We call the resulting model a Sinkformer.
On the experimental side, we show Sinkformers enhance model accuracy in vision and natural language processing tasks.
Importantly, on 3D shapes classification, Sinkformers lead to a significant improvement.
arXiv Detail & Related papers (2021-10-22T13:25:01Z)
- Quantum algorithms for quantum dynamics: A performance study on the spin-boson model [68.8204255655161]
Quantum algorithms for quantum dynamics simulations are traditionally based on implementing a Trotter-approximation of the time-evolution operator.
Variational quantum algorithms have become an indispensable alternative, enabling small-scale simulations on present-day hardware.
We show that, despite providing a clear reduction of quantum gate cost, the variational method in its current implementation is unlikely to lead to a quantum advantage.
arXiv Detail & Related papers (2021-08-09T18:00:05Z)
- Transmon platform for quantum computing challenged by chaotic fluctuations [55.41644538483948]
We investigate the stability of a variant of a many-body localized (MBL) phase for system parameters relevant to current quantum processors.
We find that these computing platforms are dangerously close to a phase of uncontrollable chaotic fluctuations.
arXiv Detail & Related papers (2020-12-10T19:00:03Z)
- Adaptive Variational Quantum Dynamics Simulations [3.629716738568079]
We propose a general-purpose, self-adaptive approach to construct variational wavefunction ansätze for highly accurate quantum dynamics simulations.
We apply this approach to the integrable Lieb-Schultz-Mattis spin chain and the nonintegrable mixed-field Ising model.
We envision that a wide range of dynamical simulations of quantum many-body systems on near-term quantum computing devices will be made possible through the AVQDS framework.
arXiv Detail & Related papers (2020-11-01T20:21:57Z)