Low-Overhead Parallelisation of LCU via Commuting Operators
- URL: http://arxiv.org/abs/2312.00696v3
- Date: Wed, 21 Aug 2024 14:00:04 GMT
- Title: Low-Overhead Parallelisation of LCU via Commuting Operators
- Authors: Gregory Boyd,
- Abstract summary: Linear Combination of Unitaries (LCU) is a powerful scheme for the block encoding of operators but suffers from high overheads.
We discuss the parallelisation of LCU and in particular the SELECT subroutine of LCU based on partitioning of observables into groups of commuting operators.
We additionally discuss the parallelisation of QROM circuits which are a special case of our main results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Linear Combination of Unitaries (LCU) method is a powerful scheme for the block encoding of operators but suffers from high overheads. In this work, we discuss the parallelisation of LCU and in particular the SELECT subroutine of LCU based on partitioning of observables into groups of commuting operators, as well as the use of adaptive circuits and teleportation that allow us to perform required Clifford circuits in constant depth. We additionally discuss the parallelisation of QROM circuits which are a special case of our main results, and provide methods to parallelise the action of multi-controlled gates on the control register. We only require an $O(\log n)$ factor increase in the number of qubits in order to produce a significant depth reduction, with prior work suggesting that for molecular Hamiltonians, the depth saving is $O(n)$, and numerics indicating depth savings of a factor approximately $n/2$. The implications of our method in the fault-tolerant setting are also considered, noting that parallelisation reduces the $T$-depth by the same factor as the logical algorithm, without changing the $T$-count, and that our method can significantly reduce the overall space-time volume of the computation, even when including the increased number of $T$ factories required by parallelisation.
Related papers
- Low Depth Phase Oracle Using a Parallel Piecewise Circuit [3.629687485125086]
We explore the important task of applying a phase $exp(i f(x))$ to a computational basis state $left| x right>$.
The closely related task of rotating a target qubit by an angle depending on $f(x)$ is also studied.
arXiv Detail & Related papers (2024-09-06T19:57:13Z) - On the Constant Depth Implementation of Pauli Exponentials [49.48516314472825]
We decompose arbitrary exponentials into circuits of constant depth using $mathcalO(n)$ ancillae and two-body XX and ZZ interactions.
We prove the correctness of our approach, after introducing novel rewrite rules for circuits which benefit from qubit recycling.
arXiv Detail & Related papers (2024-08-15T17:09:08Z) - Fast and Parallelizable Logical Computation with Homological Product Codes [3.4338109681532027]
High-rate quantum low-density-parity-check (qLDPC) codes promise a route to reduce qubit numbers, but performing computation while maintaining low space cost has required serialization of operations and extra time costs.
We design fast and parallelizable logical gates for qLDPC codes, demonstrating their utility for key algorithmic subroutines such as the quantum adder.
arXiv Detail & Related papers (2024-07-26T03:49:59Z) - Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z) - DeepPCR: Parallelizing Sequential Operations in Neural Networks [4.241834259165193]
We introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations in order to speed up inference and training of neural networks.
DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm.
To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons.
arXiv Detail & Related papers (2023-09-28T10:15:30Z) - Efficient parallelization of quantum basis state shift [0.0]
We optimize the state shift algorithm by incorporating the shift in different directions in parallel.
This provides a significant reduction in the depth of the quantum circuit in comparison to the currently known methods.
We focus on the one-dimensional and periodic shift, but note that the method can be extended to more complex cases.
arXiv Detail & Related papers (2023-04-04T11:01:08Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air learning via the method of federated learning.
Compared to the state-of-the-art proposed scheme, the proposed scheme poses a drastically lower efficiency system.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z) - Optimization-based Block Coordinate Gradient Coding for Mitigating
Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a gradient coordinate coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes.
arXiv Detail & Related papers (2022-06-06T09:25:40Z) - Surface code compilation via edge-disjoint paths [0.0]
We show how to prepare many long-range pairs on qubits connected by edge-disjoint paths of ancillas in constant depth.
This forms one core part of our Edge-Disjoint Paths Compilation algorithm.
We find significantly improved performance for circuits built from parallel cnots, and for circuits which implement the multi-controlled $X$ gate.
arXiv Detail & Related papers (2021-10-21T21:40:43Z) - Accurate methods for the analysis of strong-drive effects in parametric
gates [94.70553167084388]
We show how to efficiently extract gate parameters using exact numerics and a perturbative analytical approach.
We identify optimal regimes of operation for different types of gates including $i$SWAP, controlled-Z, and CNOT.
arXiv Detail & Related papers (2021-07-06T02:02:54Z) - Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M=3,5,7,9, and 11.
A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers.
arXiv Detail & Related papers (2020-04-12T13:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.