Low-Overhead Parallelisation of LCU via Commuting Operators
- URL: http://arxiv.org/abs/2312.00696v2
- Date: Mon, 15 Apr 2024 10:20:13 GMT
- Title: Low-Overhead Parallelisation of LCU via Commuting Operators
- Authors: Gregory Boyd
- Abstract summary: Linear Combination of Unitaries (LCU) is a powerful scheme for the block encoding of operators but suffers from high overheads.
We discuss the parallelisation of LCU and in particular the SELECT subroutine of LCU based on partitioning of observables into groups of commuting operators.
We additionally discuss the parallelisation of QROM circuits which are a special case of our main results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Linear Combination of Unitaries (LCU) method is a powerful scheme for the block encoding of operators but suffers from high overheads. In this work, we discuss the parallelisation of LCU and in particular the SELECT subroutine of LCU based on partitioning of observables into groups of commuting operators, as well as the use of adaptive circuits and teleportation that allow us to perform required Clifford circuits in constant depth. We additionally discuss the parallelisation of QROM circuits which are a special case of our main results, and provide methods to parallelise the action of multi-controlled gates on the control register. We only require an $O(\log n)$ factor increase in the number of qubits in order to produce a significant depth reduction, with prior work suggesting that for molecular Hamiltonians, the depth saving is $O(n)$, and numerics indicating depth savings of a factor approximately $n/2$. The implications of our method in the fault-tolerant setting are also considered, noting that parallelisation reduces the $T$-depth by the same factor as the logical algorithm, without changing the $T$-count, and that our method can significantly reduce the overall space-time volume of the computation, even when including the increased number of $T$ factories required by parallelisation.
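The partitioning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the observables are Pauli strings written as text (for example "ZZI") and uses a simple greedy first-fit grouping. The function names `paulis_commute` and `greedy_commuting_groups` and the example terms are hypothetical, and this listing does not say which partitioning heuristic the paper uses, so this is only one possible approach.

```python
from typing import List


def paulis_commute(p: str, q: str) -> bool:
    """Return True if two equal-length Pauli strings commute.

    Two Pauli strings commute exactly when the number of positions where
    both are non-identity and differ is even.
    """
    clashes = sum(
        1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b
    )
    return clashes % 2 == 0


def greedy_commuting_groups(paulis: List[str]) -> List[List[str]]:
    """Greedy first-fit partition into groups of mutually commuting strings.

    Illustrative heuristic only (an assumption, not the paper's method).
    """
    groups: List[List[str]] = []
    for p in paulis:
        for group in groups:
            # Add p to the first group whose members all commute with it.
            if all(paulis_commute(p, q) for q in group):
                group.append(p)
                break
        else:
            # No compatible group found, so start a new one.
            groups.append([p])
    return groups


# Hypothetical example terms on three qubits.
terms = ["ZZI", "IZZ", "ZIZ", "XXI", "IXX", "YYI"]
groups = greedy_commuting_groups(terms)
for i, group in enumerate(groups):
    print(i, group)
```

In this toy input the six terms fall into three commuting groups. The number of groups relative to the number of terms gives a rough sense of how much SELECT work could be handled group by group rather than term by term.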
Related papers
- Multi-qubit Lattice Surgery Scheduling [3.7126786554865774]
A quantum circuit can be transpiled into a sequence of solely non-Clifford multi-qubit gates.
We show that the transpilation significantly reduces the circuit length on the set of circuits tested.
The resulting circuit of multi-qubit gates has a further reduction in the expected circuit execution time compared to serial execution.
arXiv Detail & Related papers (2024-05-27T22:41:41Z) - Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z) - DeepPCR: Parallelizing Sequential Operations in Neural Networks [4.241834259165193]
We introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations in order to speed up inference and training of neural networks.
DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm.
To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons.
arXiv Detail & Related papers (2023-09-28T10:15:30Z) - Efficient parallelization of quantum basis state shift [0.0]
We optimize the state shift algorithm by incorporating the shift in different directions in parallel.
This provides a significant reduction in the depth of the quantum circuit in comparison to the currently known methods.
We focus on the one-dimensional and periodic shift, but note that the method can be extended to more complex cases.
arXiv Detail & Related papers (2023-04-04T11:01:08Z) - DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization [0.0]
DADAO is the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$.
We show that our algorithm requires $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ local and only $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(
arXiv Detail & Related papers (2022-07-26T08:47:54Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air learning via the method of federated learning.
Compared to the state-of-the-art scheme, the proposed scheme imposes drastically lower complexity on the system.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z) - Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a gradient coordinate coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generalizes most gradient coding schemes.
arXiv Detail & Related papers (2022-06-06T09:25:40Z) - Surface code compilation via edge-disjoint paths [0.0]
We show how to prepare many long-range pairs on qubits connected by edge-disjoint paths of ancillas in constant depth.
This forms one core part of our Edge-Disjoint Paths Compilation algorithm.
We find significantly improved performance for circuits built from parallel CNOTs, and for circuits which implement the multi-controlled $X$ gate.
arXiv Detail & Related papers (2021-10-21T21:40:43Z) - Accurate methods for the analysis of strong-drive effects in parametric gates [94.70553167084388]
We show how to efficiently extract gate parameters using exact numerics and a perturbative analytical approach.
We identify optimal regimes of operation for different types of gates including $i$SWAP, controlled-Z, and CNOT.
arXiv Detail & Related papers (2021-07-06T02:02:54Z) - Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers [112.23981192818721]
We propose to use backward mode linear relaxation based analysis (LiRPA) to replace Linear Programming (LP) during the BaB process.
Unlike LP, LiRPA applied naively can produce much weaker bounds and may even fail to check certain conflicts of sub-domains during splitting.
We demonstrate an order of magnitude speedup compared to existing LP-based approaches.
arXiv Detail & Related papers (2020-11-27T16:42:12Z) - Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M=3,5,7,9, and 11.
A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers.
arXiv Detail & Related papers (2020-04-12T13:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.