Related papers: Low-Overhead Parallelisation of LCU via Commuting Operators

Low-Overhead Parallelisation of LCU via Commuting Operators

URL: http://arxiv.org/abs/2312.00696v3
Date: Wed, 21 Aug 2024 14:00:04 GMT
Title: Low-Overhead Parallelisation of LCU via Commuting Operators
Authors: Gregory Boyd,
Abstract summary: Linear Combination of Unitaries (LCU) is a powerful scheme for the block encoding of operators but suffers from high overheads. We discuss the parallelisation of LCU and in particular the SELECT subroutine of LCU based on partitioning of observables into groups of commuting operators. We additionally discuss the parallelisation of QROM circuits which are a special case of our main results.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Linear Combination of Unitaries (LCU) method is a powerful scheme for the block encoding of operators but suffers from high overheads. In this work, we discuss the parallelisation of LCU and in particular the SELECT subroutine of LCU based on partitioning of observables into groups of commuting operators, as well as the use of adaptive circuits and teleportation that allow us to perform required Clifford circuits in constant depth. We additionally discuss the parallelisation of QROM circuits which are a special case of our main results, and provide methods to parallelise the action of multi-controlled gates on the control register. We only require an $O(\log n)$ factor increase in the number of qubits in order to produce a significant depth reduction, with prior work suggesting that for molecular Hamiltonians, the depth saving is $O(n)$, and numerics indicating depth savings of a factor approximately $n/2$. The implications of our method in the fault-tolerant setting are also considered, noting that parallelisation reduces the $T$-depth by the same factor as the logical algorithm, without changing the $T$-count, and that our method can significantly reduce the overall space-time volume of the computation, even when including the increased number of $T$ factories required by parallelisation.

Related papers

Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $textbfMA-OSMA$ algorithm to transfer the discrete submodular problem into a continuous optimization. We also introduce a projection-free $textbfMA-OSEA$ algorithm, which effectively utilizes the KL divergence by mixing a uniform distribution. Our algorithms significantly improve the $(frac11+c)$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z)
Distributed quantum logic algorithm [0.0]
This work examines a method for reducing circuit depth by introducing auxiliary qubits to enable parallel gate execution. We show that any circuit on $n$ qubits with depth $Oleft(M n2right)$, where $M = M(n)$ is some function of $n$, can be transformed into a circuit with depth $Oleft(log_2(M) n2right)$ operating on $Oleft(M nright)$ qubits.
arXiv Detail & Related papers (2024-11-18T19:08:07Z)
Low Depth Phase Oracle Using a Parallel Piecewise Circuit [3.629687485125086]
We explore the important task of applying a phase $exp(i f(x))$ to a computational basis state $left| x right>$. The closely related task of rotating a target qubit by an angle depending on $f(x)$ is also studied.
arXiv Detail & Related papers (2024-09-06T19:57:13Z)
On the Constant Depth Implementation of Pauli Exponentials [49.48516314472825]
We decompose arbitrary exponentials into circuits of constant depth using $mathcalO(n)$ ancillae and two-body XX and ZZ interactions. We prove the correctness of our approach, after introducing novel rewrite rules for circuits which benefit from qubit recycling.
arXiv Detail & Related papers (2024-08-15T17:09:08Z)
Fast and Parallelizable Logical Computation with Homological Product Codes [3.4338109681532027]
High-rate quantum low-density-parity-check (qLDPC) codes promise a route to reduce qubit numbers, but performing computation while maintaining low space cost has required serialization of operations and extra time costs. We design fast and parallelizable logical gates for qLDPC codes, demonstrating their utility for key algorithmic subroutines such as the quantum adder.
arXiv Detail & Related papers (2024-07-26T03:49:59Z)
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z)
DeepPCR: Parallelizing Sequential Operations in Neural Networks [4.241834259165193]
We introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations in order to speed up inference and training of neural networks. DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm. To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons.
arXiv Detail & Related papers (2023-09-28T10:15:30Z)
Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air learning via the method of federated learning. Compared to the state-of-the-art proposed scheme, the proposed scheme poses a drastically lower efficiency system. The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. We propose a gradient coordinate coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes.
arXiv Detail & Related papers (2022-06-06T09:25:40Z)
Surface code compilation via edge-disjoint paths [0.0]
We show how to prepare many long-range pairs on qubits connected by edge-disjoint paths of ancillas in constant depth. This forms one core part of our Edge-Disjoint Paths Compilation algorithm. We find significantly improved performance for circuits built from parallel cnots, and for circuits which implement the multi-controlled $X$ gate.
arXiv Detail & Related papers (2021-10-21T21:40:43Z)
Accurate methods for the analysis of strong-drive effects in parametric gates [94.70553167084388]
We show how to efficiently extract gate parameters using exact numerics and a perturbative analytical approach. We identify optimal regimes of operation for different types of gates including $i$SWAP, controlled-Z, and CNOT.
arXiv Detail & Related papers (2021-07-06T02:02:54Z)
Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M=3,5,7,9, and 11. A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers.
arXiv Detail & Related papers (2020-04-12T13:18:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.