Fast and Scalable Computation of the Forward and Inverse Discrete
Periodic Radon Transform
- URL: http://arxiv.org/abs/2112.13149v1
- Date: Fri, 24 Dec 2021 22:33:13 GMT
- Title: Fast and Scalable Computation of the Forward and Inverse Discrete
Periodic Radon Transform
- Authors: Cesar Carranza, Daniel Llamocca, and Marios Pattichis
- Abstract summary: The Discrete Periodic Radon Transform (DPRT) has been extensively used in applications that involve image reconstructions from projections.
This manuscript introduces a fast and scalable approach for computing the forward and inverse DPRT.
- Score: 2.2940141855172027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Discrete Periodic Radon Transform (DPRT) has been extensively used in
applications that involve image reconstructions from projections. This
manuscript introduces a fast and scalable approach for computing the forward
and inverse DPRT that is based on the use of: (i) a parallel array of
fixed-point adder trees, (ii) circular shift registers to remove the need for
accessing external memory components when selecting the input data for the
adder trees, (iii) an image block-based approach to DPRT computation that can
fit the proposed architecture to available resources, and (iv) fast
transpositions that are computed in one or a few clock cycles that do not
depend on the size of the input image. As a result, for an $N\times N$ image
($N$ prime), the proposed approach can compute up to $N^{2}$ additions per
clock cycle. Compared to previous approaches, the scalable approach provides
the fastest known implementations for different amounts of computational
resources. For example, for a $251\times 251$ image, for approximately $25\%$
fewer flip-flops than required for a systolic implementation, we have that the
scalable DPRT is computed 36 times faster. For the fastest case, we introduce
optimized architectures that can compute the DPRT and its inverse in just
$2N+\left\lceil \log_{2}N\right\rceil+1$ and $2N+3\left\lceil
\log_{2}N\right\rceil+B+2$ cycles respectively, where $B$ is the number of bits
used to represent each input pixel. On the other hand, the scalable DPRT
approach requires more 1-bit additions than for the systolic implementation and
provides a trade-off between speed and additional 1-bit additions. All of the
proposed DPRT architectures were implemented in VHDL and validated using an
FPGA implementation.
Related papers
- A High-Speed Hardware Algorithm for Modulus Operation and its Application in Prime Number Calculation [0.0]
The proposed algorithm use only addition, subtraction, logical, and bit shift operations.
It addresses scalability challenges in cryptographic applications.
The application of this algorithm in prime number calculation up to 500,000 shows its practical utility and performance advantages.
arXiv Detail & Related papers (2024-07-17T13:24:52Z) - Optimal Rates for $O(1)$-Smooth DP-SCO with a Single Epoch and Large Batches [12.184984662899868]
We revisit the correlated convex optimization (SCO) problem.
We propose an algorithm that achieves the optimal rate for DP-SCO (up to polylog factors) in a single epoch.
arXiv Detail & Related papers (2024-06-04T18:59:42Z) - Video Frame Interpolation with Many-to-many Splatting and Spatial
Selective Refinement [83.60486465697318]
We propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently.
For each input frame pair, M2M has a minuscule computational overhead when interpolating an arbitrary number of in-between frames.
We extend an M2M++ framework by introducing a flexible Spatial Selective Refinement component, which allows for trading computational efficiency for quality and vice versa.
arXiv Detail & Related papers (2023-10-29T09:09:32Z) - INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order
Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25$ times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - Fast 2D Convolutions and Cross-Correlations Using Scalable Architectures [2.2940141855172027]
The basic idea is to map 2D convolutions and cross-correlations to a collection of 1D convolutions and cross-correlations in the transform domain.
The approach uses scalable architectures that can be fitted into modern FPGA and Zynq-SOC devices.
arXiv Detail & Related papers (2021-12-24T22:34:51Z) - Accelerated FBP for computed tomography image reconstruction [1.0266928164137636]
Filtered back projection (FBP) is a commonly used technique in tomographic image reconstruction demonstrating acceptable quality.
We propose a novel approach that reduces the computational complexity of the algorithm to $Theta(N2log N)$ addition operations avoiding Fourier space.
arXiv Detail & Related papers (2020-07-13T10:16:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.