Accelerating local laplacian filters on FPGAs
- URL: http://arxiv.org/abs/2402.12407v1
- Date: Sun, 18 Feb 2024 10:49:23 GMT
- Title: Accelerating local laplacian filters on FPGAs
- Authors: Shashwat Khandelwal, Ziaul Choudhury, Shashwat Shrivastava and Suresh Purini
- Abstract summary: Local Laplacian Filtering is an edge-aware image processing technique that involves the construction of simple Gaussian and Laplacian pyramids.
This paper proposes a hardware accelerator that fully exploits the available parallelism in the Local Laplacian Filtering algorithm.
On a Virtex-7 FPGA, we obtain a 7.5x speed-up when processing a 1 MB image compared to an optimized baseline CPU implementation.
- Score: 11.061707876645764
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Image enhancement techniques often introduce edge degradation and other
unwanted artifacts such as halos. These artifacts pose a major problem for
photographic applications, where they can degrade the quality of an image. Many
edge-aware techniques have been proposed in the field of image processing.
However, these require the application of complex optimization or
post-processing methods. Local Laplacian Filtering is an
edge-aware image processing technique that involves the construction of simple
Gaussian and Laplacian pyramids. This technique can be successfully applied for
detail smoothing, detail enhancement, tone mapping and inverse tone mapping of
an image while keeping it artifact-free. The problem though with this approach
is that it is computationally expensive. Hence, parallelization schemes using
multi-core CPUs and GPUs have been proposed. As is well known, they are not
power-efficient, and a well-designed hardware architecture on an FPGA can do
better on the performance-per-watt metric. In this paper, we propose a hardware
accelerator that fully exploits the available parallelism in the Local
Laplacian Filtering algorithm while minimizing the utilization of on-chip FPGA
resources. On a Virtex-7 FPGA, we obtain a 7.5x speed-up when processing a 1 MB
image compared to an optimized baseline CPU implementation. To the best of our
knowledge, no other hardware accelerator for the Local Laplacian Filtering
problem has been proposed in the research literature.
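The abstract refers to Gaussian and Laplacian pyramids as the building blocks of Local Laplacian Filtering. The following is a minimal sketch of those two pyramids in plain Python, assuming a grayscale image stored as a list of rows, a [1, 2, 1]/4 smoothing kernel, and nearest-neighbour upsampling. The function names and these choices are illustrative assumptions; they are not taken from the paper, and the sketch covers neither the per-coefficient remapping step of the full filter nor the FPGA architecture.

```python
# Illustrative sketch only: Gaussian/Laplacian pyramids in pure Python.
# Kernel, upsampling scheme, and all names are assumptions, not the paper's design.

def blur_1d(row):
    """Smooth a 1-D list with the [1, 2, 1] / 4 kernel, clamping at the borders."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]

def transpose(img):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*img)]

def blur(img):
    """Separable blur: filter every row, then every column."""
    rows = [blur_1d(r) for r in img]
    return transpose([blur_1d(c) for c in transpose(rows)])

def downsample(img):
    """Blur, then keep every second pixel in both directions."""
    return [r[::2] for r in blur(img)[::2]]

def upsample(img, height, width):
    """Nearest-neighbour upsampling to (height, width), followed by a blur."""
    big = [[img[y // 2][x // 2] for x in range(width)] for y in range(height)]
    return blur(big)

def gaussian_pyramid(img, levels):
    """Return [img, downsample(img), ...] with `levels` entries."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """Each level stores the detail lost between two Gaussian levels;
    the last entry is the coarsest Gaussian level itself."""
    gauss = gaussian_pyramid(img, levels)
    lap = []
    for fine, coarse in zip(gauss, gauss[1:]):
        up = upsample(coarse, len(fine), len(fine[0]))
        lap.append([[f - u for f, u in zip(fr, ur)] for fr, ur in zip(fine, up)])
    lap.append(gauss[-1])
    return lap

def collapse(lap):
    """Rebuild the image by upsampling from the coarsest level and adding detail."""
    img = lap[-1]
    for detail in reversed(lap[:-1]):
        up = upsample(img, len(detail), len(detail[0]))
        img = [[d + u for d, u in zip(dr, ur)] for dr, ur in zip(detail, up)]
    return img
```

Collapsing an unmodified Laplacian pyramid returns the input image up to floating-point rounding. Edge-aware filters of this family operate by modifying the pyramid coefficients before the collapse step.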
Related papers
- HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices [44.99833362998488]
The present work proposes a generic hardware architecture ready to be implemented on FPGA devices.
The inference speed of the design is evaluated over different resource constrained FPGA devices.
We demonstrate that our hardware-aware pruning algorithm achieves a 45% improvement in inference time compared to a network pruned using the standard algorithm.
arXiv Detail & Related papers (2024-08-26T07:27:12Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying [52.91778151771145]
In this paper, we try to break the limitations for the first time thanks to the recent development of continuous implicit representation.
Experiments show that the proposed method achieves real-time performance on 2048$\times$2048 images using a single GTX 2080 Ti GPU.
arXiv Detail & Related papers (2023-03-15T11:13:51Z)
- μSplit: efficient image decomposition for microscopy data [50.794670705085835]
μSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images.
We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory-efficient incorporation of large image context.
We apply μSplit to five decomposition tasks: one on a synthetic dataset and four derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z)
- Efficient Image Denoising by Low-Rank Singular Vector Approximations of Geodesics' Gramian Matrix [2.3499129784547654]
Noise contamination leaves images below the quality that viewers expect.
Image denoising is an essential pre-processing step.
We present a manifold-based noise filtering method that mainly exploits a few prominent singular vectors of the geodesics' Gramian matrix.
arXiv Detail & Related papers (2022-09-27T01:03:36Z)
- Towards making the most of NLP-based device mapping optimization for OpenCL kernels [5.6596607119831575]
We extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels.
We propose four different models that provide enhanced contextual information about the source code.
Experimental results show that our proposed methodology surpasses that of Cummins et al., providing up to a 4% improvement in prediction accuracy.
arXiv Detail & Related papers (2022-08-30T10:20:55Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Hardware architecture for high throughput event visual data filtering with matrix of IIR filters algorithm [0.0]
Neuromorphic vision is a rapidly growing field with numerous applications in the perception systems of autonomous vehicles.
There is a significant amount of noise in the event stream due to the sensor's working principle.
We present a novel algorithm based on an IIR filter matrix for filtering this type of noise and a hardware architecture that allows its acceleration.
arXiv Detail & Related papers (2022-07-02T15:18:53Z)
- A modular software framework for the design and implementation of ptychography algorithms [55.41644538483948]
We present SciCom, a new ptychography software framework aiming at simulating ptychography datasets and testing state-of-the-art reconstruction algorithms.
Despite its simplicity, the software leverages accelerated processing through the PyTorch interface.
Results are shown on both synthetic and real datasets.
arXiv Detail & Related papers (2022-05-06T16:32:37Z)
- Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO [46.20949184826173]
This work focuses on the applicability of efficient low-level, GPU hardware-specific instructions to improve on existing computer vision algorithms.
Non-maxima suppression and the subsequent feature selection, in particular, are prominent contributors to the overall image processing latency.
arXiv Detail & Related papers (2020-03-30T14:16:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.