Related papers: Accelerating local laplacian filters on FPGAs

URL: http://arxiv.org/abs/2402.12407v1
Date: Sun, 18 Feb 2024 10:49:23 GMT
Title: Accelerating local laplacian filters on FPGAs
Authors: Shashwat Khandelwal, Ziaul Choudhury, Shashwat Shrivastava and Suresh Purini
Abstract summary: Local Laplacian Filtering is an edge-aware image processing technique that involves the construction of simple Gaussian and Laplacian pyramids. This paper proposes a hardware accelerator that fully exploits the available parallelism in the Local Laplacian Filtering algorithm. On a Virtex-7 FPGA, we obtain a 7.5x speed-up in processing a 1 MB image compared to an optimized baseline CPU implementation.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: When images are processed using various enhancement techniques, the results often suffer from edge degradation and other unwanted artifacts such as halos. These artifacts pose a major problem for photographic applications, where they can degrade the quality of an image. A plethora of edge-aware techniques have been proposed in the field of image processing; however, these require complex optimization or post-processing methods. Local Laplacian Filtering is an edge-aware image processing technique that involves the construction of simple Gaussian and Laplacian pyramids. This technique can be successfully applied for detail smoothing, detail enhancement, tone mapping and inverse tone mapping of an image while keeping it artifact-free. The problem with this approach, though, is that it is computationally expensive. Hence, parallelization schemes using multi-core CPUs and GPUs have been proposed. These platforms, however, are generally not power-efficient, and a well-designed hardware architecture on an FPGA can do better on the performance-per-watt metric. In this paper, we propose a hardware accelerator that fully exploits the available parallelism in the Local Laplacian Filtering algorithm while minimizing the utilization of on-chip FPGA resources. On a Virtex-7 FPGA, we obtain a 7.5x speed-up in processing a 1 MB image compared to an optimized baseline CPU implementation. To the best of our knowledge, no other hardware accelerators for the Local Laplacian Filtering problem have been proposed in the research literature.
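The abstract states only that Local Laplacian Filtering builds Gaussian and Laplacian pyramids; it does not describe the accelerator's datapath. As a point of reference, the following is a minimal, unoptimized Python/NumPy sketch of the direct formulation of local Laplacian filtering introduced by Paris, Hasinoff and Kautz (2011). It is not the authors' design: the 5-tap binomial kernel, the nearest-neighbour upsampling, the remapping parameters (`sigma_r`, `alpha`, `beta`) and every function name below are illustrative assumptions.

```python
import numpy as np

# Assumed 5-tap binomial kernel, a common choice for image pyramids.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img):
    """Separable 5-tap blur with edge-replicated borders; keeps img's shape."""
    h, w = img.shape
    p = np.pad(img, 2, mode="edge")
    tmp = sum(KERNEL[k] * p[:, k:k + w] for k in range(5))     # horizontal pass
    return sum(KERNEL[k] * tmp[k:k + h, :] for k in range(5))  # vertical pass


def downsample(img):
    return blur(img)[::2, ::2]


def upsample(img, shape):
    """Nearest-neighbour 2x expansion cropped to `shape`, then smoothed."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return blur(up)


def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr


def laplacian_pyramid(img, levels):
    gp = gaussian_pyramid(img, levels)
    # Band-pass levels, plus the coarsest Gaussian level as the low-pass residual.
    bands = [gp[i] - upsample(gp[i + 1], gp[i].shape) for i in range(levels - 1)]
    return bands + [gp[-1]]


def collapse(lp):
    """Invert laplacian_pyramid by upsampling and adding the bands back."""
    img = lp[-1]
    for band in reversed(lp[:-1]):
        img = band + upsample(img, band.shape)
    return img


def remap(img, g, sigma_r, alpha, beta):
    """Point-wise remapping around reference intensity g (Paris et al. style)."""
    d = img - g
    a = np.abs(d)
    detail = g + np.sign(d) * sigma_r * (a / sigma_r) ** alpha  # |d| <= sigma_r
    edge = g + np.sign(d) * (beta * (a - sigma_r) + sigma_r)    # |d| >  sigma_r
    return np.where(a <= sigma_r, detail, edge)


def local_laplacian_filter(img, sigma_r=0.2, alpha=0.5, beta=1.0, levels=4):
    img = np.asarray(img, dtype=float)
    gp = gaussian_pyramid(img, levels)
    out = [np.zeros_like(level) for level in gp]
    out[-1] = gp[-1].copy()  # low-pass residual is taken from the input as-is
    for lev in range(levels - 1):
        h, w = gp[lev].shape
        for y in range(h):
            for x in range(w):
                # Every (lev, y, x) iteration is independent of all the others.
                g = gp[lev][y, x]
                # Direct formulation: remap the whole image around g, build its
                # Laplacian pyramid, and keep a single coefficient.
                lp = laplacian_pyramid(remap(img, g, sigma_r, alpha, beta), levels)
                out[lev][y, x] = lp[lev][y, x]
    return collapse(out)
```

Because each output coefficient needs its own remapped pyramid, this direct version does roughly one full-image pyramid construction per pixel, which illustrates why the method is computationally expensive. It also shows where the parallelism lies: the inner-loop iterations share no state. With `alpha=1.0` and `beta=1.0` the remapping is the identity, so `local_laplacian_filter` returns its input, which makes a convenient sanity check; in the Paris et al. formulation, `alpha < 1` enhances detail and `beta < 1` compresses large-scale contrast.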

Related papers

HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices
The present work proposes a generic hardware architecture ready to be implemented on FPGA devices. The inference speed of the design is evaluated on several resource-constrained FPGA devices. We demonstrate that our hardware-aware pruning algorithm achieves a 45% improvement in inference time compared to a network pruned using the standard algorithm.
arXiv (2024-08-26T07:27:12Z)
INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient. We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv (2023-08-11T04:24:39Z)
CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying
In this paper, we try for the first time to overcome the limitations of existing high-resolution inpainting methods, building on recent developments in continuous implicit representation. Experiments show that the proposed method achieves real-time performance on 2048×2048 images using a single GTX 2080 Ti GPU.
arXiv (2023-03-15T11:13:51Z)
μSplit: efficient image decomposition for microscopy data
μSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images. We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory-efficient incorporation of large image context. We apply μSplit to five decomposition tasks: one on a synthetic dataset and four derived from real microscopy data.
arXiv (2022-11-23T11:26:24Z)
Efficient Image Denoising by Low-Rank Singular Vector Approximations of Geodesics' Gramian Matrix
Noise contamination leaves images falling short of viewers' expectations, which makes image denoising an essential pre-processing step. We present a manifold-based noise filtering method that mainly exploits a few prominent singular vectors of the geodesics' Gramian matrix.
arXiv (2022-09-27T01:03:36Z)
Towards making the most of NLP-based device mapping optimization for OpenCL kernels
We extend the work of Cummins et al., namely Deeptune, which tackles the problem of optimal device selection (CPU or GPU) for accelerated OpenCL kernels. We propose four different models that provide enhanced contextual information about the source code. Experimental results show that our proposed methodology surpasses the work of Cummins et al., providing up to a 4% improvement in prediction accuracy.
arXiv (2022-08-30T10:20:55Z)
Batch-efficient EigenDecomposition for Small and Medium Matrices
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv (2022-07-09T09:14:12Z)
Hardware architecture for high throughput event visual data filtering with matrix of IIR filters algorithm
Neuromorphic vision is a rapidly growing field with numerous applications in the perception systems of autonomous vehicles. Because of the sensors' working principle, the event stream contains a significant amount of noise. We present a novel algorithm based on an IIR filter matrix for filtering this type of noise and a hardware architecture that allows its acceleration.
arXiv (2022-07-02T15:18:53Z)
A modular software framework for the design and implementation of ptychography algorithms
We present SciCom, a new ptychography software framework aiming at simulating ptychography datasets and testing state-of-the-art reconstruction algorithms. Despite its simplicity, the software leverages accelerated processing through the PyTorch interface. Results are shown on both synthetic and real datasets.
arXiv (2022-05-06T16:32:37Z)
Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO
This work focuses on the applicability of efficient low-level, GPU-hardware-specific instructions to improve on existing computer vision algorithms. In particular, non-maxima suppression and the subsequent feature selection are prominent contributors to the overall image processing latency.
arXiv (2020-03-30T14:16:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.