A Fast Parallel Median Filtering Algorithm Using Hierarchical Tiling
- URL: http://arxiv.org/abs/2507.19926v1
- Date: Sat, 26 Jul 2025 12:06:05 GMT
- Title: A Fast Parallel Median Filtering Algorithm Using Hierarchical Tiling
- Authors: Louis Sugy,
- Abstract summary: Median filtering is widely used in digital image processing to remove noise while retaining sharp edges.<n> Sorting-based algorithms excel with small kernels but scale poorly with increasing kernel diameter.<n>This paper introduces a novel algorithm, leveraging the separability of the sorting problem through hierarchical tiling to minimize redundant computations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Median filtering is a non-linear smoothing technique widely used in digital image processing to remove noise while retaining sharp edges. It is particularly well suited to removing outliers (impulse noise) or granular artifacts (speckle noise). However, the high computational cost of median filtering can be prohibitive. Sorting-based algorithms excel with small kernels but scale poorly with increasing kernel diameter, in contrast to constant-time methods characterized by higher constant factors but better scalability, such as histogram-based approaches or the 2D wavelet matrix. This paper introduces a novel algorithm, leveraging the separability of the sorting problem through hierarchical tiling to minimize redundant computations. We propose two variants: a data-oblivious selection network that can operate entirely within registers, and a data-aware version utilizing random-access memory. These achieve per-pixel complexities of $O(k \log(k))$ and $O(k)$, respectively, for a $k \times k$ kernel - unprecedented for sorting-based methods. Our CUDA implementation is up to 5 times faster than the current state of the art on a modern GPU and is the fastest median filter in most cases for 8-, 16-, and 32-bit data types and kernels from $3 \times 3$ to $75 \times 75$.
Related papers
- Fast Isotropic Median Filtering [0.0]
Median filtering is a cornerstone of computational image processing.<n>Our method operates efficiently on arbitrary bit-depth data, arbitrary kernel sizes, and arbitrary convex kernel shapes.
arXiv Detail & Related papers (2025-05-28T23:38:21Z) - A Greedy Hierarchical Approach to Whole-Network Filter-Pruning in CNNs [2.188091591747149]
Whole-network filter pruning algorithms prune varying fractions of filters from each layer, hence providing greater flexibility.
This paper proposes a two-level hierarchical approach for whole-network filter pruning which is efficient and uses the classification loss as the final criterion.
Our method reduces the RAM requirement for ResNext101 from 7.6 GB to 1.5 GB and achieves a 94% reduction in FLOPS without losing accuracy on CIFAR-10.
arXiv Detail & Related papers (2024-08-22T03:59:57Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Efficient Image Denoising by Low-Rank Singular Vector Approximations of Geodesics' Gramian Matrix [2.3499129784547654]
Noise contamination of images results in substandard expectations among the people.
Image denoising is an essential pre-processing step.
We present a manifold-based noise filtering method that mainly exploits a few prominent singular vectors of the geodesics' Gramian matrix.
arXiv Detail & Related papers (2022-09-27T01:03:36Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - Scalable Differentially Private Clustering via Hierarchically Separated
Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d3/2log n)cdot OPT + O(k d2 log2 n / epsilon2)$, where $epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $Oast(sqrtTgamma_T)$ using $O(loglog T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - RAMA: A Rapid Multicut Algorithm on GPU [23.281726932718232]
We propose a highly parallel primal-dual algorithm for the multicut (a.k.a.magnitude correlation clustering) problem.
Our algorithm produces primal solutions and dual lower bounds that estimate the distance to optimum.
We can solve very large scale benchmark problems with up to $mathcalO(108)$ variables in a few seconds with small primal-dual gaps.
arXiv Detail & Related papers (2021-09-04T10:33:59Z) - VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose textitVersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
textitVersaGNN achieves on average 3712$times$ speedup with 1301.25$times$ energy reduction on CPU, and 35.4$times$ speedup with 17.66$times$ energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z) - Streaming Coresets for Symmetric Tensor Factorization [9.181791777532608]
We show how to factorize tensors efficiently in the streaming setting.
We introduce two novel algorithmic techniques: online filtering and kernelization.
We show applications of our algorithms in learning single topic modeling.
arXiv Detail & Related papers (2020-06-01T19:55:34Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(minn, n2/d)$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.