cubic: CUDA-accelerated 3D Bioimage Computing
- URL: http://arxiv.org/abs/2510.14143v1
- Date: Wed, 15 Oct 2025 22:22:06 GMT
- Title: cubic: CUDA-accelerated 3D Bioimage Computing
- Authors: Alexandr A. Kalinin, Anne E. Carpenter, Shantanu Singh, Matthew J. O'Meara
- Abstract summary: We introduce cubic, an open-source Python library that augments widely used SciPy and scikit-image APIs with GPU-accelerated alternatives. cubic's API is device-agnostic and dispatches operations to the GPU when data reside on the device, and otherwise executes on the CPU. We evaluate cubic both by benchmarking individual operations and by reproducing existing deconvolution and segmentation pipelines.
- Score: 42.83541173560835
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Quantitative analysis of multidimensional biological images is useful for understanding complex cellular phenotypes and accelerating advances in biomedical research. As modern microscopy generates ever-larger 2D and 3D datasets, existing computational approaches are increasingly limited by their scalability, efficiency, and integration with modern scientific computing workflows. Existing bioimage analysis tools often lack application programming interfaces (APIs), do not support graphics processing unit (GPU) acceleration, lack broad 3D image processing capabilities, and/or have poor interoperability for compute-heavy workflows. Here, we introduce cubic, an open-source Python library that addresses these challenges by augmenting widely used SciPy and scikit-image APIs with GPU-accelerated alternatives from CuPy and RAPIDS cuCIM. cubic's API is device-agnostic and dispatches operations to GPU when data reside on the device and otherwise executes on CPU, seamlessly accelerating a broad range of image processing routines. This approach enables GPU acceleration of existing bioimage analysis workflows, from preprocessing to segmentation and feature extraction for 2D and 3D data. We evaluate cubic both by benchmarking individual operations and by reproducing existing deconvolution and segmentation pipelines, achieving substantial speedups while maintaining algorithmic fidelity. These advances establish a robust foundation for scalable, reproducible bioimage analysis that integrates with the broader Python scientific computing ecosystem, including other GPU-accelerated methods, enabling both interactive exploration and automated high-throughput analysis workflows. cubic is openly available at https://github.com/alxndrkalinin/cubic
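The device-agnostic dispatch described in the abstract can be illustrated with the standard CuPy pattern: inspect the input array's type and route computation to `cupy` (GPU) or `numpy` (CPU) accordingly. The following is a minimal sketch of that pattern only; it assumes nothing about cubic's actual internals, and the `normalize` helper shown here is a hypothetical example, not part of cubic's API (CuPy itself offers `cupy.get_array_module` for this dispatch):

```python
import numpy as np

try:
    import cupy as cp  # optional GPU dependency
except ImportError:
    cp = None  # CPU-only environment; fall back to NumPy

def get_array_module(arr):
    """Return cupy if `arr` lives on the GPU, else numpy."""
    if cp is not None and isinstance(arr, cp.ndarray):
        return cp
    return np

def normalize(image):
    """Min-max normalize on whichever device `image` resides."""
    xp = get_array_module(image)  # numpy or cupy, chosen at runtime
    image = xp.asarray(image, dtype=xp.float32)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo + 1e-8)

# A NumPy input stays on the CPU; a CuPy array would be
# processed on the GPU by the exact same function body.
vol = np.arange(27, dtype=np.float32).reshape(3, 3, 3)
out = normalize(vol)
```

Because the same function body serves both devices, a pipeline written this way needs no code changes to run on a GPU; moving the data (e.g. with `cupy.asarray`) is enough.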
Related papers
- GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions [54.570944939061555]
We present a comprehensive study of GPU-accelerated graph-based vector search algorithms. We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units. Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems.
arXiv Detail & Related papers (2026-02-10T16:18:04Z)
- Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation [0.1499944454332829]
This work introduces new capabilities to Annotat3D through Harpia. The library is designed to support scalable, interactive segmentation for large 3D datasets in high-performance computing. The system's interactive, human-in-the-loop interface, combined with efficient GPU resource management, makes it particularly suitable for collaborative scientific imaging.
arXiv Detail & Related papers (2025-11-14T21:45:02Z)
- A Parallel CPU-GPU Framework for Cost-Bounded DFS with Applications to IDA* and BTS [13.186524200050957]
We introduce a method for GPU computations in depth-first search. This is used to create algorithms like synchronous IDA*, an extension of the Iterative Deepening A* (IDA*) algorithm. We evaluate the approach on the 3x3x3 Rubik's Cube and 4x4 sliding tile puzzle (STP), showing that GPU operations can be efficiently batched in DFS.
arXiv Detail & Related papers (2025-07-16T05:07:33Z)
- Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading [2.8231000588510757]
Transformers and large language models (LLMs) have seen rapid adoption in all domains.
Training of transformers is very expensive and often hits a "memory wall".
We propose a novel technique to split the LLM into subgroups, whose update phase is scheduled on either the CPU or the GPU.
arXiv Detail & Related papers (2024-10-26T00:43:59Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Mesh Convolution with Continuous Filters for 3D Surface Parsing [101.25796935464648]
We propose a series of modular operations for effective geometric feature learning from 3D triangle meshes.
Our mesh convolutions exploit spherical harmonics as orthonormal bases to create continuous convolutional filters.
We further contribute a novel hierarchical neural network for perceptual parsing of 3D surfaces, named PicassoNet++.
arXiv Detail & Related papers (2021-12-03T09:16:49Z)
- Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation [5.652813393326783]
Most research on novel techniques for 3D Medical Image Segmentation (MIS) is currently done using Deep Learning with GPU accelerators.
The principal challenge of such techniques is that a single input can easily exhaust available computing resources and require prohibitive amounts of time to be processed.
We present a design for distributed deep learning training pipelines, focusing on multi-node and multi-GPU environments.
arXiv Detail & Related papers (2021-10-29T16:11:25Z)
- Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have high computational complexity, which significantly increases their cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
arXiv Detail & Related papers (2021-10-05T07:42:41Z)
- Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the discovered summaries help steer this specific process to cut costs and reduce the manufacturing of defective parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.