Parallel time integration using Batched BLAS (Basic Linear Algebra
Subprograms) routines
- URL: http://arxiv.org/abs/2108.07126v1
- Date: Mon, 16 Aug 2021 14:49:04 GMT
- Authors: Konstantin Herb and Pol Welter
- Abstract summary: We present an approach for integrating the time evolution of quantum systems.
We leverage the computation power of graphics processing units (GPUs) to perform the integration of all time steps in parallel.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present an approach for integrating the time evolution of quantum systems.
We leverage the computation power of graphics processing units (GPUs) to
perform the integration of all time steps in parallel. The performance boost is
especially prominent for small to medium-sized quantum systems. The devised
algorithm can largely be implemented using the recently-specified batched
versions of the BLAS routines, and can therefore be easily ported to a variety
of platforms. Our PARAllelized Matrix Exponentiation for Numerical Time
evolution (PARAMENT) implementation runs on CUDA-enabled graphics processing
units.
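The core idea above, computing all per-step propagators in one batched operation and only then applying them in order, can be sketched on the CPU with NumPy's stacked eigendecomposition standing in for the batched BLAS/GPU kernels. The function name and interface below are illustrative assumptions, not PARAMENT's actual API:

```python
import numpy as np

def evolve(h_steps, dt, psi0):
    """Propagate psi0 through a stack of per-step Hamiltonians.

    h_steps: (K, n, n) Hermitian matrices, one per time step.
    All K propagators U_k = exp(-i H_k dt) are built in a single
    batched eigendecomposition (the part that maps onto batched
    BLAS / GPU kernels); the ordered product is applied afterwards.
    """
    w, v = np.linalg.eigh(h_steps)             # batched over all K steps
    phases = np.exp(-1j * w * dt)              # (K, n) eigenphases
    # U_k = V diag(phases) V^dagger, batched over k
    u = np.einsum('kij,kj,klj->kil', v, phases, v.conj())
    psi = psi0.astype(complex)
    for u_k in u:                              # ordered product (sequential)
        psi = u_k @ psi
    return psi
```

For a time-independent diagonal Hamiltonian this reduces to multiplying each component by its accumulated phase, which makes the sketch easy to check by hand.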
Related papers
- Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing [2.369919866595525]
We propose a method to achieve real-time FLI using an FPGA-based hardware accelerator.
We implement a GRU-based sequence-to-sequence (Seq2Seq) model on an FPGA board compatible with time-resolved cameras.
By integrating a GRU-based Seq2Seq model and its compressed version, called Seq2SeqLite, we were able to process multiple pixels in parallel, reducing latency compared to sequential processing.
arXiv Detail & Related papers (2024-10-09T18:24:23Z) - Optimised Hybrid Classical-Quantum Algorithm for Accelerated Solution of Sparse Linear Systems [0.0]
This paper introduces a hybrid classical-quantum algorithm that combines preconditioning techniques with the HHL algorithm to solve sparse linear systems more efficiently.
We show that the proposed approach not only surpasses traditional methods in speed and scalability but also mitigates some of the inherent limitations of quantum algorithms.
arXiv Detail & Related papers (2024-10-03T11:36:14Z) - Fast, Scalable, Warm-Start Semidefinite Programming with Spectral
Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z) - Two dimensional quantum lattice models via mode optimized hybrid CPU-GPU density matrix renormalization group method [0.0]
We present a hybrid numerical approach to simulate quantum many-body problems on two-dimensional quantum lattice models.
We demonstrate for the two dimensional spinless fermion model and for the Hubbard model on torus geometry that several orders of magnitude in computational time can be saved.
arXiv Detail & Related papers (2023-11-23T17:07:47Z) - Decreasing the Computing Time of Bayesian Optimization using
Generalizable Memory Pruning [56.334116591082896]
We show a wrapper of memory pruning and bounded optimization capable of being used with any surrogate model and acquisition function.
Running BO on high-dimensional or massive data sets becomes intractable due to this time complexity.
All model implementations are run on the MIT Supercloud state-of-the-art computing hardware.
arXiv Detail & Related papers (2023-09-08T14:05:56Z) - Parallel hybrid quantum-classical machine learning for kernelized
time-series classification [0.0]
We tackle this with a hybrid quantum-classical approach, deducing temporal kernels between pairwise instances using a time-series Hamiltonian kernel (TSHK) algorithm.
Because we treat the kernel weighting step as a differentiable function, our method can be regarded as an end-to-end learnable hybrid quantum-classical time-series technique.
arXiv Detail & Related papers (2023-05-10T04:01:15Z) - GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access [71.58925117604039]
Non-orthogonal multiple access (NOMA) is an interesting technology that enables massive connectivity as required in future 5G and 6G networks.
We propose a neural network architecture that combines the advantages of both linear and non-linear processing.
arXiv Detail & Related papers (2022-06-13T09:38:23Z) - Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have a complexity that grows rapidly with matrix size, significantly increasing their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
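A minimal CPU sketch of the randomized SVD idea being accelerated: every step is a dense matrix multiply or a small factorization, which is exactly what makes the method GPU-friendly. The function name and oversampling default are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def randomized_svd(a, rank, n_oversample=10, rng=None):
    """Rank-`rank` randomized SVD (Halko-style range-finder sketch)."""
    rng = np.random.default_rng(rng)
    m, n = a.shape
    # Sketch the range of `a` with a Gaussian test matrix.
    omega = rng.standard_normal((n, rank + n_oversample))
    q, _ = np.linalg.qr(a @ omega)             # orthonormal range basis
    # Project onto the small subspace and take an exact SVD there.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank]
```

On a GPU, the same code pattern applies with the dense products dispatched to device kernels (e.g. by swapping NumPy for an array library with the same interface).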
arXiv Detail & Related papers (2021-10-05T07:42:41Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Photonic co-processors in HPC: using LightOn OPUs for Randomized
Numerical Linear Algebra [53.13961454500934]
We show that the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware.
We show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms.
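The randomization step in question is, at its core, a dense random projection, a single large matrix product, which is why it can dominate on CPUs and why photonic or GPU hardware helps. A hedged NumPy sketch (the function name and scaling convention are illustrative):

```python
import numpy as np

def gaussian_sketch(x, d_out, rng=None):
    """Reduce the dimensionality of the rows of x with a dense
    Gaussian random projection. By the Johnson-Lindenstrauss lemma,
    norms and pairwise distances are approximately preserved when
    d_out is large enough.
    """
    rng = np.random.default_rng(rng)
    d_in = x.shape[1]
    s = rng.standard_normal((d_in, d_out)) / np.sqrt(d_out)
    return x @ s                               # the one big matrix product
```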
arXiv Detail & Related papers (2021-04-29T15:48:52Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
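One standard way such solvers avoid the full n-by-n kernel system is the Nyström approximation: solve in the span of a small set of centers instead. The sketch below is a deliberately simplified stand-in (assumed details: Gaussian kernel, a direct solve in place of the preconditioned iterative solver large-scale implementations actually use):

```python
import numpy as np

def nystrom_krr(x, y, centers, lam, gamma=1.0):
    """Nystrom-approximate kernel ridge regression with m centers:
    solves an m x m system instead of the full n x n kernel system.
    """
    def k(a, b):
        # Gaussian kernel matrix between row sets a and b.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    knm = k(x, centers)                        # (n, m) cross-kernel
    kmm = k(centers, centers)                  # (m, m) center kernel
    # Nystrom normal equations: (K_nm^T K_nm + lam K_mm) alpha = K_nm^T y
    alpha = np.linalg.solve(knm.T @ knm + lam * kmm, knm.T @ y)
    return lambda xq: k(xq, centers) @ alpha   # predictor on new points
```

Every operation here is again dense linear algebra, which is what makes the approach amenable to GPU execution at scale.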
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.