Machine learning-driven conservative-to-primitive conversion in hybrid piecewise polytropic and tabulated equations of state
- URL: http://arxiv.org/abs/2412.07836v2
- Date: Wed, 29 Jan 2025 19:00:04 GMT
- Title: Machine learning-driven conservative-to-primitive conversion in hybrid piecewise polytropic and tabulated equations of state
- Authors: Semih Kacmaz, Roland Haas, E. A. Huerta
- Abstract summary: We present a novel machine learning (ML) method to accelerate conservative-to-primitive inversion in hydrodynamics simulations.
We employ feedforward neural networks (NNC2PS and NNC2PL), trained in PyTorch and optimized for GPU inference using NVIDIA TensorRT.
The mixed-precision TensorRT engine for NNC2PS achieves inference speeds approximately 400 times faster than a traditional single-threaded CPU implementation for a dataset size of 1,000,000 points.
- Abstract: We present a novel machine learning (ML) method to accelerate conservative-to-primitive inversion, focusing on hybrid piecewise polytropic and tabulated equations of state. Traditional root-finding techniques are computationally expensive, particularly for large-scale relativistic hydrodynamics simulations. To address this, we employ feedforward neural networks (NNC2PS and NNC2PL), trained in PyTorch and optimized for GPU inference using NVIDIA TensorRT, achieving significant speedups with minimal accuracy loss. The NNC2PS model achieves $ L_1 $ and $ L_\infty $ errors of $ 4.54 \times 10^{-7} $ and $ 3.44 \times 10^{-6} $, respectively, while the NNC2PL model exhibits even lower error values. TensorRT optimization with mixed-precision deployment substantially accelerates performance compared to traditional root-finding methods. Specifically, the mixed-precision TensorRT engine for NNC2PS achieves inference speeds approximately 400 times faster than a traditional single-threaded CPU implementation for a dataset size of 1,000,000 points. Ideal parallelization across an entire compute node in the Delta supercomputer (Dual AMD 64 core 2.45 GHz Milan processors; and 8 NVIDIA A100 GPUs with 40 GB HBM2 RAM and NVLink) predicts a 25-fold speedup for TensorRT over an optimally-parallelized numerical method when processing 8 million data points. Moreover, the ML method exhibits sub-linear scaling with increasing dataset sizes. We release the scientific software developed, enabling further validation and extension of our findings. This work underscores the potential of ML, combined with GPU optimization and model quantization, to accelerate conservative-to-primitive inversion in relativistic hydrodynamics simulations.
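The conversion being accelerated can be illustrated with the traditional root-finding baseline the abstract compares against. The following is a minimal, self-contained sketch (not the paper's code) of conservative-to-primitive recovery for a 1D special-relativistic flow with an ideal-gas equation of state, using plain bisection on the pressure; the hybrid piecewise polytropic and tabulated EOS cases treated in the paper are more involved.

```python
import math

def c2p_ideal_gas(D, S, tau, gamma=5.0 / 3.0, tol=1e-12, max_iter=200):
    """Recover primitives (rho, v, eps, p) from conserved (D, S, tau)
    by bisection on the pressure, for a 1D special-relativistic flow
    with ideal-gas EOS p = (gamma - 1) * rho * eps.

    Illustrative baseline only: the paper replaces this per-point
    root-find with a trained neural network."""
    def residual(p):
        # Invert the conserved-variable definitions for a trial pressure p.
        v = S / (tau + D + p)                   # velocity from S = rho*h*W^2*v
        W = 1.0 / math.sqrt(1.0 - v * v)        # Lorentz factor
        rho = D / W                             # rest-mass density from D = rho*W
        eps = (tau + D * (1.0 - W) + p * (1.0 - W * W)) / (D * W)
        return (gamma - 1.0) * rho * eps - p    # EOS pressure minus trial pressure

    lo, hi = 1e-12, 10.0 * (tau + D)            # crude but safe bracket
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid                            # root lies in [lo, mid]
        else:
            lo = mid                            # root lies in [mid, hi]
        if hi - lo < tol:
            break
    p = 0.5 * (lo + hi)
    v = S / (tau + D + p)
    W = 1.0 / math.sqrt(1.0 - v * v)
    eps = (tau + D * (1.0 - W) + p * (1.0 - W * W)) / (D * W)
    return D / W, v, eps, p
```

An ML surrogate such as NNC2PS amortizes this iteration into a single batched forward pass, which is what makes the GPU speedups quoted above possible.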
Related papers
- Compilation of Trotter-Based Time Evolution for Partially Fault-Tolerant Quantum Computing Architecture [0.6449786007855248]
We present an efficient method for simulating the time evolution of the 2D Hubbard model Hamiltonian.
Our analysis reveals an acceleration of over 10 times compared to naive serial compilation.
For devices with a physical error rate of $p_{\rm phys} = 10^{-4}$, we estimate that approximately $6.5 \times 10^4$ physical qubits are required to achieve faster ground state energy estimation.
arXiv Detail & Related papers (2024-08-27T10:07:34Z) - Transformer neural networks and quantum simulators: a hybrid approach for simulating strongly correlated systems [1.6494451064539348]
We present a hybrid optimization scheme for neural quantum states (NQS) that involves a data-driven pretraining with numerical or experimental data and a second, Hamiltonian-driven optimization stage.
Our work paves the way for a reliable and efficient optimization of neural quantum states.
arXiv Detail & Related papers (2024-05-31T17:55:27Z) - Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
arXiv Detail & Related papers (2024-05-06T16:03:32Z) - Boosting the effective performance of massively parallel tensor network state algorithms on hybrid CPU-GPU based architectures via non-Abelian symmetries [0.0]
Non-Abelian symmetry related tensor algebra based on Wigner-Eckhart theorem is fully detached from the conventional tensor network layer.
We have achieved an order of magnitude increase in performance with respect to results reported in arXiv:2305.05581 in terms of computational complexity.
Our solution has an estimated effective performance of 250-500 TFLOPS.
arXiv Detail & Related papers (2023-09-23T07:49:53Z) - Geometry-Informed Neural Operator for Large-Scale 3D PDEs [76.06115572844882]
We propose the geometry-informed neural operator (GINO) to learn the solution operator of large-scale partial differential equations.
We successfully trained GINO to predict the pressure on car surfaces using only five hundred data points.
arXiv Detail & Related papers (2023-09-01T16:59:21Z) - Tensor Slicing and Optimization for Multicore NPUs [2.670309629218727]
This paper proposes a compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO).
Results show that TSO is capable of identifying the best tensor slicing that minimizes execution time for a set of CNN models.
arXiv Detail & Related papers (2023-04-06T12:03:03Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - 8-bit Optimizers via Block-wise Quantization [57.25800395197516]
Stateful optimizers maintain statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient values.
This state can be used to accelerate optimization compared to plain gradient descent but uses memory that might otherwise be allocated to model parameters.
In this paper, we develop the first optimizers that use 8-bit statistics while maintaining the performance levels of using 32-bit optimizer states.
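As a rough illustration of the block-wise idea (not the paper's implementation, which uses a nonlinear 8-bit quantization map and GPU kernels), here is a toy linear absmax scheme with one scale per block; the function names are hypothetical.

```python
def quantize_blockwise(values, block_size=64):
    """Toy block-wise 8-bit absmax quantization: each block of the
    input gets its own scale, so one outlier only degrades precision
    within its block rather than across the whole tensor."""
    blocks, scales = [], []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        scale = max(abs(x) for x in block) or 1.0  # per-block absmax
        scales.append(scale)
        # Map each value to a signed 8-bit integer in [-127, 127].
        blocks.append([round(x / scale * 127) for x in block])
    return blocks, scales

def dequantize_blockwise(blocks, scales):
    """Invert the quantization; error per element is bounded by
    roughly 0.5 * scale / 127 for that element's block."""
    out = []
    for block, scale in zip(blocks, scales):
        out.extend(q / 127 * scale for q in block)
    return out
```

Because the scale is local, blocks containing only small values keep fine resolution even when another block contains a large outlier, which is the key to matching 32-bit optimizer performance.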
arXiv Detail & Related papers (2021-10-06T15:43:20Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy large number of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - CSM-NN: Current Source Model Based Logic Circuit Simulation -- A Neural Network Approach [5.365198933008246]
CSM-NN is a scalable simulation framework with optimized neural network structures and processing algorithms.
Experiments show that CSM-NN reduces the simulation time by up to $6\times$ compared to a state-of-the-art current source model based simulator running on a CPU.
CSM-NN also provides high accuracy levels, with less than $2\%$ error, compared to HSPICE.
arXiv Detail & Related papers (2020-02-13T00:29:44Z) - Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
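The Jacobi variant of this idea can be sketched in a few lines: all layer outputs are updated in parallel from the previous sweep's values, and after at most $n$ sweeps (for an $n$-layer chain) the result matches sequential evaluation exactly. This is an illustrative sketch, not the authors' code.

```python
def jacobi_feedforward(fs, x0, max_sweeps=None):
    """Evaluate the chain x_{i+1} = f_i(x_i) by Jacobi fixed-point
    iteration: every layer recomputes its output from the previous
    sweep's iterate (all updates are independent, hence parallelizable).
    Correct values propagate one layer per sweep, so len(fs) sweeps
    reproduce the sequential result exactly."""
    n = len(fs)
    xs = [x0] * (n + 1)  # initial guess: broadcast the input everywhere
    for _ in range(max_sweeps if max_sweeps is not None else n):
        # All n layer applications below could run concurrently.
        xs = [x0] + [fs[i](xs[i]) for i in range(n)]
    return xs[-1]
```

The speedup comes when the sweeps converge in far fewer than $n$ iterations and each sweep's layer evaluations run on parallel hardware.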
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.