Related papers: Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers

URL: http://arxiv.org/abs/2512.09202v1
Date: Wed, 10 Dec 2025 00:00:34 GMT
Title: Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers
Authors: Jinming Lu, Jiayi Tian, Yequan Zhao, Hai Li, Zheng Zhang
Abstract summary: We present a framework that enables scalable and energy-efficient PINN training on edge devices. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that uses a square-block MX (SMX) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-Layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton-Jacobi-Bellman (HJB), and 100-D Heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5x to 83.5x speedups and 159.6x to 2324.1x energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
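The abstract says the square-block MX (SMX) format eliminates data duplication during backpropagation. One plausible reading, sketched here under stated assumptions: in standard MX formats (the OCP Microscaling specification uses 1-D blocks of 32 elements), a shared scale covers a run of elements along one dimension, so the transposed weight matrix used in the backward pass falls into different blocks and would have to be re-quantized and stored as a second copy. A square b x b block contains the same elements whether the matrix is read row-wise or column-wise, so one quantized copy can serve both passes. The element encoding below (signed integers with a shared power-of-two scale), the tiny block sizes, and all function names are hypothetical simplifications, not the paper's SMX encoding or the PINTA hardware.

```python
import math


def quantize_block(vals, bits=8):
    """Quantize a block with one shared power-of-two scale (simplified, MX-like; hypothetical)."""
    amax = max(abs(v) for v in vals)
    if amax == 0.0:
        return [0.0] * len(vals)
    qmax = 2 ** (bits - 1) - 1  # largest signed integer magnitude, e.g. 127 for 8 bits
    # Smallest power-of-two scale with amax / scale <= qmax (up to log2 rounding).
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    # Clamp guards against the rare case where log2 rounding picks a scale too small.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in vals]


def transpose(m):
    return [list(col) for col in zip(*m)]


def quantize_row_blocks(m, k=4, bits=8):
    """1-D blocks of k elements along each row (assumed conventional MX layout)."""
    out = []
    for row in m:
        q_row = []
        for c0 in range(0, len(row), k):
            q_row.extend(quantize_block(row[c0:c0 + k], bits))
        out.append(q_row)
    return out


def quantize_square_blocks(m, b=2, bits=8):
    """b x b tiles that each share one scale (square-block layout)."""
    rows, cols = len(m), len(m[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, b):
        for c0 in range(0, cols, b):
            idx = [(r, c)
                   for r in range(r0, min(r0 + b, rows))
                   for c in range(c0, min(c0 + b, cols))]
            q = quantize_block([m[r][c] for r, c in idx], bits)
            for (r, c), v in zip(idx, q):
                out[r][c] = v
    return out


if __name__ == "__main__":
    W = [[8.0, 0.0, 0.0, 0.0],
         [0.3, 0.1, 0.0, 0.0],
         [0.0, 0.0, 0.5, -0.25],
         [0.0, 0.0, 1.0, 0.75]]
    # Square blocks: quantizing W^T gives exactly the transpose of quantized W.
    print(quantize_square_blocks(transpose(W), 2, 4)
          == transpose(quantize_square_blocks(W, 2, 4)))  # True
    # Row blocks: W^T puts 8.0 and 0.3 in one block, so 0.3 rounds to 0 there.
    print(quantize_row_blocks(transpose(W), 4, 4)
          == transpose(quantize_row_blocks(W, 4, 4)))  # False
```

Once the shared scale is fixed, rounding is element-wise, and a square block has the same maximum magnitude as its transpose, so the square-block result commutes with transposition for any input. The row-block mismatch, by contrast, depends on the chosen matrix.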

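Stein's estimator replaces higher-order automatic differentiation with forward evaluations of the network at randomly perturbed inputs. The sketch below uses the standard Gaussian-smoothing identity: for delta ~ N(0, sigma^2 I) in d dimensions, the Laplacian of E[u(x + delta)] equals E[(u(x+delta) + u(x-delta) - 2u(x)) (|delta|^2 - d sigma^2) / (2 sigma^4)]. The abstract does not describe the exact form of the paper's "difference-based" scheme. We assume, without confirmation, that it works with centred differences such as u(x+delta) - u(x) rather than raw outputs, because in low-precision formats quantities such as sigma^4 or a small second difference can underflow. The code therefore runs in ordinary double precision and only illustrates the estimator itself. The function name `stein_laplacian` and its defaults are hypothetical.

```python
import random


def stein_laplacian(u, x, sigma=0.1, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the Laplacian of u at x without autodiff (hypothetical helper).

    Estimates the Laplacian of the Gaussian-smoothed function
    E[u(x + delta)], delta ~ N(0, sigma^2 I), via the antithetic form
        (u(x+delta) + u(x-delta) - 2 u(x)) * (|delta|^2 - d sigma^2) / (2 sigma^4).
    Only forward evaluations of u are required.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    dim = len(x)
    u0 = u(x)
    total = 0.0
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, sigma) for _ in range(dim)]
        x_plus = [xi + di for xi, di in zip(x, delta)]
        x_minus = [xi - di for xi, di in zip(x, delta)]
        # Centred second difference. The weight below has zero mean, so the
        # -2 u(x) term does not bias the estimate; it only reduces variance.
        diff = u(x_plus) + u(x_minus) - 2.0 * u0
        sq_norm = sum(di * di for di in delta)
        weight = (sq_norm - dim * sigma ** 2) / (2.0 * sigma ** 4)
        total += diff * weight
    return total / n_samples


if __name__ == "__main__":
    # The exact Laplacian of |x|^2 in 2-D is 4.
    print(stein_laplacian(lambda v: sum(t * t for t in v), [0.3, -0.5]))
```

For u(x) = |x|^2 the exact Laplacian is 2d, so the call above should print a value close to 4.0, up to Monte Carlo error. Quadratic and cubic test functions are convenient because Gaussian smoothing leaves their Laplacian unchanged.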
Related papers

Transolver-3: Scaling Up Transformer Solvers to Industrial-Scale Geometries
Transolver-3 is a new member of the Transolver family designed for high-fidelity physics simulations. We show that Transolver-3 is capable of handling meshes with over 160 million cells, achieving impressive performance across three challenging simulation benchmarks.
arXiv (2026-02-04T16:52:44Z)
Mixed Precision Training of Neural ODEs
This paper presents a mixed precision training framework for neural ODEs. It combines explicit ODE solvers with a custom backpropagation scheme. It achieves approximately 50% memory reduction and up to 2x speedup while maintaining accuracy comparable to single-precision training.
arXiv (2025-10-27T16:32:56Z)
PT$^2$-LLM: Post-Training Ternarization for Large Language Models
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv (2025-09-27T03:01:48Z)
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Mixture of Experts (MoE) has become a mainstream architecture for building Large Language Models (LLMs). We identify dual sparsity at the tensor and neuron levels in pre-trained MoE modules as a key factor for both accuracy and efficiency. We propose DualSparse-MoE, an inference system that integrates dynamic tensor-level dropping with static neuron-level reconstruction.
arXiv (2025-08-25T18:08:32Z)
PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations
We present PhysicsCorrect, a training-free correction framework that enforces PDE consistency at each prediction step. Our key innovation is an efficient caching strategy that precomputes the Jacobian and its pseudoinverse during an offline warm-up phase. Across three representative PDE systems, PhysicsCorrect reduces prediction errors by up to 100x while adding negligible inference time.
arXiv (2025-07-03T01:22:57Z)
Enabling Automatic Differentiation with Mollified Graph Neural Operators
We propose the mollified graph neural operator ($m$GNO), the first method to leverage automatic differentiation and compute exact gradients on arbitrary geometries. For a PDE example on regular grids, $m$GNO paired with autograd reduced the L2 relative data error by 20x compared to finite differences. It can also solve PDEs on unstructured point clouds seamlessly, using physics losses only, at resolutions vastly lower than those needed for finite differences to be accurate enough.
arXiv (2025-04-11T06:16:30Z)
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
We propose QuartDepth, which adopts post-training quantization to quantize monocular depth estimation (MDE) models with hardware acceleration for ASICs. Our approach involves quantizing both weights and activations to 4-bit precision, reducing the model size and computation cost. We design a flexible and programmable hardware accelerator by supporting kernel fusion and customized instruction programmability.
arXiv (2025-03-20T21:03:10Z)
TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training
We introduce TensorGRaD, a novel method that directly addresses the memory challenges associated with large-structured weights. We show that sparseGRaD reduces total memory usage by over 50% while maintaining and sometimes even improving accuracy.
arXiv (2025-01-04T20:51:51Z)
PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off
Quantization and sparsity are key techniques that translate to repetition and sparsity within tensors at the hardware-software interface. This paper introduces the concept of repetition-sparsity trade-off that helps explain computational efficiency during inference. We propose PLUM, a unified co-design framework that integrates inference systems and quantization to leverage the repetition-sparsity trade-off.
arXiv (2023-12-04T02:33:53Z)
Efficient Neural PDE-Solvers using Quantization Aware Training
We show that quantization can successfully lower the computational cost of inference while maintaining performance. Our results on four standard PDE datasets and three network architectures show that quantization-aware training works across settings and across three orders of magnitude in FLOPs.
arXiv (2023-08-14T09:21:19Z)
Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks
We propose an in-training quantization method for neural network models. Our method calculates the bit-width for each layer while training a mixed-precision model with competitive accuracy. We run experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImagenet with VGG19/ResNet18 architectures.
arXiv (2021-01-12T09:01:44Z)