Operator Learning Using Weak Supervision from Walk-on-Spheres
- URL: http://arxiv.org/abs/2603.01193v2
- Date: Tue, 03 Mar 2026 18:07:51 GMT
- Title: Operator Learning Using Weak Supervision from Walk-on-Spheres
- Authors: Hrishikesh Viswanath, Hong Chul Nam, Xi Deng, Julius Berner, Anima Anandkumar, Aniket Bera,
- Abstract summary: Training neural PDE solvers is often bottlenecked by expensive data generation or unstable physics-informed neural network (PINN) training. We propose an alternative approach that uses Monte Carlo methods to estimate the solution of the PDE as a stochastic process, providing weak supervision during training.
- Score: 81.26322147849918
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training neural PDE solvers is often bottlenecked by expensive data generation or unstable physics-informed neural network (PINN) training, which involves challenging optimization landscapes due to higher-order derivatives. To tackle this issue, we propose an alternative approach that uses Monte Carlo methods to estimate the solution of the PDE as a stochastic process, providing weak supervision during training. Leveraging the Walk-on-Spheres method, we introduce a learning scheme called \emph{Walk-on-Spheres Neural Operator (WoS-NO)}, which uses weak supervision from WoS to train any given neural operator. We amortize the cost of Monte Carlo walks across the distribution of PDE instances, using stochastic representations from the WoS algorithm to generate cheap, noisy estimates of the PDE solution during training. This is formulated as a data-free physics-informed objective in which a neural operator is trained to regress against these weak supervision signals, allowing the operator to learn a generalized solution map for an entire family of PDEs. This strategy does not require expensive pre-computed datasets, avoids memory-intensive and unstable loss functions built on higher-order derivatives, and demonstrates zero-shot generalization to novel PDE parameters and domains. Experiments show that for the same number of training steps, our method achieves up to 8.75$\times$ improvement in $L_2$-error compared to standard physics-informed training schemes, up to 6.31$\times$ improvement in training speed, and reductions of up to 2.97$\times$ in GPU memory consumption. We present the code at https://github.com/neuraloperator/WoS-NO
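The Walk-on-Spheres estimator that supplies the weak supervision is easiest to see on the simplest case, the Laplace equation with Dirichlet boundary data. The sketch below is a minimal, self-contained illustration of plain WoS on the unit disk, not the WoS-NO implementation (all function names, tolerances, and parameters are illustrative): each walk jumps to a uniformly random point on the largest sphere contained in the domain and returns the boundary value once it lands within $\varepsilon$ of the boundary; averaging walks gives a cheap, noisy estimate of the solution, exactly the kind of supervision signal the operator is trained to regress against.

```python
import math
import random

def walk_on_spheres_disk(x0, y0, g, eps=1e-4, rng=None):
    """One WoS walk: estimate u(x0, y0) for Laplace's equation on the
    unit disk with Dirichlet boundary data g (illustrative sketch)."""
    rng = rng or random
    x, y = x0, y0
    while True:
        r = 1.0 - math.hypot(x, y)  # distance to the unit-circle boundary
        if r < eps:
            s = math.hypot(x, y)
            return g(x / s, y / s)  # project to the boundary, read off g
        # jump to a uniform point on the largest inscribed circle
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(theta)
        y += r * math.sin(theta)

def estimate(x0, y0, g, n_walks=20000, seed=0):
    rng = random.Random(seed)
    return sum(walk_on_spheres_disk(x0, y0, g, rng=rng)
               for _ in range(n_walks)) / n_walks

# u(x, y) = x is harmonic, so the WoS average should recover it
g = lambda x, y: x
u_hat = estimate(0.3, 0.2, g)  # close to 0.30 up to Monte Carlo noise
```

Each individual walk is very cheap (its expected length grows only logarithmically in $1/\varepsilon$), which is what makes it viable to generate fresh noisy targets on the fly during operator training instead of precomputing a dataset.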
Related papers
- VAE-DNN: Energy-Efficient Trainable-by-Parts Surrogate Model For Parametric Partial Differential Equations [49.1574468325115]
We propose a trainable-by-parts surrogate model for solving forward and inverse parameterized nonlinear partial differential equations. The proposed approach employs an encoder to reduce the high-dimensional input $y(\bm{x})$ to a lower-dimensional latent space, $\bm{\mu}_{\bm{\phi}_y}$. A fully connected neural network is used to map $\bm{\mu}_{\bm{\phi}_y}$ to the latent space, $\bm{\mu}_{\bm{\phi}_h}$, of the P
arXiv Detail & Related papers (2025-08-05T18:37:32Z) - Optimization and generalization analysis for two-layer physics-informed neural networks without over-parametrization [0.6215404942415159]
This work focuses on the behavior of stochastic gradient descent (SGD) in solving least-squares regression with physics-informed neural networks (PINNs). We show that if the network width exceeds a threshold that depends only on $\epsilon$ and the problem, then the training loss and expected loss will decrease below $O(\epsilon)$.
arXiv Detail & Related papers (2025-07-22T09:24:22Z) - TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training [91.8932638236073]
We introduce \textbf{TensorGRaD}, a novel method that directly addresses the memory challenges associated with large structured weights. We show that TensorGRaD reduces total memory usage by over $50\%$ while maintaining and sometimes even improving accuracy.
arXiv Detail & Related papers (2025-01-04T20:51:51Z) - Pretraining a Neural Operator in Lower Dimensions [7.136205674624813]
We aim to Pretrain neural PDE solvers on Lower Dimensional PDEs (PreLowD), where data collection is the least expensive.
We evaluate the effectiveness of this pretraining strategy in similar PDEs in higher dimensions.
Our work sheds light on the effect of the fine-tuning configuration to make the most of this pretraining strategy.
arXiv Detail & Related papers (2024-07-24T20:06:12Z) - Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning [45.78096783448304]
In this work, seeking data efficiency, we design unsupervised pretraining for PDE operator learning. We mine unlabeled PDE data without simulated solutions, and we pretrain neural operators with physics-inspired, reconstruction-based proxy tasks. Our method is highly data-efficient, more generalizable, and even outperforms conventional vision-pretrained models.
arXiv Detail & Related papers (2024-02-24T06:27:33Z) - Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers. We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles. Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
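The probabilistic representation referred to here is, in its simplest form, the Feynman-Kac formula: for the heat equation $u_t = D\,u_{xx}$, the solution is an ensemble average over Brownian particles, $u(x,t) = \mathbb{E}\left[u_0\left(x + \sqrt{2Dt}\,Z\right)\right]$ with $Z \sim \mathcal{N}(0,1)$. A minimal sketch (not the paper's solver; names and parameters are illustrative) checks the particle estimate against the closed-form solution for Gaussian initial data:

```python
import math
import random

def heat_mc(u0, x, t, D=1.0, n=50000, seed=0):
    """Feynman-Kac particle estimate: u(x, t) = E[u0(x + sqrt(2 D t) Z)],
    i.e. the macroscopic solution as an ensemble average of random particles."""
    rng = random.Random(seed)
    s = math.sqrt(2.0 * D * t)
    return sum(u0(x + s * rng.gauss(0.0, 1.0)) for _ in range(n)) / n

# Gaussian initial data has a closed-form heat-equation solution to check against:
# u(x, t) = exp(-x^2 / (1 + 4 D t)) / sqrt(1 + 4 D t)
u0 = lambda x: math.exp(-x * x)
x, t, D = 0.5, 0.25, 1.0
exact = math.exp(-x * x / (1 + 4 * D * t)) / math.sqrt(1 + 4 * D * t)
approx = heat_mc(u0, x, t, D)  # agrees with `exact` up to Monte Carlo noise
```

As with Walk-on-Spheres, each sample is cheap and noisy, so the representation lends itself naturally to generating on-the-fly regression targets for an unsupervised neural solver.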
arXiv Detail & Related papers (2023-02-10T08:05:19Z) - Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED)
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDPs with trajectory-wise rewards.
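The redistribution step can be pictured with a scalar toy (a hypothetical simplification, not PARTED itself, which uses learned feature maps and pessimism): each trajectory contributes one least-squares constraint that its per-step proxy rewards should sum to the observed trajectory return.

```python
# Hypothetical toy of least-squares reward redistribution: each trajectory i
# has per-step scalar features phi[i][t] and a single trajectory-level return
# R[i]; we fit w so that proxy rewards r_t = w * phi_t sum to the return.
trajectories = [
    ([1.0, 0.0, 2.0], 6.0),   # (per-step features, trajectory return)
    ([0.5, 1.5, 1.0], 6.0),
    ([2.0, 2.0, 0.0], 8.0),
]

# Minimize sum_i (w * S_i - R_i)^2 with S_i = sum_t phi[i][t];
# the closed form is w* = (sum_i R_i * S_i) / (sum_i S_i^2).
num = sum(R * sum(phi) for phi, R in trajectories)
den = sum(sum(phi) ** 2 for phi, R in trajectories)
w = num / den

# Redistribute: per-step proxy rewards for the first trajectory,
# summing back to its return of 6.0
proxy = [w * f for f in trajectories[0][0]]
```

The proxy rewards then stand in for the missing per-step reward signal, turning a trajectory-wise-reward problem into a standard one that value iteration can consume.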
arXiv Detail & Related papers (2022-06-13T19:11:22Z) - Deep learning for inverse problems with unknown operator [0.0]
In inverse problems where the forward operator $T$ is unknown, we have access to training data consisting of functions $f_i$ and their noisy images $Tf_i$.
We propose a new method that requires minimal assumptions on the data, and prove reconstruction rates that depend on the number of training points and the noise level.
arXiv Detail & Related papers (2021-08-05T17:21:12Z) - RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs that iteratively seeks a local minimum of the loss surface within a small region. Surprisingly, even with the additional per-step cost, the overall training cost of our method is empirically observed to be lower than that of back-propagation.
arXiv Detail & Related papers (2020-10-12T01:59:18Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
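Under such a linearization, loss curves decay approximately exponentially, which is what makes extrapolation from a few observed steps possible. A noise-free toy sketch of the idea (the paper's actual predictor works with the linearization of the network itself; all names and numbers here are illustrative):

```python
import math

# Toy sketch: assume fine-tuning loss follows the linearized-dynamics form
# L(t) = L_inf + (L_0 - L_inf) * exp(-lam * t). We observe only a few early
# steps, fit lam by log-linear least squares, and extrapolate the step at
# which a target loss is reached -- without running the full training.
L_inf, L_0, lam = 0.10, 2.00, 0.03          # hidden "true" dynamics
loss = lambda t: L_inf + (L_0 - L_inf) * math.exp(-lam * t)

# Observe the first 20 steps only (L_inf assumed known/estimated here)
ts = list(range(20))
ys = [math.log(loss(t) - L_inf) for t in ts]

# Least-squares slope of log(L - L_inf) vs t equals -lam
n = len(ts)
t_mean = sum(ts) / n
y_mean = sum(ys) / n
slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) \
        / sum((t - t_mean) ** 2 for t in ts)
lam_hat = -slope

# Predicted number of steps to reach a target loss
target = 0.2
t_pred = math.log((L_0 - L_inf) / (target - L_inf)) / lam_hat
```

With noisy measurements the same fit would be done by regression over a longer observation window, but the extrapolation step is unchanged.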
arXiv Detail & Related papers (2020-08-28T04:29:54Z) - Large-time asymptotics in deep learning [0.0]
We consider the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training.
For the classical $L^2$-regularized empirical risk minimization problem, we show that the training error is at most of the order $\mathcal{O}\left(\frac{1}{T}\right)$.
In the setting of $\ell^p$-distance losses, we prove that both the training error and the optimal parameters are at most of the order $\mathcal{O}\left(e^{-\mu\,\cdots}\right)$
arXiv Detail & Related papers (2020-08-06T07:33:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.