Related papers: Separable Operator Networks

Separable Operator Networks

URL: http://arxiv.org/abs/2407.11253v3
Date: Thu, 05 Dec 2024 22:12:02 GMT
Title: Separable Operator Networks
Authors: Xinling Yu, Sean Hooten, Ziyue Liu, Yequan Zhao, Marco Fiorentino, Thomas Van Vaerenbergh, Zheng Zhang,
Abstract summary: Operator learning has become a powerful tool in machine learning for modeling complex physical systems governed by partial differential equations (PDEs)<n>We introduce Separable Operator Networks (SepONet), a novel framework that significantly enhances the efficiency of physics-informed operator learning.<n>SepONet uses independent trunk networks to learn basis functions separately for different coordinate axes, enabling faster and more memory-efficient training.
Score: 4.688862638563124
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Operator learning has become a powerful tool in machine learning for modeling complex physical systems governed by partial differential equations (PDEs). Although Deep Operator Networks (DeepONet) show promise, they require extensive data acquisition. Physics-informed DeepONets (PI-DeepONet) mitigate data scarcity but suffer from inefficient training processes. We introduce Separable Operator Networks (SepONet), a novel framework that significantly enhances the efficiency of physics-informed operator learning. SepONet uses independent trunk networks to learn basis functions separately for different coordinate axes, enabling faster and more memory-efficient training via forward-mode automatic differentiation. We provide a universal approximation theorem for SepONet proving the existence of a separable approximation to any nonlinear continuous operator. Then, we comprehensively benchmark its representational capacity and computational performance against PI-DeepONet. Our results demonstrate SepONet's superior performance across various nonlinear and inseparable PDEs, with SepONet's advantages increasing with problem complexity, dimension, and scale. For 1D time-dependent PDEs, SepONet achieves up to 112x faster training and 82x reduction in GPU memory usage compared to PI-DeepONet, while maintaining comparable accuracy. For the 2D time-dependent nonlinear diffusion equation, SepONet efficiently handles the complexity, achieving a 6.44% mean relative $\ell_{2}$ test error, while PI-DeepONet fails due to memory constraints. This work paves the way for extreme-scale learning of continuous mappings between infinite-dimensional function spaces. Open source code is available at \url{https://github.com/HewlettPackard/separable-operator-networks}.

Related papers

DeepONet Augmented by Randomized Neural Networks for Efficient Operator Learning in PDEs [5.84093922354671]
We propose RaNN-DeepONets, a hybrid architecture designed to balance accuracy and efficiency. RaNN-DeepONets achieves comparable accuracy while reducing computational costs by orders of magnitude. These results highlight the potential of RaNN-DeepONets as an efficient alternative for operator learning in PDE-based systems.
arXiv Detail & Related papers (2025-03-01T03:05:29Z)
Physics-Informed Latent Neural Operator for Real-time Predictions of Complex Physical Systems [0.0]
Deep operator network (DeepONet) has shown great promise as a surrogate model for systems governed by partial differential equations (PDEs) This work introduces PI-Latent-NO, a physics-informed latent operator learning framework that overcomes limitations.
arXiv Detail & Related papers (2025-01-14T20:38:30Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning [1.024113475677323]
We present a novel and lightweight algorithm to conduct automatic differentiation (AD) for physics-informed operator learning. Instead of making all sampled coordinates as leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension. It has led to an outstanding performance leap by avoiding the duplication of the computational graph along the dimension of functions.
arXiv Detail & Related papers (2023-11-01T21:28:24Z)
Training Deep Surrogate Models with Large Scale Online Learning [48.7576911714538]
Deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs. Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training. It proposes an open source online training framework for deep surrogate models.
arXiv Detail & Related papers (2023-06-28T12:02:27Z)
Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs. Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space. LSM achieves consistent state-of-the-art and yields a relative gain of 11.5% averaged on seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z)
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching. Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
Enhanced DeepONet for Modeling Partial Differential Operators Considering Multiple Input Functions [5.819397109258169]
A deep network operator (DeepONet) was proposed to model the general non-linear continuous operators for partial differential equations (PDE) Existing DeepONet can only accept one input function, which limits its application. We propose new Enhanced DeepONet or EDeepONet high-level neural network structure, in which two input functions are represented by two branch sub-networks. Our numerical results on modeling two partial differential equation examples shows that the proposed enhanced DeepONet is about 7X-17X or about one order of magnitude more accurate than the fully connected neural network.
arXiv Detail & Related papers (2022-02-17T23:58:23Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Improved architectures and training algorithms for deep operator networks [0.0]
Operator learning techniques have emerged as a powerful tool for learning maps between infinite-dimensional Banach spaces. We analyze the training dynamics of deep operator networks (DeepONets) through the lens of Neural Tangent Kernel (NTK) theory.
arXiv Detail & Related papers (2021-10-04T18:34:41Z)
Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge [3.3767251810292955]
FuSeConv is a drop-in replacement for depthwise separable convolutions. FuSeConv factorizes convolution fully along their spatial and depth dimensions. Neural Operator Scaffolding scaffolds the training of FuSeConv by distilling knowledge from depthwise separable convolutions.
arXiv Detail & Related papers (2021-08-25T19:22:25Z)
Physics-Based Deep Learning for Fiber-Optic Communication Systems [10.630021520220653]
We propose a new machine-learning approach for fiber-optic communication systems governed by the nonlinear Schr"odinger equation (NLSE) Our main observation is that the popular split-step method (SSM) for numerically solving the NLSE has essentially the same functional form as a deep multi-layer neural network. We exploit this connection by parameterizing the SSM and viewing the linear steps as general linear functions, similar to the weight matrices in a neural network.
arXiv Detail & Related papers (2020-10-27T12:55:23Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models. This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.