Related papers: Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

URL: http://arxiv.org/abs/2502.17019v1
Date: Mon, 24 Feb 2025 10:16:55 GMT
Title: Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
Authors: Maksim Zhdanov, Max Welling, Jan-Willem van de Meent
Abstract summary: We present Erwin, a hierarchical transformer inspired by methods from computational many-body physics. We demonstrate Erwin's effectiveness across multiple domains, including cosmology, molecular dynamics, and particle fluid dynamics.
Score: 48.984420422430404
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large-scale physical systems defined on irregular grids pose significant scalability challenges for deep learning methods, especially in the presence of long-range interactions and multi-scale coupling. Traditional approaches that compute all pairwise interactions, such as attention, become computationally prohibitive as they scale quadratically with the number of nodes. We present Erwin, a hierarchical transformer inspired by methods from computational many-body physics, which combines the efficiency of tree-based algorithms with the expressivity of attention mechanisms. Erwin employs ball tree partitioning to organize computation, which enables linear-time attention by processing nodes in parallel within local neighborhoods of fixed size. Through progressive coarsening and refinement of the ball tree structure, complemented by a novel cross-ball interaction mechanism, it captures both fine-grained local details and global features. We demonstrate Erwin's effectiveness across multiple domains, including cosmology, molecular dynamics, and particle fluid dynamics, where it consistently outperforms baseline methods both in accuracy and computational efficiency.
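
The mechanism described in the abstract, attention restricted to fixed-size balls of a ball tree, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the number of points equals ball_size times a power of two, uses a median split along the widest coordinate axis as a stand-in for ball tree construction, uses identity query/key/value projections, and leaves out the coarsening, refinement, and cross-ball interaction steps. The function names ball_tree_order and ball_attention are hypothetical.

```python
import numpy as np


def ball_tree_order(points, ball_size):
    """Return a permutation that stores each leaf ball's points contiguously.

    Assumption: len(points) == ball_size * 2**k, so every leaf holds
    exactly ball_size points.
    """
    def split(idx):
        if len(idx) <= ball_size:
            return [idx]
        pts = points[idx]
        # Split at the median along the axis with the largest spread.
        axis = np.argmax(pts.max(axis=0) - pts.min(axis=0))
        order = idx[np.argsort(pts[:, axis], kind="stable")]
        half = len(order) // 2
        return split(order[:half]) + split(order[half:])

    return np.concatenate(split(np.arange(len(points))))


def ball_attention(features, perm, ball_size):
    """Softmax self-attention computed independently inside each ball.

    Identity projections are used for brevity; a real layer would learn
    query, key, and value maps.
    """
    dim = features.shape[1]
    x = features[perm].reshape(-1, ball_size, dim)       # (balls, m, d)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(dim)     # (balls, m, m)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.empty_like(features)
    out[perm] = (weights @ x).reshape(-1, dim)           # back to input order
    return out


rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))      # irregular point cloud
features = rng.normal(size=(64, 8))    # per-point features
perm = ball_tree_order(points, ball_size=8)
updated = ball_attention(features, perm, ball_size=8)
```

With N points and balls of size m, the sketch computes N/m score matrices of size m by m, so the attention cost grows as N times m rather than N squared, which is the linear scaling the abstract refers to when m is fixed.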

Related papers

From Local Interactions to Global Operators: Scalable Gaussian Process Operator for Physical Systems [7.807210884802377]
We introduce a novel, scalable GPO that capitalizes on sparsity, locality, and structural information through judicious kernel design. We demonstrate that our framework consistently achieves high accuracy across varying discretization scales.
arXiv Detail & Related papers (2025-06-18T22:40:52Z)
Connecting the geometry and dynamics of many-body complex systems with message passing neural operators [1.8434042562191815]
We introduce a scalable AI framework, ROMA, for learning multiscale evolution operators of many-body complex systems. An attention mechanism is used to model multiscale interactions by connecting geometric representations of local subgraphs and dynamical operators. We demonstrate that the ROMA framework improves scalability and positive transfer between forecasting and effective dynamics tasks.
arXiv Detail & Related papers (2025-02-21T20:04:09Z)
Deep Signature: Characterization of Large-Scale Molecular Dynamics [29.67824486345836]
Deep Signature is a novel computationally tractable framework that characterizes complex dynamics and interatomic interactions. Our approach incorporates soft spectral clustering that locally aggregates cooperative dynamics to reduce the size of the system, as well as signature transform to provide a global characterization of the non-smooth interactive dynamics.
arXiv Detail & Related papers (2024-10-03T16:37:48Z)
Inferring Kernel $ε$-Machines: Discovering Structure in Complex Systems [49.1574468325115]
We introduce causal diffusion components that encode the kernel causal-state estimates as a set of coordinates in a reduced dimension space. We show how each component extracts predictive features from data and demonstrate their application on four examples.
arXiv Detail & Related papers (2024-10-01T21:14:06Z)
Capturing long-range memory structures with tree-geometry process tensors [0.0]
We introduce a class of quantum non-Markovian processes that exhibit decaying temporal correlations and memory distributed across time scales. We show that the long-range correlations in this class of processes tend to originate almost entirely from memory effects. We show how it can efficiently approximate the strong memory dynamics of the paradigm spin-boson model.
arXiv Detail & Related papers (2023-12-07T19:00:01Z)
Exploring the role of parameters in variational quantum algorithms [59.20947681019466]
We introduce a quantum-control-inspired method for the characterization of variational quantum circuits using the rank of the dynamical Lie algebra. A promising connection is found between the Lie rank, the accuracy of calculated energies, and the requisite depth to attain target states via a given circuit architecture.
arXiv Detail & Related papers (2022-09-28T20:24:53Z)
Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains. We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
Poly-NL: Linear Complexity Non-local Layers with Polynomials [76.21832434001759]
We formulate novel fast NonLocal blocks, capable of reducing complexity from quadratic to linear with no loss in performance. The proposed method, which we dub "Poly-NL", is competitive with state-of-the-art performance across image recognition, instance segmentation, and face detection tasks.
arXiv Detail & Related papers (2021-07-06T19:51:37Z)
X-volution: On the unification of convolution and self-attention [52.80459687846842]
We propose a multi-branch elementary module composed of both convolution and self-attention operations. The proposed X-volution achieves highly competitive visual understanding improvements.
arXiv Detail & Related papers (2021-06-04T04:32:02Z)
Learning Theory for Inferring Interaction Kernels in Second-Order Interacting Agent Systems [17.623937769189364]
We develop a complete learning theory which establishes strong consistency and optimal nonparametric min-max rates of convergence for the estimators. The numerical algorithm presented to build the estimators is parallelizable, performs well on high-dimensional problems, and is demonstrated on complex dynamical systems.
arXiv Detail & Related papers (2020-10-08T02:07:53Z)
Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data. We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity. Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.