Curvature-informed multi-task learning for graph networks
- URL: http://arxiv.org/abs/2208.01684v1
- Date: Tue, 2 Aug 2022 18:18:41 GMT
- Title: Curvature-informed multi-task learning for graph networks
- Authors: Alexander New, Michael J. Pekala, Nam Q. Le, Janna Domenico, Christine D. Piatko, Christopher D. Stiles
- Abstract summary: When state-of-the-art graph neural networks predict multiple properties simultaneously, they often underperform single-property predictors.
We investigate a potential explanation for this phenomenon: the curvature of each property's loss surface varies significantly, leading to inefficient learning.
- Score: 56.155331323304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Properties of interest for crystals and molecules, such as band gap,
elasticity, and solubility, are generally related to each other: they are
governed by the same underlying laws of physics. However, when state-of-the-art
graph neural networks attempt to predict multiple properties simultaneously
(the multi-task learning (MTL) setting), they frequently underperform a suite
of single property predictors. This suggests graph networks may not be fully
leveraging these underlying similarities. Here we investigate a potential
explanation for this phenomenon: the curvature of each property's loss surface
significantly varies, leading to inefficient learning. This difference in
curvature can be assessed by looking at spectral properties of the Hessians of
each property's loss function, which is done in a matrix-free manner via
randomized numerical linear algebra. We evaluate our hypothesis on two
benchmark datasets (Materials Project (MP) and QM8) and consider how these
findings can inform the training of novel multi-task learning models.
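The matrix-free Hessian analysis the abstract describes can be sketched compactly: with Hessian-vector products from automatic differentiation, the dominant curvature of each property's loss can be estimated without ever forming the Hessian. Below is a minimal sketch (not the authors' released code; the paper's exact estimator may differ, e.g. stochastic Lanczos quadrature) using randomized power iteration, a simple instance of randomized numerical linear algebra. The model and per-task losses in the usage comment are hypothetical placeholders.

```python
# A minimal sketch of matrix-free Hessian spectral analysis: the dominant
# eigenvalue of each task's loss Hessian is estimated with randomized power
# iteration on Hessian-vector products, never materializing the Hessian.
import torch


def hvp(loss, params, v):
    """Hessian-vector product via double backprop (Pearlmutter's trick)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])


def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest-magnitude Hessian eigenvalue, matrix-free."""
    n = sum(p.numel() for p in params)
    v = torch.randn(n, device=params[0].device)  # random start vector
    v = v / v.norm()
    eig = 0.0
    for _ in range(iters):
        hv = hvp(loss, params, v)
        eig = torch.dot(v, hv).item()  # Rayleigh quotient at current iterate
        v = hv / (hv.norm() + 1e-12)
    return eig


# Hypothetical usage: compare curvature across the property heads of a
# multi-task graph network (`model`, `loss_band_gap`, `loss_elasticity`
# are stand-ins, not names from the paper).
# params = [p for p in model.parameters() if p.requires_grad]
# for name, loss in [("band_gap", loss_band_gap),
#                    ("elasticity", loss_elasticity)]:
#     print(name, top_hessian_eigenvalue(loss, params))
```

If the per-task eigenvalues differ by orders of magnitude, a single shared learning rate is poorly matched to some of the tasks, which is the kind of inefficiency the abstract attributes the multi-task underperformance to.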
Related papers
- A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities [30.737171081270322]
We study how fully-connected two-layer neural networks adapt to the target function after a single, but aggressive, gradient descent step.
This provides a sharp description of the impact of feature learning in the generalization of two-layer neural networks, beyond the random features and lazy training regimes.
arXiv Detail & Related papers (2024-10-24T17:24:34Z) - In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs [1.8635507597668244]
In-context learning allows nontrivial machine learning tasks to be performed during inference alone.
In this work, we address the question: can we leverage in-context learning to predict out-of-distribution materials properties?
We employ a compound model in which GPT-2 acts on the output of geometry-aware graph neural networks to adapt in-context information.
arXiv Detail & Related papers (2024-06-03T21:59:21Z) - A theory of data variability in Neural Network Bayesian inference [0.70224924046445]
We provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks.
We derive the generalization properties from the statistical properties of the input.
We show that data variability leads to a non-Gaussian action reminiscent of a $(\varphi^3+\varphi^4)$-theory.
arXiv Detail & Related papers (2023-07-31T14:11:32Z) - Geometric Graph Filters and Neural Networks: Limit Properties and Discriminability Trade-offs [122.06927400759021]
We study the relationship between a graph neural network (GNN) and a manifold neural network (MNN) when the graph is constructed from a set of points sampled from the manifold.
We prove non-asymptotic error bounds showing that convolutional filters and neural networks on these graphs converge to convolutional filters and neural networks on the continuous manifold.
arXiv Detail & Related papers (2023-05-29T08:27:17Z) - Predictions Based on Pixel Data: Insights from PDEs and Finite Differences [0.0]
This paper deals with the approximation of time sequences where each observation is a matrix.
We show that with relatively small networks, we can represent exactly a class of numerical discretizations of PDEs based on the method of lines.
Our network architecture is inspired by those typically adopted in the approximation of time sequences.
arXiv Detail & Related papers (2023-05-01T08:54:45Z) - Learning the Evolutionary and Multi-scale Graph Structure for Multivariate Time Series Forecasting [50.901984244738806]
We show how to model the evolutionary and multi-scale interactions of time series.
In particular, we first provide a hierarchical graph structure combined with dilated convolutions to capture scale-specific correlations.
A unified neural network then integrates these components to produce the final prediction.
arXiv Detail & Related papers (2022-06-28T08:11:12Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z) - Graph network for simultaneous learning of forward and inverse physics [0.0]
We propose an end-to-end graph network that learns forward and inverse models of particle-based physics using interpretable inductive biases.
Our approach is able to predict the forward dynamics with at least an order of magnitude higher accuracy than existing approaches.
arXiv Detail & Related papers (2021-12-13T22:38:09Z) - Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z) - Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation captures a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z)