CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction
- URL: http://arxiv.org/abs/2502.06836v2
- Date: Fri, 08 Aug 2025 08:06:09 GMT
- Title: CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction
- Authors: Jaewan Lee, Changyoung Park, Hongjun Yang, Sungbin Lim, Woohyung Lim, Sehui Han,
- Abstract summary: Cross-attention-based model integrates graph representations with textual descriptions of materials.<n>CAST outperforms existing baseline models across four key material properties.<n>This study underscores the potential of multimodal learning frameworks for developing more accurate and globally informed predictive models in materials science.
- Score: 5.623295547221969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in graph neural networks (GNNs) have significantly enhanced the prediction of material properties by modeling crystal structures as graphs. However, GNNs often struggle to capture global structural characteristics, such as crystal systems, limiting their predictive performance. To overcome this issue, we propose CAST, a cross-attention-based multimodal model that integrates graph representations with textual descriptions of materials, effectively preserving critical structural and compositional information. Unlike previous approaches, such as CrysMMNet and MultiMat, which rely on aggregated material-level embeddings, CAST leverages cross-attention mechanisms to combine fine-grained graph node-level and text token-level features. Additionally, we introduce a masked node prediction pretraining strategy that further enhances the alignment between node and text embeddings. Our experimental results demonstrate that CAST outperforms existing baseline models across four key material properties-formation energy, band gap, bulk modulus, and shear modulus-with average relative MAE improvements ranging from 10.2% to 35.7%. Analysis of attention maps confirms the importance of pretraining in effectively aligning multimodal representations. This study underscores the potential of multimodal learning frameworks for developing more accurate and globally informed predictive models in materials science.
Related papers
- NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines [0.0]
This paper proposes a neurosymbolic AI and fusion-based approach for multimodal anomaly prediction in assembly pipelines.<n>We introduce a time series and image-based fusion model that leverages decision-level fusion techniques.<n>The results demonstrate that a neurosymbolic AI-based fusion approach that uses transfer learning can effectively harness the complementary strengths of time series and image data.
arXiv Detail & Related papers (2025-05-09T16:50:42Z) - Supervised Pretraining for Material Property Prediction [0.36868085124383626]
Self-supervised learning (SSL) offers a promising alternative by pretraining on large, unlabeled datasets to develop foundation models.<n>In this work, we propose supervised pretraining, where available class information serves as surrogate labels to guide learning.<n>To further enhance representation learning, we propose a graph-based augmentation technique that injects noise to improve robustness without structurally deforming material graphs.
arXiv Detail & Related papers (2025-04-27T19:00:41Z) - Uncertainty Quantification in Graph Neural Networks with Shallow Ensembles [0.0]
Machine-learned potentials (MLPs) have revolutionized materials discovery by providing accurate and efficient predictions of molecular and material properties.
Graph Neural Networks (GNNs) have emerged as a state-of-the-art approach due to their ability to capture complex atomic interactions.
This work highlights the potential of lightweight Uncertainty Quantification (UQ) methods in improving the robustness of GNN-based materials modeling.
arXiv Detail & Related papers (2025-04-17T04:02:53Z) - Spatiotemporal Graph Learning with Direct Volumetric Information Passing and Feature Enhancement [62.91536661584656]
We propose a dual-module framework, Cell-embedded and Feature-enhanced Graph Neural Network (aka, CeFeGNN) for learning.<n>We embed learnable cell attributions to the common node-edge message passing process, which better captures the spatial dependency of regional features.<n>Experiments on various PDE systems and one real-world dataset demonstrate that CeFeGNN achieves superior performance compared with other baselines.
arXiv Detail & Related papers (2024-09-26T16:22:08Z) - Characterizing Massive Activations of Attention Mechanism in Graph Neural Networks [0.9499648210774584]
Recently, attention mechanisms have been integrated into Graph Neural Networks (GNNs) to improve their ability to capture complex patterns.
This paper presents the first comprehensive study revealing the emergence of Massive Activations (MAs) within attention layers.
Our study assesses various GNN models using benchmark datasets, including ZINC, TOX21, and PROTEINS.
arXiv Detail & Related papers (2024-09-05T12:19:07Z) - Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning [0.0]
We introduce a Multi-Modal Fusion (MMF) framework that harnesses the analytical prowess of Graph Neural Networks (GNNs) and the linguistic generative and predictive abilities of Large Language Models (LLMs)
Our framework combines the effectiveness of GNNs in modeling graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, enabling improved predictions while reducing the risk of overfitting.
arXiv Detail & Related papers (2024-08-27T11:10:39Z) - Enhancing material property prediction with ensemble deep graph convolutional networks [9.470117608423957]
Recent efforts have focused on employing advanced ML algorithms, including deep learning - based graph neural network, for property prediction.
Our research provides an in-depth evaluation of ensemble strategies in deep learning - based graph neural network, specifically targeting material property prediction tasks.
By testing the Crystal Graph Convolutional Neural Network (CGCNN) and its multitask version, MT-CGCNN, we demonstrated that ensemble techniques, especially prediction averaging, substantially improve precision beyond traditional metrics.
arXiv Detail & Related papers (2024-07-26T16:12:06Z) - Benchmark on Drug Target Interaction Modeling from a Structure Perspective [48.60648369785105]
Drug-target interaction prediction is crucial to drug discovery and design.
Recent methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets.
We conduct a comprehensive survey and benchmark for drug-target interaction modeling from a structure perspective, via integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms.
arXiv Detail & Related papers (2024-07-04T16:56:59Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation [65.65530016765615]
We propose a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives.<n> TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space.<n>We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity.
arXiv Detail & Related papers (2024-05-27T05:45:51Z) - Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale [54.15522908057831]
We propose an adapted version of the computationally-Mixer for STTD forecast at scale.
Our results surprisingly show that this simple-yeteffective solution can rival SOTA baselines when tested on several traffic benchmarks.
Our findings contribute to the exploration of simple-yet-effective models for real-world STTD forecasting.
arXiv Detail & Related papers (2023-07-04T05:19:19Z) - Disentangling Structured Components: Towards Adaptive, Interpretable and
Scalable Time Series Forecasting [52.47493322446537]
We develop a adaptive, interpretable and scalable forecasting framework, which seeks to individually model each component of the spatial-temporal patterns.
SCNN works with a pre-defined generative process of MTS, which arithmetically characterizes the latent structure of the spatial-temporal patterns.
Extensive experiments are conducted to demonstrate that SCNN can achieve superior performance over state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2023-05-22T13:39:44Z) - Scalable deeper graph neural networks for high-performance materials
property prediction [1.9129213267332026]
We propose a novel graph attention neural network model DeeperGATGNN with differenti groupable normalization and skip-connections.
Our work shows that to deal with the high complexity of mapping the crystal materials structures to their properties, large-scale very deep graph neural networks are needed to achieve robust performances.
arXiv Detail & Related papers (2021-09-25T05:58:04Z) - PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive
Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z) - Probabilistic Graph Attention Network with Conditional Kernels for
Pixel-Wise Prediction [158.88345945211185]
We present a novel approach that advances the state of the art on pixel-level prediction in a fundamental aspect, i.e. structured multi-scale features learning and fusion.
We propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.
arXiv Detail & Related papers (2021-01-08T04:14:29Z) - Distance-aware Molecule Graph Attention Network for Drug-Target Binding
Affinity Prediction [54.93890176891602]
We propose a diStance-aware Molecule graph Attention Network (S-MAN) tailored to drug-target binding affinity prediction.
As a dedicated solution, we first propose a position encoding mechanism to integrate the topological structure and spatial position information into the constructed pocket-ligand graph.
We also propose a novel edge-node hierarchical attentive aggregation structure which has edge-level aggregation and node-level aggregation.
arXiv Detail & Related papers (2020-12-17T17:44:01Z) - Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of increasing the performances of the underlying neural network even in the presence relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z) - Graph Neural Network for Hamiltonian-Based Material Property Prediction [56.94118357003096]
We present and compare several different graph convolution networks that are able to predict the band gap for inorganic materials.
The models are developed to incorporate two different features: the information of each orbital itself and the interaction between each other.
The results show that our model can get a promising prediction accuracy with cross-validation.
arXiv Detail & Related papers (2020-05-27T13:32:10Z) - Global Attention based Graph Convolutional Neural Networks for Improved
Materials Property Prediction [8.371766047183739]
We develop a novel model, GATGNN, for predicting inorganic material properties based on graph neural networks.
We show that our method is able to both outperform the previous models' predictions and provide insight into the crystallization of the material.
arXiv Detail & Related papers (2020-03-11T07:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.