Topic Modeling and Link-Prediction for Material Property Discovery
- URL: http://arxiv.org/abs/2507.06139v1
- Date: Tue, 08 Jul 2025 16:20:46 GMT
- Title: Topic Modeling and Link-Prediction for Material Property Discovery
- Authors: Ryan C. Barron, Maksim E. Eren, Valentin Stanev, Cynthia Matuszek, Boian S. Alexandrov
- Abstract summary: Link prediction infers missing or future relations between graph nodes, based on connection patterns. We present an AI-driven hierarchical link prediction framework that integrates matrix factorization to infer hidden associations. Our method finds hidden connections in a graph of material to latent topic associations built from scientific literature.
- Score: 6.0045906216050815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Link prediction infers missing or future relations between graph nodes, based on connection patterns. Scientific literature networks and knowledge graphs are typically large, sparse, and noisy, and often contain missing links between entities. We present an AI-driven hierarchical link prediction framework that integrates matrix factorization to infer hidden associations and steer discovery in complex material domains. Our method combines Hierarchical Nonnegative Matrix Factorization (HNMFk) and Boolean matrix factorization (BNMFk) with automatic model selection, as well as Logistic matrix factorization (LMF), which we use to construct a three-level topic tree from a 46,862-document corpus focused on 73 transition-metal dichalcogenides (TMDs). These materials are studied in a variety of physics fields with many current and potential applications. An ensemble BNMFk + LMF approach fuses discrete interpretability with probabilistic scoring. The resulting HNMFk clusters map each material onto coherent topics like superconductivity, energy storage, and tribology. In addition, missing or weakly connected links between topics and materials are highlighted, suggesting novel hypotheses for cross-disciplinary exploration. We validate our method by removing publications about superconductivity in well-known superconductors, and show that the model predicts associations with the superconducting TMD clusters. This shows that the method finds hidden connections in a graph of material to latent topic associations built from scientific literature, which is especially useful when examining a diverse corpus of scientific documents covering the same class of phenomena or materials but originating from distinct communities and perspectives. The inferred links, which generate new hypotheses, are exposed through an interactive Streamlit dashboard designed for human-in-the-loop scientific discovery.
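The link-prediction idea in the abstract can be illustrated with a minimal sketch of logistic matrix factorization on a toy material-by-topic matrix. This is not the authors' implementation: the function name `logistic_mf`, the toy matrix, the rank, and all hyperparameters are assumptions chosen for illustration, and the HNMFk/BNMFk stages and automatic model selection are omitted. One known association is hidden from training and then scored, loosely mirroring the validation by removal described above.

```python
import numpy as np

def logistic_mf(A, mask, rank=2, lr=0.1, reg=0.01, epochs=2000, seed=0):
    """Fit sigmoid(U @ V.T) to the observed entries of a binary matrix A.

    A    : (materials x topics) 0/1 association matrix (hypothetical toy data)
    mask : same shape, 1 where the entry is observed, 0 where it is held out
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        P = 1.0 / (1.0 + np.exp(-(U @ V.T)))   # predicted link probabilities
        G = mask * (P - A)                     # log-loss gradient w.r.t. logits, observed entries only
        grad_U = G @ V + reg * U               # L2-regularized gradients
        grad_V = G.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return 1.0 / (1.0 + np.exp(-(U @ V.T)))

# Toy data: materials 0-2 relate to topics 0-1, materials 3-5 to topics 2-3.
A = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3, dtype=float)

# Hide the true link (material 0, topic 1) to imitate removed publications.
mask = np.ones_like(A)
mask[0, 1] = 0.0

scores = logistic_mf(A, mask)
print("score for hidden link (0, 1):", round(float(scores[0, 1]), 3))
print("score for known non-link (0, 2):", round(float(scores[0, 2]), 3))
```

In this toy setting the hidden entry should receive a higher score than the known non-link, because material 0 shares its observed pattern with materials 1 and 2. Real literature-derived matrices are far larger and noisier, so rank and regularization would need proper selection.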
Related papers
- Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge-driven discovery. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs). By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO).
arXiv Detail & Related papers (2025-03-18T02:14:49Z) - Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs [2.006175707670159]
PrimeKG++ is an enriched knowledge graph incorporating multimodal data. Our approach demonstrates strong generalizability, enabling accurate link predictions even for unseen nodes.
arXiv Detail & Related papers (2025-01-03T05:29:12Z) - Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning [0.0]
We have transformed a dataset comprising 1,000 scientific papers into an ontological knowledge graph.
We have calculated node degrees, identified communities and connectivities, and evaluated clustering coefficients and betweenness centrality of pivotal nodes.
The graph has an inherently scale-free nature, is highly connected, and can be used for graph reasoning.
arXiv Detail & Related papers (2024-03-18T17:30:27Z) - Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z) - Band-gap regression with architecture-optimized message-passing neural networks [1.9590152885845324]
We train an MPNN to first classify materials through density functional theory data from the AFLOW database as being metallic or semiconducting/insulating.
We then perform a neural-architecture search to explore the model architecture and hyperparameter space of MPNNs to predict the band gaps of the materials identified as non-metals.
The top-performing models from the search are pooled into an ensemble that significantly outperforms existing models from the literature.
arXiv Detail & Related papers (2023-09-12T16:13:10Z) - Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection [82.94413676131545]
We propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection.
KhiCL exploits cross-modal joint dictionary to transfer the heterogeneous unimodality features into the common feature space.
It extracts visual and textual entities from images and text, and designs a knowledge relevance reasoning strategy.
arXiv Detail & Related papers (2023-06-28T06:08:20Z) - Unsupervised physics-informed disentanglement of multimodal data for high-throughput scientific discovery [4.923937591056569]
We introduce physics-informed multimodal autoencoders (PIMA).
PIMA is a variational inference framework for discovering shared information in multimodal scientific datasets.
A dataset of lattice metamaterials from metal additive manufacturing demonstrates accurate cross modal inference.
arXiv Detail & Related papers (2022-02-07T14:47:00Z) - Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network [49.458250193768826]
We propose sawtooth factorial topic embedding guided GBN, a deep generative model of documents.
Both the words and topics are represented as embedding vectors of the same dimension.
Our models outperform other neural topic models on extracting deeper interpretable topics.
arXiv Detail & Related papers (2021-06-30T10:14:57Z) - Graph Neural Network for Hamiltonian-Based Material Property Prediction [56.94118357003096]
We present and compare several different graph convolution networks that are able to predict the band gap for inorganic materials.
The models are developed to incorporate two different features: the information of each orbital itself and the interaction between each other.
The results show that our model can get a promising prediction accuracy with cross-validation.
arXiv Detail & Related papers (2020-05-27T13:32:10Z) - Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.