Contrastive Language-Structure Pre-training Driven by Materials Science Literature
- URL: http://arxiv.org/abs/2501.12919v1
- Date: Wed, 22 Jan 2025 14:47:59 GMT
- Title: Contrastive Language-Structure Pre-training Driven by Materials Science Literature
- Authors: Yuta Suzuki, Tatsunori Taniai, Ryo Igarashi, Kotaro Saito, Naoya Chiba, Yoshitaka Ushiku, Kanta Ono
- Abstract summary: Contrastive Language--Structure Pre-training (CLaSP) is a learning paradigm for constructing crossmodal embedding spaces between crystal structures and texts. CLaSP aims to achieve material embeddings that capture property- and functionality-related similarities between crystal structures. We demonstrate the effectiveness of CLaSP through text-based crystal structure screening and embedding space visualization.
- Score: 10.170537065646323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding structure-property relationships is an essential yet challenging aspect of materials discovery and development. To facilitate this process, recent studies in materials informatics have sought latent embedding spaces of crystal structures to capture their similarities based on properties and functionalities. However, abstract feature-based embedding spaces are human-unfriendly and prevent intuitive and efficient exploration of the vast materials space. Here we introduce Contrastive Language--Structure Pre-training (CLaSP), a learning paradigm for constructing crossmodal embedding spaces between crystal structures and texts. CLaSP aims to achieve material embeddings that 1) capture property- and functionality-related similarities between crystal structures and 2) allow intuitive retrieval of materials via user-provided description texts as queries. To compensate for the lack of sufficient datasets linking crystal structures with textual descriptions, CLaSP leverages a dataset of over 400,000 published crystal structures and corresponding publication records, including paper titles and abstracts, for training. We demonstrate the effectiveness of CLaSP through text-based crystal structure screening and embedding space visualization.
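The abstract describes a CLIP-style setup: a structure encoder and a text encoder trained so that matched structure-text pairs score higher than mismatched ones. As a rough illustration of how such a contrastive objective and the text-based screening it enables typically look, here is a minimal PyTorch sketch; the function names, temperature value, and encoder outputs are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(structure_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (structure, text) pairs.

    structure_emb, text_emb: (batch, dim) outputs of the two encoders.
    Pairs sharing a batch index are positives; all others are negatives.
    The temperature (0.07) is a common CLIP default, not the paper's value.
    """
    s = F.normalize(structure_emb, dim=-1)  # unit norm -> dot product is cosine similarity
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature          # (batch, batch); diagonal holds true pairs
    targets = torch.arange(s.size(0), device=s.device)
    # Cross-entropy in both directions: structure->text and text->structure.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

def screen_by_text(query_emb, structure_embs, top_k=5):
    """Rank candidate structures by cosine similarity to an encoded text query."""
    q = F.normalize(query_emb, dim=-1)       # (dim,)
    s = F.normalize(structure_embs, dim=-1)  # (num_candidates, dim)
    return torch.topk(s @ q, k=top_k)        # top scores and their indices
```

Once trained this way, screening reduces to encoding a free-text query (e.g., a property keyword) and taking nearest neighbors in the shared embedding space, which is the retrieval behavior the paper demonstrates using title/abstract supervision.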
Related papers
- Contrastive Learning of English Language and Crystal Graphs for Multimodal Representation of Materials Knowledge [0.15978270011184253]
We introduce a contrastive language-crystals model (CLaC) pre-trained on a newly synthesized dataset of 126k crystal structure-text pairs.
CLaC achieves state-of-the-art zero-shot generalization performance in understanding crystal structures.
arXiv Detail & Related papers (2025-02-23T05:39:46Z)
- Struct-X: Enhancing Large Language Models Reasoning with Structured Data [38.558614152006975]
Struct-X operates through five key phases: "read-model-fill-reflect-reason".
It encodes structured data into a topological space using graph embeddings.
It fills in missing entity information with knowledge retrieval modules.
The final phase involves constructing a topological network with selected tokens.
arXiv Detail & Related papers (2024-07-17T13:06:25Z)
- Visual Analytics for Fine-grained Text Classification Models and Datasets [3.6873612681664016]
SemLa is a novel visual analytics system tailored for fine-grained text classification.
This paper details the iterative design study and the resulting innovations featured in SemLa.
arXiv Detail & Related papers (2024-03-21T17:26:28Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS)
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Compositional Representation of Polymorphic Crystalline Materials [56.80318252233511]
We introduce PCRL, a novel approach that employs probabilistic modeling of composition to capture the diverse polymorphs from available structural information. Extensive evaluations on sixteen datasets demonstrate the effectiveness of PCRL in learning compositional representation.
arXiv Detail & Related papers (2023-11-17T20:34:28Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation [72.67912031720358]
We propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation.
For structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features.
For statistical texture knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) to adaptively extract and enhance statistical texture knowledge.
arXiv Detail & Related papers (2023-05-06T06:01:11Z)
- Leveraging Language Representation for Material Recommendation, Ranking, and Exploration [0.0]
We introduce a material discovery framework that uses natural language embeddings derived from language models as representations of compositional and structural features.
By applying the framework to thermoelectrics, we demonstrate diversified recommendations of prototype structures and identify under-studied high-performance material spaces.
arXiv Detail & Related papers (2023-05-01T21:58:29Z)
- Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning [26.811507121199323]
This paper proposes a unified learning framework that combines explicit structure reasoning and language pre-training to endow pre-trained language models (PLMs) with structure reasoning skills.
It first identifies several elementary structures within contexts to construct structured queries and performs step-by-step reasoning along the queries to identify the answer entity.
Experimental results on four datasets demonstrate that the proposed model achieves significant improvements in complex reasoning tasks involving diverse structures.
arXiv Detail & Related papers (2023-01-21T08:18:11Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.