Out-of-distribution materials property prediction using adversarial learning based fine-tuning
- URL: http://arxiv.org/abs/2408.09297v1
- Date: Sat, 17 Aug 2024 21:22:21 GMT
- Title: Out-of-distribution materials property prediction using adversarial learning based fine-tuning
- Authors: Qinyang Li, Nicholas Miklaucic, Jianjun Hu
- Abstract summary: We propose an adversarial learning based targeted fine-tuning approach to adapt the model to a particular OOD dataset.
Our experiments demonstrate the effectiveness of our CAL algorithm for ML with limited samples.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The accurate prediction of material properties is crucial in a wide range of scientific and engineering disciplines. Machine learning (ML) has advanced the state of the art in this field, enabling scientists to discover novel materials and design materials with specific desired properties. However, one major challenge that persists in material property prediction is the generalization of models to out-of-distribution (OOD) samples, i.e., samples that differ significantly from those encountered during training. In this paper, we explore the application of advancements in OOD learning approaches to enhance the robustness and reliability of material property prediction models. We propose and apply the Crystal Adversarial Learning (CAL) algorithm for OOD materials property prediction, which generates synthetic data during training to bias the training towards samples with high prediction uncertainty. We further propose an adversarial learning based targeted fine-tuning approach that adapts the model to a particular OOD dataset, as an alternative to traditional fine-tuning. Our experiments demonstrate the effectiveness of our CAL algorithm for ML with limited samples, a setting that commonly occurs in materials science. Our work represents a promising direction toward better OOD learning and materials property prediction.
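The sketch below illustrates the uncertainty-targeted adversarial training idea the abstract describes; it is not the paper's actual CAL implementation. It assumes a generic PyTorch feature-vector regressor with MC-dropout as the uncertainty estimator (the paper works with crystal structures), and the perturbation schedule, the `pseudo_label_fn` teacher, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of adversarial, uncertainty-targeted training (NOT the
# paper's CAL implementation). A generic regressor stands in for a
# crystal model, and MC dropout stands in for the uncertainty estimator.
import torch
import torch.nn as nn

class DropoutRegressor(nn.Module):
    def __init__(self, d_in: int, d_hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def mc_uncertainty(model, x, n_samples: int = 8):
    """Predictive variance via MC dropout (dropout stays active)."""
    model.train()
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.var(dim=0).mean()

def adversarial_examples(model, x, step: float = 0.05, n_steps: int = 3):
    """Perturb inputs by gradient ascent on predictive uncertainty."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        u = mc_uncertainty(model, x_adv)
        (grad,) = torch.autograd.grad(u, x_adv)
        x_adv = (x_adv + step * grad.sign()).detach().requires_grad_(True)
    return x_adv.detach()

def cal_style_step(model, opt, x, y, pseudo_label_fn, w_adv: float = 0.5):
    """One step: fit real data, plus synthetic high-uncertainty samples."""
    x_adv = adversarial_examples(model, x)
    with torch.no_grad():
        y_adv = pseudo_label_fn(x_adv)  # hypothetical frozen teacher
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss = loss + w_adv * nn.functional.mse_loss(model(x_adv), y_adv)
    loss.backward()
    opt.step()
    return loss.item()
```

The design choice the abstract points to is in `cal_style_step`: synthetic samples are generated where predictive uncertainty is highest, and the weight `w_adv` biases training toward them.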
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - From Tokens to Materials: Leveraging Language Models for Scientific Discovery [12.211984932142537]
This study investigates the application of language model embeddings to enhance material property prediction in materials science.
We demonstrate that domain-specific models, particularly MatBERT, significantly outperform general-purpose models in extracting implicit knowledge from compound names and material properties.
arXiv Detail & Related papers (2024-10-21T16:31:23Z) - Can OOD Object Detectors Learn from Foundation Models? [56.03404530594071]
Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data.
Inspired by recent advancements in text-to-image generative models, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples.
We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models.
arXiv Detail & Related papers (2024-09-08T17:28:22Z) - Self-supervised learning for crystal property prediction via denoising [43.148818844265236]
We propose a novel self-supervised learning (SSL) strategy for material property prediction.
Our approach, crystal denoising self-supervised learning (CDSSL), pretrains predictive models with a pretext task based on recovering valid material structures.
We demonstrate that CDSSL models outperform models trained without SSL across material types, properties, and dataset sizes.
arXiv Detail & Related papers (2024-08-30T12:53:40Z) - Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study [1.3711992220025948]
We present a benchmark study of structure-based graph neural networks (GNNs) for extrapolative OOD materials property prediction.
Our experiments show that current state-of-the-art GNN algorithms significantly underperform for the OOD property prediction tasks.
We identify the sources of the significantly more robust OOD performance of CGCNN, ALIGNN, and DeeperGATGNN compared to the current best models.
arXiv Detail & Related papers (2024-01-16T01:03:39Z) - Towards out-of-distribution generalizable predictions of chemical kinetics properties [61.15970601264632]
Out-of-Distribution (OOD) kinetic property prediction requires models that generalize beyond their training distribution.
In this paper, we categorize OOD kinetic property prediction into three levels (structure, condition, and mechanism).
We create comprehensive datasets to benchmark state-of-the-art ML approaches for reaction prediction in the OOD setting, as well as state-of-the-art graph OOD methods on kinetic property prediction problems.
arXiv Detail & Related papers (2023-10-04T20:36:41Z) - Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) proposes learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z) - Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction [6.349503549199403]
We introduce our model Materials Informatics Transformer (MatInFormer) for material property prediction.
Specifically, we introduce a novel approach that involves learning the grammar of crystallography through the tokenization of pertinent space group information.
arXiv Detail & Related papers (2023-08-30T18:34:55Z) - Pseudo-OOD training for robust language models [78.15712542481859]
OOD detection is a key component of a reliable machine-learning model for any industry-scale application.
We propose POORE - POsthoc pseudo-Ood REgularization, that generates pseudo-OOD samples using in-distribution (IND) data.
We extensively evaluate our framework on three real-world dialogue systems, achieving a new state of the art in OOD detection.
arXiv Detail & Related papers (2022-10-17T14:32:02Z) - SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization than full fine-tuning across various scenarios (a minimal LoRA sketch follows this list).
arXiv Detail & Related papers (2022-10-10T16:07:24Z) - Understanding Out-of-distribution: A Perspective of Data Dynamics [5.811774625668462]
This paper explores how data dynamics in training models can be used to understand the fundamental differences between OOD and in-distribution samples.
We find that the syntactic characteristics of the data samples that the model consistently predicts incorrectly directly contradict each other between the OOD and in-distribution cases.
arXiv Detail & Related papers (2021-11-29T17:34:38Z)
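Because the main paper positions adversarial targeted fine-tuning as an alternative to traditional fine-tuning, the SimSCOOD finding above is worth unpacking. Below is a minimal, self-contained sketch of the LoRA technique that SimSCOOD evaluates, not that paper's code: a pretrained linear layer is frozen and only a low-rank update is trained. The rank, scaling, and initialization are common defaults, assumed here for illustration.

```python
# Minimal LoRA sketch: freeze a pretrained weight W and learn a
# low-rank update B @ A, so only r * (d_in + d_out) parameters train.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # frozen pretrained weights
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no update at start
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base output plus scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap a pretrained layer, then fine-tune only A and B.
layer = LoRALinear(nn.Linear(256, 256))
opt = torch.optim.AdamW(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-4
)
y = layer(torch.randn(4, 256))  # forward pass works as a drop-in layer
```

Zero-initializing `B` means the wrapped layer starts out exactly equal to the pretrained one, which is the usual LoRA convention; only the small `A`/`B` factors move during fine-tuning, which is one hypothesis for its better OOD behavior.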
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.