Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design
- URL: http://arxiv.org/abs/2507.16307v1
- Date: Tue, 22 Jul 2025 07:48:32 GMT
- Title: Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design
- Authors: Xin-De Wang, Zhi-Rui Chen, Peng-Jie Guo, Ze-Feng Gao, Cheng Mu, Zhong-Yi Lu
- Abstract summary: Perovskite solar cells (PSCs) have rapidly emerged as a leading contender in next-generation photovoltaic technologies. Despite these advances, challenges such as long-term stability, environmental sustainability, and scalable manufacturing continue to hinder their commercialization. Precursor additive engineering has shown promise in addressing these issues by enhancing both the performance and durability of PSCs. We introduce Perovskite-R1, a specialized large language model (LLM) with advanced reasoning capabilities tailored for the discovery and design of PSC precursor additives.
- Score: 5.378023608941598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perovskite solar cells (PSCs) have rapidly emerged as a leading contender in next-generation photovoltaic technologies, owing to their exceptional power conversion efficiencies and advantageous material properties. Despite these advances, challenges such as long-term stability, environmental sustainability, and scalable manufacturing continue to hinder their commercialization. Precursor additive engineering has shown promise in addressing these issues by enhancing both the performance and durability of PSCs. However, the explosive growth of scientific literature and the complex interplay of materials, processes, and device architectures make it increasingly difficult for researchers to efficiently access, organize, and utilize domain knowledge in this rapidly evolving field. To address this gap, we introduce Perovskite-R1, a specialized large language model (LLM) with advanced reasoning capabilities tailored for the discovery and design of PSC precursor additives. By systematically mining and curating 1,232 high-quality scientific publications and integrating a comprehensive library of 33,269 candidate materials, we constructed a domain-specific instruction-tuning dataset using automated question-answer generation and chain-of-thought reasoning. Fine-tuning the QwQ-32B model on this dataset resulted in Perovskite-R1, which can intelligently synthesize literature insights and generate innovative and practical solutions for defect passivation and the selection of precursor additives. Experimental validation of several model-proposed strategies confirms their effectiveness in improving material stability and performance. Our work demonstrates the potential of domain-adapted LLMs in accelerating materials discovery and provides a closed-loop framework for intelligent, data-driven advancements in perovskite photovoltaic research.
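The abstract describes a two-stage pipeline: automated question-answer generation with chain-of-thought reasoning over the curated literature, followed by supervised fine-tuning of QwQ-32B. The sketch below illustrates, in broad strokes, how such an instruction-tuning corpus might be assembled and used for LoRA-based fine-tuning; the prompt wording, the `ask_llm` callable, and the use of Hugging Face `datasets`/`peft`/`trl` are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): turn curated literature
# snippets into chain-of-thought QA records and fine-tune QwQ-32B with LoRA.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer


def build_qa_records(snippets, ask_llm):
    """Create instruction-tuning records whose answers carry step-by-step
    reasoning. `ask_llm` is any callable that returns a dict with
    'question' and 'answer' keys (an assumption for this sketch)."""
    records = []
    for text in snippets:
        prompt = (
            "From the following excerpt on perovskite precursor additives, "
            "write one question and a step-by-step reasoned answer.\n\n" + text
        )
        qa = ask_llm(prompt)
        records.append({
            "messages": [
                {"role": "user", "content": qa["question"]},
                {"role": "assistant", "content": qa["answer"]},  # includes CoT
            ]
        })
    return records


def fine_tune(records, output_dir="perovskite-r1-lora"):
    """Supervised fine-tuning of QwQ-32B on the generated records."""
    trainer = SFTTrainer(
        model="Qwen/QwQ-32B",
        train_dataset=Dataset.from_list(records),
        args=SFTConfig(output_dir=output_dir, num_train_epochs=2),
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

The abstract also mentions a library of 33,269 candidate materials; how that library enters the training data is not specified there, so it is omitted from the sketch.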
Related papers
- Autonomous Inorganic Materials Discovery via Multi-Agent Physics-Aware Scientific Reasoning [0.0]
We introduce SparksMatter, a multi-agent AI model for automated inorganic materials design. It generates ideas, designs and executes experiments, continuously evaluates and refines results, and proposes candidate materials. The model's performance is evaluated across case studies in thermoelectric, semiconductor, and perovskite oxide materials design.
arXiv Detail & Related papers (2025-08-04T23:40:43Z)
- Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization [47.97016882216093]
Large language models (LLMs) leverage chain-of-thought (CoT) techniques to tackle complex problems. We introduce ChatBattery, a novel agentic framework that integrates domain knowledge to steer LLMs toward more effective reasoning in materials design. We successfully identify, synthesize, and characterize three novel lithium-ion battery cathode materials, which achieve practical capacity improvements of 28.8%, 25.2%, and 18.5%, respectively.
arXiv Detail & Related papers (2025-07-21T23:46:11Z)
- Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey [54.40267149907223]
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure. The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges. Data-driven generative models provide a powerful tool for materials design by directly creating novel materials that satisfy predefined property requirements.
arXiv Detail & Related papers (2025-05-22T08:33:21Z)
- Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge-driven discovery. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs). By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO). (A generic sketch of this adjacency-matrix construction appears after this list.)
arXiv Detail & Related papers (2025-03-18T02:14:49Z)
- Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. Addressing shortcomings such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance calls for advanced post-training language models (PoLMs). This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms, including Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; and Integration and Adaptation.
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
- Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge [6.500470477634259]
Our work aims to support the materials science community by providing a practical, data-driven resource. We have curated a comprehensive dataset of 17K expert-verified synthesis recipes from open-access literature. AlchemicalBench offers an end-to-end framework that supports research in large language models applied to synthesis prediction.
arXiv Detail & Related papers (2025-02-23T06:16:23Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance rejecting harmful requests for safety against accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance. We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
- Foundational Large Language Models for Materials Research [22.77591279242839]
Large Language Models (LLMs) offer opportunities to accelerate materials research through automated analysis and prediction. Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models. We demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities.
arXiv Detail & Related papers (2024-12-12T18:46:38Z)
- Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems [65.22300383287904]
Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries. By digitizing data throughout product life cycles, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures. GenAI can drive the construction and update of DTs to improve predictive accuracy and prepare for diverse smart manufacturing.
arXiv Detail & Related papers (2024-08-02T10:47:10Z)
- Exploring Augmentation and Cognitive Strategies for AI based Synthetic Personae [1.0742675209112622]
This position paper advocates for using large language models (LLMs) as data augmentation systems rather than zero-shot generators.
We propose the development of robust cognitive and memory frameworks to guide LLM responses.
Initial explorations suggest that data enrichment, episodic memory, and self-reflection techniques can improve the reliability of synthetic personae.
arXiv Detail & Related papers (2024-04-16T20:22:12Z)
- AIMS-EREA -- A framework for AI-accelerated Innovation of Materials for Sustainability -- for Environmental Remediation and Energy Applications [0.0]
AIMS-EREA is our novel framework that blends best-of-breed materials science theory with the power of generative AI.
This also helps eliminate the possibility of producing hazardous residues and by-products of the reactions.
arXiv Detail & Related papers (2023-11-18T12:35:45Z)
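As referenced from the causal-discovery entry above, the following is a generic illustration of how pairwise causal judgments from a fine-tuned LLM can be assembled into a DAG adjacency matrix. The variable names, the `ask_llm` callable, and the acyclicity check are illustrative assumptions, not the cited paper's implementation.

```python
# Generic illustration (not the cited paper's code): build a DAG adjacency
# matrix from pairwise causal judgments produced by a domain-tuned LLM.
from itertools import combinations

import networkx as nx
import numpy as np

# Placeholder degrees of freedom; the cited work considers structural,
# chemical, and polarization variables in Sm-doped BiFeO3.
VARIABLES = ["lattice_strain", "chemical_composition", "polarization"]


def build_adjacency(ask_llm, variables=VARIABLES):
    """Query the LLM for each unordered pair and record directed edges."""
    n = len(variables)
    adj = np.zeros((n, n), dtype=int)
    for i, j in combinations(range(n), 2):
        a, b = variables[i], variables[j]
        answer = ask_llm(
            f"In Sm-doped BiFeO3, does {a} causally influence {b}, "
            f"does {b} causally influence {a}, or neither? "
            "Reply with 'a->b', 'b->a', or 'none'."
        ).strip()
        if answer == "a->b":
            adj[i, j] = 1
        elif answer == "b->a":
            adj[j, i] = 1
    return adj


def is_dag(adj):
    """Check that the LLM-derived graph is acyclic before treating it as a DAG."""
    graph = nx.from_numpy_array(adj, create_using=nx.DiGraph)
    return nx.is_directed_acyclic_graph(graph)
```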