Related papers: Accelerating High-Efficiency Organic Photovoltaic Discovery via Pretrained Graph Neural Networks and Generative Reinforcement Learning

URL: http://arxiv.org/abs/2503.23766v1
Date: Mon, 31 Mar 2025 06:31:15 GMT
Title: Accelerating High-Efficiency Organic Photovoltaic Discovery via Pretrained Graph Neural Networks and Generative Reinforcement Learning
Authors: Jiangjie Qiu, Hou Hei Lam, Xiuyuan Hu, Wentao Li, Siwei Fu, Fankun Zeng, Hao Zhang, Xiaonan Wang
Abstract summary: We propose a framework that integrates large-scale pretraining of graph neural networks (GNNs) with a GPT-2-based reinforcement learning (RL) strategy to design OPV molecules with potentially high PCE. This approach produces candidate molecules with predicted efficiencies approaching 21%, although further experimental validation is required. We are building the largest open-source OPV dataset to date, expected to include nearly 3,000 donor-acceptor pairs.
Score: 8.898093296126603
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Organic photovoltaic (OPV) materials offer a promising avenue toward cost-effective solar energy utilization. However, optimizing donor-acceptor (D-A) combinations to achieve high power conversion efficiency (PCE) remains a significant challenge. In this work, we propose a framework that integrates large-scale pretraining of graph neural networks (GNNs) with a GPT-2 (Generative Pretrained Transformer 2)-based reinforcement learning (RL) strategy to design OPV molecules with potentially high PCE. This approach produces candidate molecules with predicted efficiencies approaching 21%, although further experimental validation is required. Moreover, we conducted a preliminary fragment-level analysis to identify structural motifs recognized by the RL model that may contribute to enhanced PCE, thus providing design guidelines for the broader research community. To facilitate continued discovery, we are building the largest open-source OPV dataset to date, expected to include nearly 3,000 donor-acceptor pairs. Finally, we discuss plans to collaborate with experimental teams on synthesizing and characterizing AI-designed molecules, which will provide new data to refine and improve our predictive and generative models.
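The framework pairs a frozen property predictor with a generator that is updated by RL to favour high-scoring candidates. The sketch below illustrates only that general idea, in miniature, assuming a plain REINFORCE update with a moving-average baseline. The fragment vocabulary, the `predicted_pce` scoring function and all hyperparameters are hypothetical stand-ins, not the paper's GPT-2 generator or its pretrained GNN.

```python
import math
import random

# Hypothetical fragment vocabulary (the paper's generator is a GPT-2 model over molecules).
VOCAB = ["A", "B", "C", "D"]
# Hypothetical per-fragment contributions; a stand-in for the pretrained GNN PCE predictor.
FRAGMENT_SCORE = {"A": 0.5, "B": 2.0, "C": 1.0, "D": 0.0}


def predicted_pce(seq):
    """Toy frozen reward model: sums fragment contributions (not a real PCE model)."""
    return sum(FRAGMENT_SCORE[t] for t in seq)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train(length=5, steps=500, batch=16, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline on a position-wise categorical policy."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in range(length)]
    baseline = 0.0
    for _ in range(steps):
        probs = [softmax(row) for row in logits]  # policy is fixed within one batch
        grads = [[0.0] * len(VOCAB) for _ in range(length)]
        total = 0.0
        for _ in range(batch):
            idx = [rng.choices(range(len(VOCAB)), weights=p)[0] for p in probs]
            reward = predicted_pce([VOCAB[i] for i in idx])
            total += reward
            advantage = reward - baseline
            # d/d(logit_k) of log softmax(logits)[i] is 1[k == i] - p_k
            for pos, (i, p) in enumerate(zip(idx, probs)):
                for k in range(len(VOCAB)):
                    grads[pos][k] += advantage * (float(k == i) - p[k]) / batch
        baseline = 0.9 * baseline + 0.1 * (total / batch)
        # Gradient ascent on the expected reward.
        for pos in range(length):
            for k in range(len(VOCAB)):
                logits[pos][k] += lr * grads[pos][k]
    return logits


def greedy(logits):
    """Decode the most probable fragment at each position."""
    return [VOCAB[max(range(len(VOCAB)), key=lambda k: row[k])] for row in logits]


if __name__ == "__main__":
    best = greedy(train())
    print(best, predicted_pce(best))
```

Because the toy reward is additive, the policy should drift toward the highest-scoring fragment at every position; a real setup would instead sample full molecular strings from a pretrained language model and score them with the learned predictor.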

Related papers

DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z)
Machine Learning Co-pilot for Screening of Organic Molecular Additives for Perovskite Solar Cells [12.969955836781773]
Co-Pilot for Perovskite Additive Screener (Co-PAS) is an ML-driven framework designed to accelerate additive screening for perovskite solar cells. Co-PAS overcomes predictive biases by integrating scaffold-based pre-screening and a latent Junction Tree Variational Autoencoder (JTVAE). We identify several promising passivating molecules, including the novel Boc-L-threonine N-hydroxysuccinimide ester (BTN).
arXiv Detail & Related papers (2024-12-18T17:52:45Z)
Decomposed Direct Preference Optimization for Structure-Based Drug Design [47.561983733291804]
We propose DecompDPO, a structure-based optimization method to align diffusion models with pharmaceutical needs. DecompDPO can be effectively used for two main purposes: fine-tuning pretrained diffusion models for molecule generation across various protein families, and molecular optimization given a specific protein subpocket after generation.
arXiv Detail & Related papers (2024-07-19T02:12:25Z)
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF [82.7679132059169]
Reinforcement learning from human feedback has emerged as a central tool for language model alignment. We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO). XPO enjoys the strongest known provable guarantees and promising empirical performance.
arXiv Detail & Related papers (2024-05-31T17:39:06Z)
LightCPPgen: An Explainable Machine Learning Pipeline for Rational Design of Cell Penetrating Peptides [0.32985979395737786]
We introduce an innovative approach for the de novo design of cell-penetrating peptides (CPPs), leveraging the strengths of machine learning (ML) and optimization algorithms. Our strategy, named LightCPPgen, integrates a LightGBM-based predictive model with a genetic algorithm (GA). The GA solutions specifically target the candidate sequences' penetrability score, while trying to maximize similarity with the original non-penetrating peptide.
arXiv Detail & Related papers (2024-05-31T10:57:25Z)
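The GA step described for LightCPPgen can be illustrated with a generic elitist genetic algorithm. In this sketch the fitness is assumed to be a weighted sum of a predicted penetrability score and sequence identity to the starting peptide. `toy_penetrability` (the fraction of R/K residues) is a hypothetical placeholder for the paper's LightGBM model, and the weight `alpha`, the operators and all parameters are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_penetrability(seq):
    # Hypothetical stand-in for the LightGBM predictor: fraction of cationic residues.
    return sum(c in "RK" for c in seq) / len(seq)


def identity(a, b):
    """Fraction of positions where two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fitness(seq, original, alpha=0.5):
    # Assumed objective: predicted score plus a similarity bonus weighted by alpha.
    return toy_penetrability(seq) + alpha * identity(seq, original)


def mutate(seq, rng, rate):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else c for c in seq)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(original, pop_size=40, generations=60, rate=0.1, alpha=0.5, seed=0):
    rng = random.Random(seed)
    pop = [mutate(original, rng, rate) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, original, alpha), reverse=True)
        elite = pop[: pop_size // 4]  # elitism: the best quarter survives unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng, rate))
        pop = elite + children
    return max(pop, key=lambda s: fitness(s, original, alpha))


if __name__ == "__main__":
    start = "GAGAGAGAGA"
    best = evolve(start)
    print(best, round(fitness(best, start), 3))
```

Raising `alpha` makes the search more conservative, keeping candidates closer to the starting peptide at the cost of a lower predicted score.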
PILOT: Equivariant diffusion for pocket conditioned de novo ligand generation with multi-objective guidance via importance sampling [8.619610909783441]
We propose an in-silico approach for the de novo generation of 3D ligand structures using the equivariant diffusion model PILOT. Its multi-objective-based importance sampling strategy is designed to direct the model towards molecules that exhibit desired characteristics. We employ PILOT to generate novel ligands for unseen protein pockets from the Kinodata-3D dataset.
arXiv Detail & Related papers (2024-05-23T17:58:28Z)
GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices [43.511428925893675]
This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors. We collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we utilize as the training data for our predictive model. GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency.
arXiv Detail & Related papers (2024-05-23T06:02:07Z)
Accelerating Molecular Graph Neural Networks via Knowledge Distillation [1.9116784879310031]
Recent advances in graph neural networks (GNNs) have enabled more comprehensive modeling of molecules and molecular systems. As the field has progressed toward bigger and more complex architectures, state-of-the-art GNNs have become largely prohibitive for many large-scale applications. We devise knowledge distillation (KD) strategies that facilitate the distillation of hidden representations in directional and equivariant GNNs, and evaluate their performance on the regression task of energy and force prediction.
arXiv Detail & Related papers (2023-06-26T16:24:31Z)
Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation. We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)
Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models. We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction. This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
arXiv Detail & Related papers (2020-07-14T18:51:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.