CycleChemist: A Dual-Pronged Machine Learning Framework for Organic Photovoltaic Discovery
- URL: http://arxiv.org/abs/2511.19500v1
- Date: Sun, 23 Nov 2025 16:31:11 GMT
- Title: CycleChemist: A Dual-Pronged Machine Learning Framework for Organic Photovoltaic Discovery
- Authors: Hou Hei Lam, Jiangjie Qiu, Xiuyuan Hu, Wentao Li, Fankun Zeng, Siwei Fu, Hao Zhang, Xiaonan Wang
- Abstract summary: We introduce a dual machine learning framework for OPV discovery that combines predictive modeling with generative molecular design. We present the Organic Photovoltaic Donor Acceptor Dataset (OPV2D), the largest curated dataset of its kind, containing 2000 experimentally characterized donor acceptor pairs. This framework includes the Molecular Orbital Energy Estimator (MOE2) for predicting HOMO and LUMO energy levels, and the Photovoltaic Performance Predictor (P3) for estimating PCE.
- Score: 12.514751935736108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Organic photovoltaic (OPV) materials offer a promising path toward sustainable energy generation, but their development is limited by the difficulty of identifying high performance donor and acceptor pairs with strong power conversion efficiencies (PCEs). Existing design strategies typically focus on either the donor or the acceptor alone, rather than using a unified approach capable of modeling both components. In this work, we introduce a dual machine learning framework for OPV discovery that combines predictive modeling with generative molecular design. We present the Organic Photovoltaic Donor Acceptor Dataset (OPV2D), the largest curated dataset of its kind, containing 2000 experimentally characterized donor acceptor pairs. Using this dataset, we develop the Organic Photovoltaic Classifier (OPVC) to predict whether a material exhibits OPV behavior, and a hierarchical graph neural network that incorporates multi task learning and donor acceptor interaction modeling. This framework includes the Molecular Orbital Energy Estimator (MOE2) for predicting HOMO and LUMO energy levels, and the Photovoltaic Performance Predictor (P3) for estimating PCE. In addition, we introduce the Material Generative Pretrained Transformer (MatGPT) to produce synthetically accessible organic semiconductors, guided by a reinforcement learning strategy with three objective policy optimization. By linking molecular representation learning with performance prediction, our framework advances data driven discovery of high performance OPV materials.
Related papers
- Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials [51.342983349686556]
General-purpose 3D chemical modeling encompasses molecules and materials, requiring both generative and predictive capabilities. We introduce Zatom-1, the first end-to-end, fully open-source foundation model that unifies generative and predictive learning of 3D molecules and materials.
arXiv Detail & Related papers (2026-02-24T20:52:39Z) - PeroMAS: A Multi-agent System of Perovskite Material Discovery [51.859972927223936]
Perovskite Solar Cells (PSCs) are renowned for their superior optoelectronic performance and cost potential. Existing AI approaches focus predominantly on discrete models, including material design, process optimization, and property prediction. We propose a multi-agent system for perovskite material discovery, named PeroMAS.
arXiv Detail & Related papers (2026-02-10T09:33:06Z) - Solar-GECO: Perovskite Solar Cell Property Prediction with Geometric-Aware Co-Attention [5.680630061642918]
Perovskite solar cells are promising candidates for next-generation photovoltaics. Their performance as multi-scale devices is determined by complex interactions between their constituent layers. We propose to predict perovskite solar cell power conversion efficiency with a geometric-aware co-attention model.
arXiv Detail & Related papers (2025-11-24T16:15:41Z) - Accelerating High-Efficiency Organic Photovoltaic Discovery via Pretrained Graph Neural Networks and Generative Reinforcement Learning [8.898093296126603]
We propose a framework that integrates large-scale pretraining of graph neural networks (GNNs) with a GPT-2-based reinforcement learning (RL) strategy to design OPV molecules with potentially high PCE. This approach produces candidate molecules with predicted efficiencies approaching 21%, although further experimental validation is required. We are building the largest open-source OPV dataset to date, expected to include nearly 3,000 donor-acceptor pairs.
arXiv Detail & Related papers (2025-03-31T06:31:15Z) - Improving Molecular Modeling with Geometric GNNs: an Empirical Study [56.52346265722167]
This paper focuses on the impact of different (1) canonicalization methods, (2) graph creation strategies, and (3) auxiliary tasks, on performance, scalability and symmetry enforcement.
Our findings and insights aim to guide researchers in selecting optimal modeling components for molecular modeling tasks.
arXiv Detail & Related papers (2024-07-11T09:04:12Z) - GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices [43.511428925893675]
This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors.
We collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we utilize as the training data for our predictive model.
GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency.
arXiv Detail & Related papers (2024-05-23T06:02:07Z) - Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z) - A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to address the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z) - Graph Machine Learning for Design of High-Octane Fuels [47.43758223690195]
Computer-aided molecular design (CAMD) can identify molecules with desired autoignition properties.
We propose a modular graph-ML CAMD framework that integrates generative graph-ML models with graph neural networks and optimization.
The framework also suggests new candidates, one of which we experimentally investigate and use to illustrate the need for further auto-ignition training data.
arXiv Detail & Related papers (2022-06-01T16:43:04Z) - 3D pride without 2D prejudice: Bias-controlled multi-level generative models for structure-based ligand design [1.978587235008588]
Data sparsity and bias are two main roadblocks to the development of 3D-aware models.
We propose a first-in-kind training protocol based on multi-level contrastive learning for improved bias control and data efficiency.
arXiv Detail & Related papers (2022-04-22T12:23:59Z) - Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models.
We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction.
This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
arXiv Detail & Related papers (2020-07-14T18:51:06Z) - Machine Learning Enabled Discovery of Application Dependent Design Principles for Two-dimensional Materials [1.1470070927586016]
We train an ensemble of models to predict thermodynamic, mechanical, and electronic properties.
We carry out a screening of nearly 45,000 structures for two largely disjoint applications.
We find that hybrid organic-inorganic perovskites with lead and tin tend to be good candidates for solar cell applications.
arXiv Detail & Related papers (2020-03-19T23:13:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.