Related papers: Open Polymer Challenge: Post-Competition Report

URL: http://arxiv.org/abs/2512.08896v1
Date: Tue, 09 Dec 2025 18:38:15 GMT
Title: Open Polymer Challenge: Post-Competition Report
Authors: Gang Liu, Sobin Alosious, Subhamoy Mahajan, Eric Inae, Yihan Zhu, Yuhan Liu, Renzheng Zhang, Jiaxin Xu, Addison Howard, Ying Li, Tengfei Luo, Meng Jiang
Abstract summary: The Open Polymer Challenge (OPC) releases the first community-developed benchmark for polymer informatics. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. We release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data.
Score: 34.36687017237976
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polymer science and are expected to accelerate the development of sustainable and energy-efficient materials. Along with the competition, we release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data. We also release the data generation pipeline at https://github.com/sobinalosious/ADEPT, which simulates more than 25 properties, including thermal conductivity, radius of gyration, and density.

Related papers

Omics-scale polymer computational database transferable to real-world artificial intelligence applications [8.718893022299653]
PolyOmics is an omics-scale computational database generated through fully automated molecular dynamics simulation pipelines. Machine learning models pretrained on PolyOmics can be efficiently fine-tuned for a wide range of real-world downstream tasks.
arXiv Detail & Related papers (2025-11-07T09:03:07Z)
FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation [60.28409233931666]
We introduce FieldGen, a field-guided data generation framework that enables scalable, diverse, and high-quality real-world data collection. Experiments demonstrate that policies trained with FieldGen achieve higher success rates and improved stability compared to teleoperation-based baselines.
arXiv Detail & Related papers (2025-10-23T17:47:12Z)
POINT$^{2}$: A Polymer Informatics Training and Testing Database [15.45788515943579]
POINT$^{2}$ (POlymer INformatics Training and Testing) is a benchmark database and protocol designed to address critical challenges in polymer informatics. We develop an ensemble of ML models, including Quantile Random Forests, Multilayer Perceptrons with dropout, Graph Neural Networks, and pretrained large language models. These models are coupled with diverse polymer representations such as Morgan, MACCS, RDKit, Topological, Atom Pair fingerprints, and graph-based descriptors.
arXiv Detail & Related papers (2025-03-30T15:46:01Z)
PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [83.35198885088093]
PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z)
Evaluating Language Models as Synthetic Data Generators [99.16334775127875]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z)
Transferring a molecular foundation model for polymer property predictions [3.067983186439152]
Self-supervised pretraining of transformer models requires large-scale datasets. We show that transformers pretrained on small molecules and fine-tuned on polymer properties achieve accuracy comparable to those trained on augmented polymer datasets.
arXiv Detail & Related papers (2023-10-25T19:55:00Z)
ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation [45.201929285600606]
We present ClimSim-Online, which includes an end-to-end workflow for developing hybrid ML-physics simulators. The dataset is global and spans ten years at a high sampling frequency. We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators.
arXiv Detail & Related papers (2023-06-14T21:26:31Z)
Heterogenous Ensemble of Models for Molecular Property Prediction [55.91865861896012]
We propose a method for considering different modalities on molecules. We ensemble these models with a HuberRegressor. This yields a winning solution to the 2nd edition of the OGB Large-Scale Challenge (2022).
arXiv Detail & Related papers (2022-11-20T17:25:26Z)
Copolymer Informatics with Multi-Task Deep Neural Networks [0.0]
We address the property prediction challenge for copolymers, extending the polymer informatics framework beyond homopolymers. A large data set containing over 18,000 data points of glass transition, melting, and degradation temperature of homopolymers and copolymers of up to two monomers is used. The developed models are accurate, fast, flexible, and scalable to more copolymer properties when suitable data become available.
arXiv Detail & Related papers (2021-03-25T23:28:20Z)
Polymers for Extreme Conditions Designed Using Syntax-Directed Variational Autoencoders [53.34780987686359]
Machine learning tools are now commonly employed to virtually screen material candidates with desired properties. This approach is inefficient, and severely constrained by the candidates that human imagination can conceive. We utilize syntax-directed variational autoencoders (VAE) in tandem with Gaussian process regression (GPR) models to discover polymers expected to be robust under three extreme conditions.
arXiv Detail & Related papers (2020-11-04T21:36:59Z)
Polymer Informatics: Current Status and Critical Next Steps [1.3238373064156097]
Surrogate models are trained on available polymer data for instant property prediction. Data-driven strategies to tackle unique challenges resulting from the extraordinary chemical and physical diversity of polymers at small and large scales are being explored. Methods to solve inverse problems, wherein polymer recommendations are made using advanced AI algorithms that meet application targets, are being investigated.
arXiv Detail & Related papers (2020-11-01T14:17:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.