Related papers: MADE: Benchmark Environments for Closed-Loop Materials Discovery

MADE: Benchmark Environments for Closed-Loop Materials Discovery

URL: http://arxiv.org/abs/2601.20996v1
Date: Wed, 28 Jan 2026 19:46:46 GMT
Title: MADE: Benchmark Environments for Closed-Loop Materials Discovery
Authors: Shreshth A Malik, Tiarnan Doherty, Panagiotis Tigas, Muhammed Razzak, Stephen J. Roberts, Aron Walsh, Yarin Gal,
Abstract summary: We introduce MAterials Discovery Environments (MADE), a novel framework for benchmarking end-to-end autonomous materials discovery pipelines.<n>We formalize discovery as a search for thermodynamically stable compounds relative to a given convex hull, and evaluate efficacy and efficiency via comparison to baseline algorithms.<n>We demonstrate this by conducting systematic experiments across a family of systems, enabling ablation of components in discovery pipelines, and comparison of how methods scale with system complexity.
Score: 26.575933886112157
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing benchmarks for computational materials discovery primarily evaluate static predictive tasks or isolated computational sub-tasks. While valuable, these evaluations neglect the inherently iterative and adaptive nature of scientific discovery. We introduce MAterials Discovery Environments (MADE), a novel framework for benchmarking end-to-end autonomous materials discovery pipelines. MADE simulates closed-loop discovery campaigns in which an agent or algorithm proposes, evaluates, and refines candidate materials under a constrained oracle budget, capturing the sequential and resource-limited nature of real discovery workflows. We formalize discovery as a search for thermodynamically stable compounds relative to a given convex hull, and evaluate efficacy and efficiency via comparison to baseline algorithms. The framework is flexible; users can compose discovery agents from interchangeable components such as generative models, filters, and planners, enabling the study of arbitrary workflows ranging from fixed pipelines to fully agentic systems with tool use and adaptive decision making. We demonstrate this by conducting systematic experiments across a family of systems, enabling ablation of components in discovery pipelines, and comparison of how methods scale with system complexity.

Related papers

Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining to goal-conditioned agents interfacing with simulation and experimental platforms.<n>To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z)
ScholarGym: Benchmarking Deep Research Workflows on Academic Literature Retrieval [11.41528830724814]
We present ScholarGym, a simulation environment for reproducible evaluation of deep research on academic literature.<n>Built on a static corpus of 570K papers with deterministic retrieval, ScholarGym provides 2,536 queries with expert-annotated ground truth.
arXiv Detail & Related papers (2026-01-29T12:51:44Z)
Ontology-aligned structuring and reuse of multimodal materials data and workflows towards automatic reproduction [1.4658400971135652]
Existing text-mining approaches are insufficient to extract complete computational with associated parameters.<n>A large language model (LLM)-assisted framework is introduced for the automated extraction and structuring of computational density from the literature.<n>The framework provides a foundation for organizing and contextualizing published results in a semantically interoperable form, thereby improving transparency and reusability of computational materials data.
arXiv Detail & Related papers (2026-01-18T20:51:23Z)
Equation discovery framework EPDE: Towards a better equation discovery [50.79602839359522]
We enhance the EPDE algorithm -- an evolutionary optimization-based discovery framework.<n>Our approach generates terms using fundamental building blocks such as elementary functions and individual differentials.<n>We validate our algorithm's noise resilience and overall performance by comparing its results with those from the state-of-the-art equation discovery framework SINDy.
arXiv Detail & Related papers (2024-12-28T15:58:44Z)
Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems [0.0]
Synthetic datasets are important for evaluating and testing machine learning models.<n>We develop a novel framework for generating synthetic datasets that are diverse and statistically coherent.<n>The framework is available as a free open Python package to facilitate research with minimal friction.
arXiv Detail & Related papers (2024-11-27T09:53:14Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems [1.204357447396532]
Novelty search (NS) seeks to uncover diverse system behaviors through simulations or experiments.<n>NS methods typically rely on evolutionary strategies and other meta-heuristics that require dense sampling of the input space.<n>We introduce BEACON, a sample-efficient, Bayesian optimization-inspired approach to NS that is tailored for settings where the input-to-behavior relationship is opaque and costly to evaluate.
arXiv Detail & Related papers (2024-06-05T20:23:52Z)
Automating the Discovery of Partial Differential Equations in Dynamical Systems [0.0]
We present an extension to the ARGOS framework, ARGOS-RAL, which leverages sparse regression with the recurrent adaptive lasso to identify PDEs automatically. We rigorously evaluate the performance of ARGOS-RAL in identifying canonical PDEs under various noise levels and sample sizes. Our results show that ARGOS-RAL effectively and reliably identifies the underlying PDEs from data, outperforming the sequential threshold ridge regression method in most cases.
arXiv Detail & Related papers (2024-04-25T09:23:03Z)
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called textttMEX. textttMEX integrates estimation and planning components while balancing exploration exploitation automatically. It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
Optimization of a Hydrodynamic Computational Reservoir through Evolution [58.720142291102135]
We interface with a model of a hydrodynamic system, under development by a startup, as a computational reservoir. We optimized the readout times and how inputs are mapped to the wave amplitude or frequency using an evolutionary search algorithm. Applying evolutionary methods to this reservoir system substantially improved separability on an XNOR task, in comparison to implementations with hand-selected parameters.
arXiv Detail & Related papers (2023-04-20T19:15:02Z)
Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy to interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF. It also offers theoretical guarantees based on results of local consistency. This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines [48.7576911714538]
The proposed approach is aimed to automate the design of composite machine learning pipelines. It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them. The software implementation on this approach is presented as an open-source framework.
arXiv Detail & Related papers (2021-06-26T23:19:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.