Related papers: LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models

URL: http://arxiv.org/abs/2512.04562v1
Date: Thu, 04 Dec 2025 08:25:16 GMT
Title: LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models
Authors: Siddharth Betala, Samuel P. Gleason, Ali Ramlaoui, Andy Xu, Georgia Channing, Daniel Levy, Clémentine Fourrier, Nikita Kazeev, Chaitanya K. Joshi, Sékou-Oumar Kaba, Félix Therrien, Alex Hernandez-Garcia, Rocío Mercado, N. M. Anoop Krishnan, Alexandre Duval,
Abstract summary: We introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials. We release an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models.
Score: 39.63407613127808
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of chemical space. Yet, the lack of standardized evaluation frameworks makes it challenging to evaluate, compare, and further develop these ML models meaningfully. In this work, we introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials, supported by a set of evaluation metrics designed to better inform model development and downstream applications. We release both an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models. Results reveal that an increase in stability leads to a decrease in novelty and diversity on average, with no model excelling across all dimensions. Altogether, LeMat-GenBench establishes a reproducible and extensible foundation for fair model comparison and aims to guide the development of more reliable, discovery-oriented generative models for crystalline materials.

Related papers

Transport Novelty Distance: A Distributional Metric for Evaluating Material Generative Models [2.5779675962411654]
We introduce the Transport Novelty Distance (TNovD) to judge generative models used for materials discovery jointly by the quality and novelty of the generated materials. Based on ideas from Optimal Transport theory, TNovD uses a coupling between the features of the training and generated sets, which is refined into a quality and memorization regime by a threshold. We evaluate our proposed metric on typical toy experiments relevant for crystal structure prediction, including memorization, noise injection and lattice deformations.
arXiv Detail & Related papers (2025-12-10T10:38:58Z)
CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design [2.290956583394892]
We explore the applications of reinforcement fine-tuning to the autoregressive transformer-based materials generative model CrystalFormer. By optimizing reward signals, fine-tuning infuses knowledge from discriminative models into generative models. The resulting model, CrystalFormer-RL, shows enhanced stability in generated crystals and successfully discovers crystals with desirable yet conflicting material properties.
arXiv Detail & Related papers (2025-04-03T07:59:30Z)
Exploring Model Kinship for Merging Large Language Models [73.98345036483299]
We study model evolution through iterative merging, drawing an analogy to biological evolution. We show that model kinship is closely linked to the performance improvements achieved by merging. We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis [8.785345412061792]
We introduce a comprehensive framework for modeling single cell transcriptomic responses to perturbations. Our approach includes a modular and user-friendly model development and evaluation platform. We highlight the limitations of widely used models, such as mode collapse.
arXiv Detail & Related papers (2024-08-20T07:40:20Z)
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a new sandbox suite tailored for integrated data-model co-development. This sandbox provides a feedback-driven experimental platform, enabling cost-effective and guided refinement of both data and models.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models. The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety. On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
Scalable Diffusion for Materials Generation [99.71001883652211]
We develop a unified crystal representation (UniMat) that can represent any crystal structure. UniMat can generate high-fidelity crystal structures from larger and more complex chemical systems. We propose additional metrics for evaluating generative models of materials.
arXiv Detail & Related papers (2023-10-18T15:49:39Z)
Evaluating the diversity and utility of materials proposed by generative models [38.85523285991743]
We show how one state-of-the-art generative model, the physics-guided crystal generation model, can be used as part of the inverse design process. Our findings suggest how generative models might be improved to enable better inverse design.
arXiv Detail & Related papers (2023-08-09T14:42:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.