Robust Machine Learning Framework for Reliable Discovery of High-Performance Half-Heusler Thermoelectrics
- URL: http://arxiv.org/abs/2602.01149v1
- Date: Sun, 01 Feb 2026 10:50:42 GMT
- Title: Robust Machine Learning Framework for Reliable Discovery of High-Performance Half-Heusler Thermoelectrics
- Authors: Shoeb Athar, Adrien Mecibah, Philippe Jund
- Abstract summary: Machine learning (ML) can facilitate efficient thermoelectric (TE) material discovery essential to address the environmental crisis. ML models often suffer from poor experimental generalizability despite high metrics. This study presents a robust workflow, applied to the half-Heusler (hH) structural prototype, for figure of merit (zT) prediction.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning (ML) can facilitate efficient thermoelectric (TE) material discovery essential to address the environmental crisis. However, ML models often suffer from poor experimental generalizability despite high metrics. This study presents a robust workflow, applied to the half-Heusler (hH) structural prototype, for figure of merit (zT) prediction, to improve the generalizability of ML models. To resolve challenges in dataset handling and feature filtering, we first introduce a rigorous PCA-based splitting method that ensures training and test sets are unbiased and representative of the full chemical space. We then integrate Bayesian hyperparameter optimization with k-best feature filtering across three architectures (Random Forest, XGBoost, and Neural Networks), while employing SISSO symbolic regression for physical insight and comparison. Using SHAP and SISSO analysis, we identify A-site dopant concentration (xA') and A-site Heat of Vaporization (HVA) as the primary drivers of zT besides Temperature (T). Finally, a high-throughput screening of approximately 6.6x10^8 potential compositions, filtered by stability constraints, yielded several novel high-zT candidates. Breaking from the traditional focus on improving the test RMSE/R^2 values of the models, this work shifts attention to establishing the test set as a true proxy for model generalizability and to strengthening the often neglected modules of existing ML workflows for the data-driven design of next-generation thermoelectric materials.
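The abstract does not spell out how the PCA-based split works, so the following is only a minimal sketch of one plausible reading: project the standardized features onto the first principal component, cut the samples into equal-sized bins along that axis, and draw the same fraction of test samples from every bin so that the test set covers the same region of feature space as the training set. The function names (`first_principal_scores`, `pca_stratified_split`), the number of bins, and the synthetic data are all assumptions made for illustration and are not taken from the paper. The first component is found by plain power iteration to keep the example dependency-free; in practice a library PCA would normally be used.

```python
import math
import random


def first_principal_scores(X, n_iter=200, seed=0):
    """Project rows of X onto the first principal component.

    Features are standardized first; the component is estimated by
    power iteration on the covariance of the standardized data.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Sample standard deviation per feature; fall back to 1.0 if constant.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) or 1.0
        for j in range(d)
    ]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]
    cov = [
        [sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Random start so the initial vector is not orthogonal to the solution.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(z[j] * v[j] for j in range(d)) for z in Z]


def pca_stratified_split(X, test_fraction=0.2, n_bins=5, seed=0):
    """Return (train_idx, test_idx) with test samples drawn from every PC1 bin."""
    rng = random.Random(seed)
    scores = first_principal_scores(X, seed=seed)
    order = sorted(range(len(X)), key=lambda i: scores[i])
    train_idx, test_idx = [], []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        members = order[lo:hi]
        rng.shuffle(members)
        n_test = round(test_fraction * len(members))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic stand-in for a descriptor table (100 samples, 3 features).
data_rng = random.Random(42)
X = [
    [data_rng.gauss(0, 1), data_rng.gauss(0, 2), data_rng.uniform(300, 900)]
    for _ in range(100)
]
train_idx, test_idx = pca_stratified_split(X)
print(len(train_idx), len(test_idx))  # 80 20
```

Binning along a single component is the simplest variant; a split over several components (for example by clustering in the reduced space) would follow the same pattern of sampling each region proportionally.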
Related papers
- From Static Spectra to Operando Infrared Dynamics: Physics Informed Flow Modeling and a Benchmark [67.29937933325849]
Operando IR Prediction aims to forecast the time-resolved evolution of "spectral fingerprints" from a single static spectrum.
OpIRSpec-7K comprises 7,118 high-quality samples across 10 distinct battery systems.
ABCC significantly outperforms state-of-the-art static, sequential, and generative baselines.
arXiv Detail & Related papers (2026-02-20T18:58:43Z)
- From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures [12.68400434984463]
MLIPs fail to reproduce the physical smoothness of the quantum potential energy surface.
Existing evaluations, such as microcanonical molecular dynamics, are computationally expensive and primarily probe near-equilibrium states.
We introduce the Bond Smoothness Characterization Test (BSCT) to improve evaluation metrics for MLIPs.
arXiv Detail & Related papers (2026-02-04T18:50:10Z)
- Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity [49.809923981964715]
Contaminated mixture of experts (MoE) is motivated by transfer learning methods where a pre-trained model, acting as a frozen expert, is integrated with an adapter model, functioning as a trainable expert, in order to learn a new task.
In this work, we characterize uniform convergence rates for estimating parameters under challenging settings where ground-truth parameters vary with the sample size.
We also establish corresponding minimax lower bounds to ensure that these rates are minimax optimal.
arXiv Detail & Related papers (2026-01-31T23:45:50Z)
- China Regional 3km Downscaling Based on Residual Corrective Diffusion Model [39.12803910865843]
This work focuses on statistical downscaling, which establishes statistical relationships between low-resolution and high-resolution historical data.
In contrast to the original work of CorrDiff, the region considered in this work is nearly 40 times larger.
Deep learning has emerged as a powerful tool for this task, giving rise to various high-performance super-resolution models.
arXiv Detail & Related papers (2025-12-05T02:27:08Z)
- BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates [45.88028371034407]
We introduce the Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS) framework.
BITS for GAPS supports serial hybrid modeling, where known physics governs part of the system.
We derive entropy-based acquisition functions that quantify expected information gain from candidate input locations.
arXiv Detail & Related papers (2025-11-20T21:36:21Z)
- Km-scale dynamical downscaling through conformalized latent diffusion models [45.94979929172337]
Dynamical downscaling is crucial for deriving high-resolution meteorological fields from coarse-scale simulations.
Generative Diffusion models (DMs) have recently emerged as powerful data-driven tools for this task.
However, DMs lack finite-sample guarantees against overconfident predictions, resulting in miscalibrated grid-point-level uncertainty estimates.
We tackle this issue by augmenting the downscaling pipeline with a conformal prediction framework.
arXiv Detail & Related papers (2025-10-15T08:41:36Z)
- Fusion-Based Neural Generalization for Predicting Temperature Fields in Industrial PET Preform Heating [0.4337994560632144]
We propose a novel deep learning framework for generalized temperature prediction.
Unlike traditional models that require extensive retraining for each material or design variation, our method introduces a data-efficient neural architecture.
Our approach reduces the need for large simulation datasets while achieving superior performance compared to models trained from scratch.
arXiv Detail & Related papers (2025-10-06T21:38:37Z)
- Process-Informed Forecasting of Complex Thermal Dynamics in Pharmaceutical Manufacturing [3.6138260410017797]
We introduce process-informed forecasting (PIF) models for temperature in pharmaceutical lyophilization.
PIF models outperform their data-driven counterparts in terms of accuracy, physical plausibility and noise resilience.
arXiv Detail & Related papers (2025-09-24T17:42:00Z)
- TSGym: Design Choices for Deep Multivariate Time-Series Forecasting [38.12202305030755]
This work bridges gaps by decomposing deep MTSF methods into their core, fine-grained components.
We propose a novel automated solution called TSGym for MTSF tasks.
Extensive experiments indicate that TSGym significantly outperforms existing state-of-the-art MTSF and AutoML methods.
arXiv Detail & Related papers (2025-09-21T12:49:31Z)
- QGAPHEnsemble : Combining Hybrid QLSTM Network Ensemble via Adaptive Weighting for Short Term Weather Forecasting [0.0]
This research highlights the practical efficacy of employing advanced machine learning techniques.
Our model demonstrates a substantial improvement in the accuracy and reliability of meteorological predictions.
The paper highlights the importance of optimized ensemble techniques to improve the performance of the given weather forecasting task.
arXiv Detail & Related papers (2025-01-18T20:18:48Z)
- On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
- AI enhanced data assimilation and uncertainty quantification applied to Geological Carbon Storage [0.0]
We introduce the Surrogate-based hybrid ESMDA (SH-ESMDA), an adaptation of the traditional Ensemble Smoother with Multiple Data Assimilation (ESMDA).
We also introduce Surrogate-based Hybrid RML (SH-RML), a variational data assimilation approach that relies on the randomized maximum likelihood (RML).
Our comparative analyses show that SH-RML offers better uncertainty quantification than conventional ESMDA for the case study.
arXiv Detail & Related papers (2024-02-09T00:24:46Z)
- Bayesian tomography using polynomial chaos expansion and deep generative networks [0.0]
We present a strategy combining the excellent reconstruction performances of a variational autoencoder (VAE) with the accuracy of PCA-PCE surrogate modeling.
Within the MCMC process, the parametrization of the VAE is leveraged for prior exploration and sample proposals.
arXiv Detail & Related papers (2023-07-09T16:44:37Z)
- Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning [56.67751936864119]
The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels.
Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach.
The results show that ML models can accurately predict the fuel properties over a wide range of pressure and temperature conditions.
arXiv Detail & Related papers (2021-10-18T14:43:50Z)
- On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer [40.63693071222628]
We study the minimum word error rate (MWER) training of Hybrid Autoregressive Transducer (HAT).
From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models.
arXiv Detail & Related papers (2020-10-23T21:16:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.