Learning to Validate Generative Models: a Goodness-of-Fit Approach
- URL: http://arxiv.org/abs/2511.09118v1
- Date: Thu, 13 Nov 2025 01:33:20 GMT
- Title: Learning to Validate Generative Models: a Goodness-of-Fit Approach
- Authors: Pietro Cappelli, Gaia Grosso, Marco Letizia, Humberto Reyes-González, Marco Zanetti
- Abstract summary: We propose the use of the New Physics Learning Machine (NPLM) to test generative networks trained on high-dimensional scientific data. NPLM is a learning-based approach to goodness-of-fit testing inspired by the Neyman-Pearson construction. We demonstrate that NPLM can serve as a powerful validation method while also providing a means to diagnose sub-optimally modeled regions of the data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models are increasingly central to scientific workflows, yet their systematic use and interpretation require a proper understanding of their limitations through rigorous validation. Classic approaches struggle with scalability, statistical power, or interpretability when applied to high-dimensional data, making it difficult to certify the reliability of these models in realistic, high-dimensional scientific settings. Here, we propose the use of the New Physics Learning Machine (NPLM), a learning-based approach to goodness-of-fit testing inspired by the Neyman-Pearson construction, to test generative networks trained on high-dimensional scientific data. We demonstrate the performance of NPLM for validation in two benchmark cases: generative models trained on Gaussian mixtures of increasing dimensionality, and a public end-to-end generator for the Large Hadron Collider called FlashSim, trained on jet data typical of high-energy physics. We demonstrate that NPLM can serve as a powerful validation method while also providing a means to diagnose sub-optimally modeled regions of the data.
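In practice, NPLM turns validation into a learned likelihood-ratio test: a flexible function f(x) is trained on samples from the generator under test (the "reference") and on target data by minimizing an extended maximum-likelihood loss, and the minimized loss yields a test statistic that is calibrated on pseudo-experiments. The sketch below illustrates this idea only; the network size, training schedule, reference weighting, and the helper name `nplm_test_statistic` are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch of a learning-based goodness-of-fit test in the spirit of NPLM.
# Assumptions (not the paper's exact setup): a small MLP for f(x), a fixed Adam
# schedule, and the extended maximum-likelihood loss
#   L[f] = w_ref * sum_ref (exp(f) - 1) - sum_data f,   with t = -2 * min_f L[f].
import torch
import torch.nn as nn


def nplm_test_statistic(reference, data, n_epochs=2000, lr=1e-3):
    """Fit f(x) by minimizing the NPLM-style loss and return the test statistic t."""
    dim = reference.shape[1]
    f = nn.Sequential(nn.Linear(dim, 16), nn.Tanh(), nn.Linear(16, 1))
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    w_ref = data.shape[0] / reference.shape[0]  # normalize reference to the data yield
    for _ in range(n_epochs):
        opt.zero_grad()
        loss = w_ref * (torch.exp(f(reference)) - 1).sum() - f(data).sum()
        loss.backward()
        opt.step()
    with torch.no_grad():
        final_loss = w_ref * (torch.exp(f(reference)) - 1).sum() - f(data).sum()
    return -2.0 * float(final_loss)


if __name__ == "__main__":
    torch.manual_seed(0)
    reference = torch.randn(5000, 2)  # samples from the generator under test
    data = torch.randn(1000, 2) + torch.tensor([0.3, 0.0])  # "true" data, slightly shifted
    t_obs = nplm_test_statistic(reference, data)
    print(f"observed test statistic t = {t_obs:.1f}")
    # In practice t is calibrated on pseudo-experiments in which the "data" is also
    # drawn from the reference, yielding a p-value / Z-score for the generator.
```

A large test statistic flags a mismatch between the generator and the data, and inspecting the learned f(x) points to the regions of the feature space that are modeled sub-optimally.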
Related papers
- NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs [17.66806675891691]
We introduce the Neural-Integrated Mechanistic Modeling (NIMM) evaluation framework to evaluate mechanistic models. Our evaluation reveals fundamental challenges in current baselines, ranging from model effectiveness to code-level correctness. We design NIMMGen, an agentic framework for neural-integrated mechanistic modeling that enhances code correctness and practical validity through iterative refinement.
arXiv Detail & Related papers (2026-02-20T05:46:54Z)
- PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders [42.8983261737774]
We propose a novel generative model that learns from limited data by incorporating physical constraints to enhance performance. We extend the VAE architecture by incorporating physical models in the generative process, enabling it to capture underlying dynamics more effectively. We demonstrate that PIGPVAE can produce realistic samples beyond the observed distribution, highlighting its robustness and usefulness under distribution shifts.
arXiv Detail & Related papers (2025-05-25T21:12:01Z)
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
- On Foundation Models for Dynamical Systems from Purely Synthetic Data [5.004576576202551]
Foundation models have demonstrated remarkable generalization, data efficiency, and robustness properties across various domains. These models are available in fields like natural language processing and computer vision, but do not exist for dynamical systems. We address this challenge by pretraining a transformer-based foundation model exclusively on synthetic data. Our results demonstrate the feasibility of foundation models for dynamical systems that outperform specialist models in terms of generalization, data efficiency, and robustness.
arXiv Detail & Related papers (2024-11-30T08:34:10Z)
- Can KANs (re)discover predictive models for Direct-Drive Laser Fusion? [11.261403205522694]
The domain of laser fusion presents a unique and challenging predictive modeling application landscape for machine learning methods.
Data-driven approaches have been successful in the past for achieving desired generalization ability and model interpretation that aligns with physics expectations.
In this work, we present the use of Kolmogorov-Arnold Networks (KANs) as an alternative to PIL for developing a new type of data-driven predictive model.
arXiv Detail & Related papers (2024-09-13T13:48:06Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior.
Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map.
KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
arXiv Detail & Related papers (2023-10-04T07:14:43Z)
- Calibrating constitutive models with full-field data via physics informed neural networks [0.0]
We propose a physics-informed deep-learning framework for the discovery of model parameterizations given full-field displacement data.
We work with the weak form of the governing equations rather than the strong form to impose physical constraints upon the neural network predictions.
We demonstrate that informed machine learning is an enabling technology and may shift the paradigm of how full-field experimental data is utilized to calibrate models.
arXiv Detail & Related papers (2022-03-30T18:07:44Z)
- Learning continuous models for continuous physics [94.42705784823997]
We develop a test based on numerical analysis theory to validate machine learning models for science and engineering applications.
Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
arXiv Detail & Related papers (2022-02-17T07:56:46Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z)