Beyond Statistical Similarity: Rethinking Metrics for Deep Generative
Models in Engineering Design
- URL: http://arxiv.org/abs/2302.02913v4
- Date: Sat, 14 Oct 2023 04:33:26 GMT
- Title: Beyond Statistical Similarity: Rethinking Metrics for Deep Generative
Models in Engineering Design
- Authors: Lyle Regenwetter, Akash Srivastava, Dan Gutfreund, Faez Ahmed
- Abstract summary: This paper doubles as a review and practical guide to evaluation metrics for deep generative models (DGMs) in engineering design.
We first summarize the well-accepted 'classic' evaluation metrics for deep generative models grounded in machine learning theory.
Next, we curate a set of design-specific metrics which can be used for evaluating deep generative models.
- Score: 10.531935694354448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep generative models such as Variational Autoencoders (VAEs), Generative
Adversarial Networks (GANs), Diffusion Models, and Transformers, have shown
great promise in a variety of applications, including image and speech
synthesis, natural language processing, and drug discovery. However, when
applied to engineering design problems, evaluating the performance of these
models can be challenging, as traditional statistical metrics based on
likelihood may not fully capture the requirements of engineering applications.
This paper doubles as a review and practical guide to evaluation metrics for
deep generative models (DGMs) in engineering design. We first summarize the
well-accepted 'classic' evaluation metrics for deep generative models grounded
in machine learning theory. Using case studies, we then highlight why these
metrics seldom translate well to design problems but see frequent use due to
the lack of established alternatives. Next, we curate a set of design-specific
metrics which have been proposed across different research communities and can
be used for evaluating deep generative models. These metrics focus on unique
requirements in design and engineering, such as constraint satisfaction,
functional performance, novelty, and conditioning. Throughout our discussion,
we apply the metrics to models trained on simple-to-visualize 2-dimensional
example problems. Finally, we evaluate four deep generative models on a bicycle
frame design problem and structural topology generation problem. In particular,
we showcase the use of proposed metrics to quantify performance target
achievement, design novelty, and geometric constraints. We publicly release the
code for the datasets, models, and metrics used throughout the paper at
https://decode.mit.edu/projects/metrics/.
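Two of the design-specific metric families named above, constraint satisfaction and novelty, can be illustrated with a minimal sketch on a 2-D toy problem in the spirit of the paper's simple-to-visualize examples. The function names, the unit-square constraint, and the nearest-neighbor novelty definition are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def constraint_satisfaction_rate(designs, constraint_fn):
    """Fraction of generated designs that pass a validity check."""
    return float(np.mean([constraint_fn(d) for d in designs]))

def nearest_neighbor_novelty(generated, dataset):
    """Mean distance from each generated design to its nearest training design."""
    # Pairwise distances: (n_gen, n_data) via broadcasting
    dists = np.linalg.norm(generated[:, None, :] - dataset[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

# 2-D toy data standing in for a design dataset and a model's samples
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=(100, 2))
gen = rng.uniform(0.0, 1.0, size=(50, 2))

# A toy geometric constraint: the design must lie in the unit square
inside_unit_square = lambda d: bool(np.all((d >= 0.0) & (d <= 1.0)))

print(constraint_satisfaction_rate(gen, inside_unit_square))  # 1.0 for this toy data
print(nearest_neighbor_novelty(gen, data))
```

A model that memorizes its training set would drive the novelty score toward zero even while scoring well on statistical-similarity metrics, which is one reason such design-specific metrics complement the classic ones.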
Related papers
- Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models [0.0]
An increase in the number of connected devices around the world warrants compressed models that can be easily deployed on local devices with low compute capacity and limited power.
We implemented both quantization and pruning compression techniques on popular deep learning models used in image classification, object detection, language modeling, and generative modeling problems.
arXiv Detail & Related papers (2024-07-22T14:20:53Z)
- OLMES: A Standard for Language Model Evaluations [64.85905119836818]
We propose OLMES, a practical, open standard for reproducible language model evaluations.
We identify and review the varying factors in evaluation practices adopted by the community.
OLMES supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions.
arXiv Detail & Related papers (2024-06-12T17:37:09Z) - EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge large conditional generative models using simple metrics, since these models are often trained on very large datasets with multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z) - Feature Likelihood Divergence: Evaluating the Generalization of
Generative Models Using Samples [25.657798631897908]
Feature Likelihood Divergence provides a comprehensive trichotomic evaluation of generative models.
We empirically demonstrate the ability of FLD to identify overfitting problem cases, even when previously proposed metrics fail.
arXiv Detail & Related papers (2023-02-09T04:57:27Z)
- Design Space Exploration and Explanation via Conditional Variational Autoencoders in Meta-model-based Conceptual Design of Pedestrian Bridges [52.77024349608834]
This paper provides a performance-driven design exploration framework to augment the human designer through a Conditional Variational Autoencoder (CVAE)
The CVAE is trained on 18,000 synthetically generated instances of a pedestrian bridge in Switzerland.
arXiv Detail & Related papers (2022-11-29T17:28:31Z)
- Exploring and Evaluating Personalized Models for Code Generation [9.25440316608194]
We evaluate transformer model fine-tuning for personalization.
We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned.
We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
arXiv Detail & Related papers (2022-08-29T23:28:46Z)
- Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing.
It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs.
The survey then covers three categories of state-of-the-art deep model design automation techniques.
arXiv Detail & Related papers (2022-08-22T12:12:43Z)
- Towards Goal, Feasibility, and Diversity-Oriented Deep Generative Models in Design [4.091593765662773]
We present the first Deep Generative Model that simultaneously optimizes for performance, feasibility, diversity, and target achievement.
Methods are tested on a challenging multi-objective bicycle frame design problem with skewed, multimodal data of different datatypes.
arXiv Detail & Related papers (2022-06-14T20:57:23Z)
- Design Target Achievement Index: A Differentiable Metric to Enhance Deep Generative Models in Multi-Objective Inverse Design [4.091593765662773]
Design Target Achievement Index (DTAI) is a differentiable, tunable metric that scores a design's ability to achieve designer-specified minimum performance targets.
We apply DTAI to a Performance-Augmented Diverse GAN (PaDGAN) and demonstrate superior generative performance compared to a set of baseline Deep Generative Models.
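A differentiable target-achievement score of this kind can be sketched with a smooth sigmoid margin over designer-specified minimum targets. This is a simplified stand-in, not the paper's exact DTAI formulation; the function name and the `sharpness` parameter are assumptions for illustration.

```python
import numpy as np

def soft_target_achievement(perf, targets, sharpness=10.0):
    """Smooth, differentiable proxy for target achievement (a stand-in,
    not the exact DTAI): sigmoid of the margin of each performance value
    over its minimum target, averaged across objectives."""
    margin = sharpness * (np.asarray(perf, dtype=float) - np.asarray(targets, dtype=float))
    return float(np.mean(1.0 / (1.0 + np.exp(-margin))))

# One objective above its target, one below: score lands near 0.5
score = soft_target_achievement([1.2, 0.8], [1.0, 1.0])
```

Because the score is smooth in `perf`, it can be used directly as a training-time penalty for a generative model, which is the role DTAI plays when augmenting a GAN.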
arXiv Detail & Related papers (2022-05-06T04:14:34Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
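In the spirit of the precision-recall analysis mentioned above, fidelity and diversity can be estimated with a minimal k-NN manifold-coverage sketch: a sample counts as covered if it falls within the k-th-nearest-neighbor radius of some point from the other set. This is a generic illustration, not the paper's exact sample-level metric; the function names and the choice of k are assumptions.

```python
import numpy as np

def knn_radii(points, k=3):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the self-distance of 0

def precision_recall(real, fake, k=3):
    """k-NN coverage estimate: precision = fraction of fake samples inside the
    real manifold; recall = fraction of real samples inside the fake manifold."""
    r_real, r_fake = knn_radii(real, k), knn_radii(fake, k)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)  # (n_fake, n_real)
    precision = float(np.mean((d <= r_real[None, :]).any(axis=1)))
    recall = float(np.mean((d <= r_fake[:, None]).any(axis=0)))
    return precision, recall

rng = np.random.default_rng(1)
real = rng.normal(size=(64, 2))
p, r = precision_recall(real, real.copy())  # identical sets: both 1.0
```

A mode-collapsed generator would show high precision but low recall under this kind of diagnosis, which is the distribution-level insight the sample-level metrics refine.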
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.