Beyond Statistical Similarity: Rethinking Metrics for Deep Generative
Models in Engineering Design
- URL: http://arxiv.org/abs/2302.02913v4
- Date: Sat, 14 Oct 2023 04:33:26 GMT
- Title: Beyond Statistical Similarity: Rethinking Metrics for Deep Generative
Models in Engineering Design
- Authors: Lyle Regenwetter, Akash Srivastava, Dan Gutfreund, Faez Ahmed
- Abstract summary: This paper doubles as a review and practical guide to evaluation metrics for deep generative models (DGMs) in engineering design.
We first summarize the well-accepted 'classic' evaluation metrics for deep generative models grounded in machine learning theory.
Next, we curate a set of design-specific metrics which can be used for evaluating deep generative models.
- Score: 10.531935694354448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep generative models such as Variational Autoencoders (VAEs), Generative
Adversarial Networks (GANs), Diffusion Models, and Transformers, have shown
great promise in a variety of applications, including image and speech
synthesis, natural language processing, and drug discovery. However, when
applied to engineering design problems, evaluating the performance of these
models can be challenging, as traditional statistical metrics based on
likelihood may not fully capture the requirements of engineering applications.
This paper doubles as a review and practical guide to evaluation metrics for
deep generative models (DGMs) in engineering design. We first summarize the
well-accepted 'classic' evaluation metrics for deep generative models grounded
in machine learning theory. Using case studies, we then highlight why these
metrics seldom translate well to design problems but see frequent use due to
the lack of established alternatives. Next, we curate a set of design-specific
metrics which have been proposed across different research communities and can
be used for evaluating deep generative models. These metrics focus on unique
requirements in design and engineering, such as constraint satisfaction,
functional performance, novelty, and conditioning. Throughout our discussion,
we apply the metrics to models trained on simple-to-visualize 2-dimensional
example problems. Finally, we evaluate four deep generative models on a bicycle
frame design problem and structural topology generation problem. In particular,
we showcase the use of proposed metrics to quantify performance target
achievement, design novelty, and geometric constraints. We publicly release the
code for the datasets, models, and metrics used throughout the paper at
https://decode.mit.edu/projects/metrics/.
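Two of the design-specific metric families named above, constraint satisfaction and novelty, can be illustrated with a minimal sketch on a 2-D toy problem in the spirit of the paper's simple-to-visualize examples. The function names, the unit-square constraint, and the nearest-neighbor novelty definition are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def constraint_satisfaction_rate(designs, constraint_fn):
    """Fraction of generated designs that pass a validity check."""
    return float(np.mean([constraint_fn(d) for d in designs]))

def nearest_neighbor_novelty(generated, dataset):
    """Mean distance from each generated design to its nearest training design."""
    # Pairwise distances: (n_gen, n_data) via broadcasting
    dists = np.linalg.norm(generated[:, None, :] - dataset[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

# 2-D toy data standing in for a design dataset and a model's samples
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=(100, 2))
gen = rng.uniform(0.0, 1.0, size=(50, 2))

# A toy geometric constraint: the design must lie in the unit square
inside_unit_square = lambda d: bool(np.all((d >= 0.0) & (d <= 1.0)))

print(constraint_satisfaction_rate(gen, inside_unit_square))  # 1.0 for this toy data
print(nearest_neighbor_novelty(gen, data))
```

A model that memorizes its training set would drive the novelty score toward zero even while scoring well on statistical-similarity metrics, which is one reason such design-specific metrics complement the classic ones.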
Related papers
- Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models [0.0]
An increase in the number of connected devices around the world warrants compressed models that can be easily deployed on local devices with low compute capacity and limited power.
We implemented both quantization and pruning compression techniques on popular deep learning models used in image classification, object detection, language modeling, and generative modeling problems.
arXiv Detail & Related papers (2024-07-22T14:20:53Z)
- OLMES: A Standard for Language Model Evaluations [64.85905119836818]
We propose OLMES, a practical, open standard for reproducible language model evaluations.
We identify and review the varying factors in evaluation practices adopted by the community.
OLMES supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions.
arXiv Detail & Related papers (2024-06-12T17:37:09Z) - EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge large conditional generative models using simple metrics, since these models are often trained on very large datasets with multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z) - Feature Likelihood Divergence: Evaluating the Generalization of
Generative Models Using Samples [25.657798631897908]
Feature Likelihood Divergence provides a comprehensive trichotomic evaluation of generative models.
We empirically demonstrate the ability of FLD to identify overfitting problem cases, even when previously proposed metrics fail.
arXiv Detail & Related papers (2023-02-09T04:57:27Z)
- Design Space Exploration and Explanation via Conditional Variational Autoencoders in Meta-model-based Conceptual Design of Pedestrian Bridges [52.77024349608834]
This paper provides a performance-driven design exploration framework to augment the human designer through a Conditional Variational Autoencoder (CVAE)
The CVAE is trained on 18,000 synthetically generated instances of a pedestrian bridge in Switzerland.
arXiv Detail & Related papers (2022-11-29T17:28:31Z)
- Exploring and Evaluating Personalized Models for Code Generation [9.25440316608194]
We evaluate transformer model fine-tuning for personalization.
We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned.
We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
arXiv Detail & Related papers (2022-08-29T23:28:46Z)
- Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing.
It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs.
The survey then covers three categories of state-of-the-art deep model design automation techniques.
arXiv Detail & Related papers (2022-08-22T12:12:43Z)
- Towards Goal, Feasibility, and Diversity-Oriented Deep Generative Models in Design [4.091593765662773]
We present the first Deep Generative Model that simultaneously optimizes for performance, feasibility, diversity, and target achievement.
Methods are tested on a challenging multi-objective bicycle frame design problem with skewed, multimodal data of different datatypes.
arXiv Detail & Related papers (2022-06-14T20:57:23Z)
- Design Target Achievement Index: A Differentiable Metric to Enhance Deep Generative Models in Multi-Objective Inverse Design [4.091593765662773]
Design Target Achievement Index (DTAI) is a differentiable, tunable metric that scores a design's ability to achieve designer-specified minimum performance targets.
We apply DTAI to a Performance-Augmented Diverse GAN (PaDGAN) and demonstrate superior generative performance compared to a set of baseline Deep Generative Models.
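A differentiable target-achievement score of this kind can be sketched with a smooth sigmoid margin over designer-specified minimum targets. This is a simplified stand-in, not the paper's exact DTAI formulation; the function name and the `sharpness` parameter are assumptions for illustration.

```python
import numpy as np

def soft_target_achievement(perf, targets, sharpness=10.0):
    """Smooth, differentiable proxy for target achievement (a stand-in,
    not the exact DTAI): sigmoid of the margin of each performance value
    over its minimum target, averaged across objectives."""
    margin = sharpness * (np.asarray(perf, dtype=float) - np.asarray(targets, dtype=float))
    return float(np.mean(1.0 / (1.0 + np.exp(-margin))))

# One objective above its target, one below: score lands near 0.5
score = soft_target_achievement([1.2, 0.8], [1.0, 1.0])
```

Because the score is smooth in `perf`, it can be used directly as a training-time penalty for a generative model, which is the role DTAI plays when augmenting a GAN.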
arXiv Detail & Related papers (2022-05-06T04:14:34Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
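In the spirit of the precision-recall analysis mentioned above, fidelity and diversity can be estimated with a minimal k-NN manifold-coverage sketch: a sample counts as covered if it falls within the k-th-nearest-neighbor radius of some point from the other set. This is a generic illustration, not the paper's exact sample-level metric; the function names and the choice of k are assumptions.

```python
import numpy as np

def knn_radii(points, k=3):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the self-distance of 0

def precision_recall(real, fake, k=3):
    """k-NN coverage estimate: precision = fraction of fake samples inside the
    real manifold; recall = fraction of real samples inside the fake manifold."""
    r_real, r_fake = knn_radii(real, k), knn_radii(fake, k)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)  # (n_fake, n_real)
    precision = float(np.mean((d <= r_real[None, :]).any(axis=1)))
    recall = float(np.mean((d <= r_fake[:, None]).any(axis=0)))
    return precision, recall

rng = np.random.default_rng(1)
real = rng.normal(size=(64, 2))
p, r = precision_recall(real, real.copy())  # identical sets: both 1.0
```

A mode-collapsed generator would show high precision but low recall under this kind of diagnosis, which is the distribution-level insight the sample-level metrics refine.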
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.