GenLens: A Systematic Evaluation of Visual GenAI Model Outputs
- URL: http://arxiv.org/abs/2402.03700v1
- Date: Tue, 6 Feb 2024 04:41:06 GMT
- Title: GenLens: A Systematic Evaluation of Visual GenAI Model Outputs
- Authors: Tica Lin, Hanspeter Pfister, Jui-Hsien Wang
- Abstract summary: GenLens is a visual analytic interface designed for the systematic evaluation of GenAI model outputs.
A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rapid development of generative AI (GenAI) models in computer vision
necessitates effective evaluation methods to ensure their quality and fairness.
Existing tools primarily focus on dataset quality assurance and model
explainability, leaving a significant gap in GenAI output evaluation during
model development. Current practices often depend on developers' subjective
visual assessments, which may lack scalability and generalizability. This paper
bridges this gap by conducting a formative study with GenAI model developers in
an industrial setting. Our findings led to the development of GenLens, a visual
analytic interface designed for the systematic evaluation of GenAI model
outputs during the early stages of model development. GenLens offers a
quantifiable approach for overviewing and annotating failure cases, customizing
issue tags and classifications, and aggregating annotations from multiple users
to enhance collaboration. A user study with model developers reveals that
GenLens effectively enhances their workflow, evidenced by high satisfaction
rates and a strong intent to integrate it into their practices. This research
underscores the importance of robust early-stage evaluation tools in GenAI
development, contributing to the advancement of fair and high-quality GenAI
models.
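The abstract's description of tagging failure cases and aggregating annotations from multiple users suggests a simple underlying data model. Below is a minimal sketch of that idea, assuming a flat record of per-output issue tags; all names are illustrative and not taken from GenLens itself.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical sketch of the kind of annotation record a tool like GenLens
# might aggregate; field names are illustrative, not the paper's data model.
@dataclass(frozen=True)
class Annotation:
    output_id: str   # which generated image the note refers to
    annotator: str   # who filed the annotation
    issue_tag: str   # customizable issue tag, e.g. "artifact", "bias"

def aggregate(annotations: list[Annotation]) -> dict[str, Counter]:
    """Tally issue tags per output across all annotators."""
    tallies: dict[str, Counter] = {}
    for a in annotations:
        tallies.setdefault(a.output_id, Counter())[a.issue_tag] += 1
    return tallies

if __name__ == "__main__":
    notes = [
        Annotation("img_001", "dev_a", "artifact"),
        Annotation("img_001", "dev_b", "artifact"),
        Annotation("img_002", "dev_a", "off-prompt"),
    ]
    # Two annotators agree that img_001 has an artifact issue.
    print(aggregate(notes))
```

Tallying tags per output is what makes the assessment quantifiable: agreement between annotators surfaces systematic failure modes rather than one-off impressions.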
Related papers
- Dimensions of Generative AI Evaluation Design (2024-11-19)
We propose a set of general dimensions that capture critical choices involved in GenAI evaluation design.
These dimensions include the evaluation setting, the task type, the input source, the interaction style, the duration, the metric type, and the scoring method.
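The seven dimensions read like a configuration schema for an evaluation. A minimal sketch, assuming hypothetical value sets for each dimension (the paper's actual taxonomy may differ):

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative encoding of the seven design dimensions named above; the
# Literal value sets are hypothetical examples, not the paper's taxonomy.
@dataclass
class EvalDesign:
    setting: Literal["lab", "field"]
    task_type: Literal["generation", "classification"]
    input_source: Literal["benchmark", "user-provided"]
    interaction_style: Literal["single-turn", "multi-turn"]
    duration: Literal["one-off", "longitudinal"]
    metric_type: Literal["automatic", "human"]
    scoring_method: Literal["absolute", "pairwise"]

design = EvalDesign("lab", "generation", "benchmark",
                    "single-turn", "one-off", "human", "pairwise")
```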
- Recommendation with Generative Models (2024-09-18)
Generative models are AI models capable of creating new instances of data by learning and sampling from their statistical distributions.
These models have applications across various domains, such as image generation, text synthesis, and music composition.
In recommender systems, generative models (Gen-RecSys) can improve the accuracy and diversity of recommendations.
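As a toy illustration of "learning and sampling from a statistical distribution", the sketch below fits a Gaussian to observed data and draws new samples; it stands in for the far richer generative models the paper covers.

```python
import numpy as np

# Toy illustration: "learn" a distribution from data, then sample from it.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1_000)  # observed "training" data

mu, sigma = data.mean(), data.std()          # learn the distribution
new_samples = rng.normal(mu, sigma, size=5)  # generate new instances
print(new_samples)
```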
- Case Study: Leveraging GenAI to Build AI-based Surrogates and Regressors for Modeling Radio Frequency Heating in Fusion Energy Science (2024-09-10)
This work presents a detailed case study on using Generative AI (GenAI) to develop AI surrogates for simulation models in fusion energy research.
The scope includes the methodology, implementation, and results of using GenAI to assist in model development and optimization.
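A surrogate in this sense is a cheap model fitted to reproduce an expensive simulator. A minimal sketch, with a polynomial standing in for a learned surrogate and a toy function standing in for the physics code (both are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Toy surrogate: replace an "expensive" simulator f(x) with a cheap fitted
# model that approximates it over the region of interest.
def expensive_simulator(x):
    return np.sin(x) + 0.1 * x**2  # stand-in for an RF-heating physics code

x_train = np.linspace(0, 5, 50)
y_train = expensive_simulator(x_train)

coeffs = np.polyfit(x_train, y_train, deg=6)  # fit the surrogate
surrogate = np.poly1d(coeffs)

x_new = np.array([1.3, 2.7, 4.1])
print(surrogate(x_new))            # fast approximate predictions
print(expensive_simulator(x_new))  # ground truth for comparison
```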
- On the Limitations and Prospects of Machine Unlearning for Generative AI (2024-08-01)
Generative AI (GenAI) aims to synthesize realistic and diverse data samples from latent variables or other data modalities.
GenAI has achieved remarkable results in various domains, such as natural language, images, audio, and graphs.
However, these models also pose challenges and risks to data privacy, security, and ethics.
- Model-based Maintenance and Evolution with GenAI: A Look into the Future (2024-07-09)
We argue that Generative Artificial Intelligence (GenAI) can be used as a means to address the limitations of model-based maintenance and evolution (MBM&E).
We propose that GenAI can be used in MBM&E to reduce engineers' learning curve, maximize efficiency through recommendations, and serve as a reasoning tool for understanding domain problems.
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models (2024-06-01)
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
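A "modular and expandable" benchmark suite is often built around a task registry. A hypothetical sketch of that pattern (not GenBench's real API):

```python
from typing import Callable

# Hypothetical task registry for a modular benchmark suite.
BENCHMARKS: dict[str, Callable[[], float]] = {}

def register(name: str):
    """Decorator that adds a benchmark task to the suite."""
    def wrap(fn: Callable[[], float]):
        BENCHMARKS[name] = fn
        return fn
    return wrap

@register("toy_task")
def toy_task() -> float:
    return 0.92  # placeholder score

# Running the suite is just iterating the registry.
scores = {name: task() for name, task in BENCHMARKS.items()}
print(scores)
```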
- Generative AI for Visualization: State of the Art and Future Directions (2024-04-28)
This paper reviews previous visualization studies that leverage GenAI.
By summarizing different generation algorithms, along with their current applications and limitations, the paper aims to provide useful insights for future GenAI4VIS research.
- Generative AI and Process Systems Engineering: The Next Frontier (2024-02-15)
This article explores how emerging generative artificial intelligence (GenAI) models, such as large language models (LLMs), can enhance solution methodologies within process systems engineering (PSE).
These cutting-edge GenAI models, particularly foundation models (FMs), are pre-trained on extensive, general-purpose datasets.
The article identifies and discusses potential challenges in fully leveraging GenAI within PSE, including multiscale modeling, data requirements, evaluation metrics and benchmarks, and trust and safety.
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT (2023-03-07)
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study (2020-01-12)
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments and in-depth analyses diagnose the bottlenecks of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
This list is automatically generated from the titles and abstracts of the papers on this site.