Offline Evaluation of Set-Based Text-to-Image Generation
- URL: http://arxiv.org/abs/2410.17331v1
- Date: Tue, 22 Oct 2024 18:04:00 GMT
- Title: Offline Evaluation of Set-Based Text-to-Image Generation
- Authors: Negar Arabzadeh, Fernando Diaz, Junfeng He,
- Abstract summary: Ideation is an important subclass of Text-to-Image (TTI) tasks.
Existing evaluation metrics for TTI remain focused on distributional similarity metrics.
We develop TTI evaluation metrics with explicit models of how users browse and interact with sets of spatially arranged generated images.
- Score: 55.1766769455424
- Abstract: Text-to-Image (TTI) systems often support people during ideation, the early stages of a creative process when exposure to a broad set of relevant images can help explore the design space. Since ideation is an important subclass of TTI tasks, understanding how to quantitatively evaluate TTI systems according to how well they support ideation is crucial to promoting research and development for these users. However, existing evaluation metrics for TTI remain focused on distributional similarity metrics like Fréchet Inception Distance (FID). We take an alternative approach and, based on established methods from ranking evaluation, develop TTI evaluation metrics with explicit models of how users browse and interact with sets of spatially arranged generated images. Our proposed offline evaluation metrics for TTI not only capture how relevant generated images are with respect to the user's ideation need but also take into consideration the diversity and arrangement of the set of generated images. We analyze our proposed family of TTI metrics using human studies on image grids generated by three different TTI systems based on subsets of the widely used benchmarks such as MS-COCO captions and Localized Narratives as well as prompts used in naturalistic settings. Our results demonstrate that grounding metrics in how people use systems is an important and understudied area of benchmark design.
Related papers
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- Semantic Similarity Score for Measuring Visual Similarity at Semantic Level [5.867765921443141]
We propose a semantic evaluation metric -- SeSS (Semantic Similarity Score) based on Scene Graph Generation and graph matching.
The metric measures semantic-level differences between images and can be used for evaluation in visual semantic communication systems.
arXiv Detail & Related papers (2024-06-06T08:51:26Z)
- CrossScore: Towards Multi-View Image Evaluation and Scoring [24.853612457257697]
A cross-reference image quality assessment method fills a gap in the image assessment landscape.
Our method enables accurate image quality assessment without requiring ground truth references.
arXiv Detail & Related papers (2024-04-22T17:59:36Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals that is an order of magnitude larger than existing relevant datasets and where rich semantic ground-truth annotations are readily available.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond [20.035369732786407]
We introduce the first large-scale fashion visual similarity benchmark dataset, consisting of more than 110K expert-annotated image pairs.
We propose a novel and efficient labeling procedure that can be applied to any dataset.
arXiv Detail & Related papers (2023-08-28T17:59:47Z)
- SETI: Systematicity Evaluation of Textual Inference [24.156140116509064]
We propose SETI (Systematicity Evaluation of Textual Inference), a novel and comprehensive benchmark designed for evaluating pre-trained language models (PLMs).
Specifically, SETI offers three different NLI tasks and corresponding datasets to evaluate various types of systematicity in reasoning processes.
We conduct experiments with SETI on six widely used PLMs. Results show that various PLMs can solve unseen compositional inferences with good performance once they have encountered the knowledge of how to combine primitives.
arXiv Detail & Related papers (2023-05-24T11:35:31Z)
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation [47.40949434032489]
We propose a new contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S).
PAC-S unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data.
Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos.
arXiv Detail & Related papers (2023-03-21T18:03:14Z)
- Remote Sensing Image Classification using Transfer Learning and Attention Based Deep Neural Network [59.86658316440461]
We propose a deep learning based framework for RSISC, which makes use of the transfer learning technique and multihead attention scheme.
The proposed deep learning framework is evaluated on the benchmark NWPU-RESISC45 dataset and achieves the best classification accuracy of 94.7%.
arXiv Detail & Related papers (2022-06-20T10:05:38Z)
- NDPNet: A novel non-linear data projection network for few-shot fine-grained image classification [33.71025164816078]
We introduce the non-linear data projection concept into the design of a metric-based fine-grained image classification architecture.
Our proposed architecture can be easily embedded into any episodic training mechanism for end-to-end training from scratch.
arXiv Detail & Related papers (2021-06-13T13:33:09Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.