What does a platypus look like? Generating customized prompts for zero-shot image classification
- URL: http://arxiv.org/abs/2209.03320v3
- Date: Sun, 3 Dec 2023 22:44:05 GMT
- Title: What does a platypus look like? Generating customized prompts for zero-shot image classification
- Authors: Sarah Pratt, Ian Covert, Rosanne Liu, Ali Farhadi
- Abstract summary: This work introduces a simple method to generate higher accuracy prompts without relying on any explicit knowledge of the task domain.
We leverage the knowledge contained in large language models (LLMs) to generate many descriptive sentences that contain important discriminating characteristics of the image categories.
This approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet.
- Score: 52.92839995002636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary models are a promising new paradigm for image classification.
Unlike traditional classification models, open-vocabulary models classify among
any arbitrary set of categories specified with natural language during
inference. This natural language, called "prompts", typically consists of a set
of hand-written templates (e.g., "a photo of a {}") which are completed with
each of the category names. This work introduces a simple method to generate
higher accuracy prompts, without relying on any explicit knowledge of the task
domain and with far fewer hand-constructed sentences. To achieve this, we
combine open-vocabulary models with large language models (LLMs) to create
Customized Prompts via Language models (CuPL, pronounced "couple"). In
particular, we leverage the knowledge contained in LLMs in order to generate
many descriptive sentences that contain important discriminating
characteristics of the image categories. This allows the model to place a
greater importance on these regions in the image when making predictions. We
find that this straightforward and general approach improves accuracy on a
range of zero-shot image classification benchmarks, including over one
percentage point gain on ImageNet. Finally, this simple baseline requires no
additional training and remains completely zero-shot. Code available at
https://github.com/sarahpratt/CuPL.
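As a concrete illustration of the recipe described above, here is a minimal sketch of CuPL-style zero-shot classification. It assumes the open_clip package; the class descriptions below are hypothetical stand-ins for LLM-generated sentences, and the authors' actual prompts and code live at https://github.com/sarahpratt/CuPL.

```python
# Minimal sketch of CuPL-style zero-shot classification (assumes open_clip;
# the descriptions below are hypothetical stand-ins for LLM output).
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Stand-ins for LLM-generated descriptions, e.g. from prompting an LLM
# with "Describe what a platypus looks like."
cupl_prompts = {
    "platypus": [
        "A platypus is a semi-aquatic mammal with a duck-like bill.",
        "A photo of a platypus, with webbed feet and a flat beaver-like tail.",
    ],
    "beaver": [
        "A beaver is a large rodent with a broad, scaly, paddle-shaped tail.",
        "A photo of a beaver with thick brown fur, gnawing on wood.",
    ],
}

# Build one classifier vector per class by embedding its descriptive
# sentences with the CLIP text encoder and averaging.
with torch.no_grad():
    class_weights = []
    for sentences in cupl_prompts.values():
        emb = model.encode_text(tokenizer(sentences))
        emb = emb / emb.norm(dim=-1, keepdim=True)
        mean = emb.mean(dim=0)
        class_weights.append(mean / mean.norm())
    classifier = torch.stack(class_weights)  # (num_classes, embed_dim)

def classify(image):
    """Predict a class index for one preprocessed image tensor."""
    with torch.no_grad():
        feat = model.encode_image(image.unsqueeze(0))
        feat = feat / feat.norm(dim=-1, keepdim=True)
        return (feat @ classifier.T).argmax(dim=-1).item()
```

The only change relative to standard CLIP zero-shot classification is that many LLM-written sentences per class replace the hand-written "a photo of a {}" templates; no additional training is involved.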
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
- GIST: Generating Image-Specific Text for Fine-grained Object Classification [8.118079247462425]
GIST is a method for generating image-specific fine-grained text descriptions from image-only datasets.
Our method achieves an average improvement of 4.1% in accuracy over CLIP linear probes.
arXiv Detail & Related papers (2023-07-21T02:47:18Z)
- Text Descriptions are Compressive and Invariant Representations for Visual Learning [63.3464863723631]
We show that an alternative approach, in line with humans' understanding of multiple visual features per class, can provide compelling performance in the robust few-shot learning setting.
In particular, we introduce a novel method, SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors).
This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions into a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features to classify each image (see the sketch after this list).
arXiv Detail & Related papers (2023-07-10T03:06:45Z)
- Freestyle Layout-to-Image Synthesis [42.64485133926378]
In this work, we explore the freestyle capability of the model, i.e., how far it can generate unseen semantics onto a given layout.
To this end, we leverage large-scale pre-trained text-to-image diffusion models to generate unseen semantics.
The proposed diffusion network produces realistic and freestyle layout-to-image generation results with diverse text inputs.
arXiv Detail & Related papers (2023-03-25T09:37:41Z)
- Text2Model: Text-based Model Induction for Zero-shot Image Classification [38.704831945753284]
We address the challenge of building task-agnostic classifiers using only text descriptions.
We generate zero-shot classifiers using a hypernetwork that receives class descriptions and outputs a multi-class model.
We evaluate this approach in a series of zero-shot classification tasks, for image, point-cloud, and action recognition, using a range of text descriptions.
arXiv Detail & Related papers (2022-10-27T05:19:55Z)
- One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z)
- Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition [46.15360203412185]
Few-shot, fine-grained classification requires a model to learn subtle, fine-grained distinctions between different classes.
A solution is to use pose-normalized representations.
We show that they are extremely effective for few-shot fine-grained classification.
arXiv Detail & Related papers (2020-04-01T21:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
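The SLR-AVD entry in the list above outlines a three-step pipeline that is easy to mock up. Below is a rough, self-contained sketch under stated assumptions: random placeholders stand in for the VLM embeddings, all names are illustrative rather than the authors' code, and only the L1-penalized (sparse) logistic regression step uses scikit-learn's real API.

```python
# Rough sketch of the SLR-AVD recipe: images are scored against LLM-written
# class descriptions in a shared VLM embedding space, and an L1-penalized
# logistic regression selects a sparse subset of those description features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder VLM embeddings (in practice: CLIP image/text features).
image_feats = rng.normal(size=(200, 512))
desc_feats = rng.normal(size=(50, 512))   # 50 LLM-written descriptions
image_feats /= np.linalg.norm(image_feats, axis=1, keepdims=True)
desc_feats /= np.linalg.norm(desc_feats, axis=1, keepdims=True)
labels = rng.integers(0, 5, size=200)     # 5 classes, toy labels

# Step 2: represent each image by its similarity to every description.
X = image_feats @ desc_feats.T            # (n_images, n_descriptions)

# Step 3: the L1 penalty drives most coefficients to zero, i.e. it selects
# a relevant subset of the description features for classification.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X, labels)
selected = np.flatnonzero(np.any(clf.coef_ != 0, axis=0))
print(f"kept {selected.size} of {desc_feats.shape[0]} descriptions")
```

With real embeddings, the surviving descriptions are the human-readable features the classifier actually relies on, which is what makes the approach interpretable as well as accurate.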