A Systematic Survey of Prompt Engineering on Vision-Language Foundation
Models
- URL: http://arxiv.org/abs/2307.12980v1
- Date: Mon, 24 Jul 2023 17:58:06 GMT
- Title: A Systematic Survey of Prompt Engineering on Vision-Language Foundation
Models
- Authors: Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan
Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr
- Abstract summary: Prompt engineering involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks.
This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt engineering is a technique that involves augmenting a large
pre-trained model with task-specific hints, known as prompts, to adapt the
model to new tasks. Prompts can be created manually as natural language
instructions or generated automatically as either natural language instructions
or vector representations. Prompt engineering enables predictions based solely
on prompts, without updating model parameters, and it eases the application of
large pre-trained models to real-world tasks. In past years, prompt engineering
has been well studied in natural language processing. Recently, it has also been
intensively studied in vision-language modeling.
However, there is currently a lack of a systematic overview of prompt
engineering on pre-trained vision-language models. This paper aims to provide a
comprehensive survey of cutting-edge research in prompt engineering on three
types of vision-language models: multimodal-to-text generation models (e.g.
Flamingo), image-text matching models (e.g. CLIP), and text-to-image generation
models (e.g. Stable Diffusion). For each type of model, a brief model summary,
prompting methods, prompting-based applications, and the corresponding
responsibility and integrity issues are summarized and discussed. Furthermore,
the commonalities and differences between prompting on vision-language models,
language models, and vision models are also discussed. The challenges, future
directions, and research opportunities are summarized to foster future research
on this topic.
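As an illustrative sketch of the prompting paradigm the abstract describes for image-text matching models such as CLIP: zero-shot classification works by embedding text prompts like "a photo of a {class}" and picking the class whose prompt embedding is most similar to the image embedding. The class names, prompt template, and toy embeddings below are invented for illustration; a real system would use CLIP's text and image encoders rather than the hand-written vectors here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(image_emb, class_names, embed_text):
    """Pick the class whose text-prompt embedding is most similar to the
    image embedding. No parameters are updated; only the prompts change."""
    prompts = {name: f"a photo of a {name}" for name in class_names}
    scores = {name: cosine(image_emb, embed_text(p))
              for name, p in prompts.items()}
    return max(scores, key=scores.get)

# Toy stand-in for a text encoder; a real system would call CLIP here.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
}
image_embedding = [0.8, 0.2, 0.1]  # toy image-encoder output for a cat photo

print(zero_shot_classify(image_embedding, ["cat", "dog"],
                         TOY_TEXT_EMBEDDINGS.__getitem__))  # -> cat
```

Changing the set of class names (and hence the prompts) retargets the same frozen model to a new classification task, which is the "prediction without parameter updates" property the abstract highlights.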
Related papers
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642] (2024-08-14)
  Model merging is an efficient empowerment technique in the machine learning community, yet there is a significant gap in the literature regarding a systematic and thorough review of these techniques.
- Prompt Mining for Language-based Human Mobility Forecasting [10.325794804095889] (2024-03-06)
  We propose a novel framework for prompt mining in language-based mobility forecasting. The framework includes a prompt generation stage based on the information entropy of prompts and a prompt refinement stage that integrates mechanisms such as chain-of-thought.
- A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications [11.568575664316143] (2024-02-05)
  This paper provides a structured overview of recent advancements in prompt engineering, categorized by application area. We summarize the prompting methodology, its applications, the models involved, and the datasets utilized. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427] (2023-07-25)
  Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world. Models that bridge such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time. The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102] (2023-07-15)
  We propose a framework to enable in-context learning in large language models. A meta-model learns from self-supervised prompts consisting of tailored demonstrations. Experiments show that SINC outperforms gradient-based methods on various vision-language tasks.
- Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media [0.0] (2023-02-16)
  Foundation Models are pre-trained language models for Natural Language Processing. They can be applied to a wide range of media and problem domains, from image and video processing to robot control learning. This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566] (2022-06-08)
  We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters. We apply multi-task learning so that the model learns to generalize to new tasks better. Experimental results show that the proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [78.8500633981247] (2021-07-28)
  This paper surveys and organizes research works in a new paradigm of natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm [0.0] (2021-02-15)
  We discuss methods of prompt programming, emphasizing the usefulness of considering prompts through the lens of natural language. We introduce the idea of a metaprompt that seeds the model to generate its own natural language prompts for a range of tasks.