Review of Large Vision Models and Visual Prompt Engineering
- URL: http://arxiv.org/abs/2307.00855v1
- Date: Mon, 3 Jul 2023 08:48:49 GMT
- Title: Review of Large Vision Models and Visual Prompt Engineering
- Authors: Jiaqi Wang, Zhengliang Liu, Lin Zhao, Zihao Wu, Chong Ma, Sigang Yu,
Haixing Dai, Qiushi Yang, Yiheng Liu, Songyao Zhang, Enze Shi, Yi Pan, Tuo
Zhang, Dajiang Zhu, Xiang Li, Xi Jiang, Bao Ge, Yixuan Yuan, Dinggang Shen,
Tianming Liu, Shu Zhang
- Abstract summary: This review summarizes the methods employed in the computer vision domain for large vision models and visual prompt engineering.
We present influential large models in the visual domain and a range of prompt engineering methods employed on these models.
- Score: 50.63394642549947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual prompt engineering is a fundamental technology in the field of visual
and image Artificial General Intelligence, serving as a key component for
achieving zero-shot capabilities. As the development of large vision models
progresses, the importance of prompt engineering becomes increasingly evident.
Designing suitable prompts for specific visual tasks has emerged as a
meaningful research direction. This review aims to summarize the methods
employed in the computer vision domain for large vision models and visual
prompt engineering, exploring the latest advancements in visual prompt
engineering. We present influential large models in the visual domain and a
range of prompt engineering methods employed on these models. It is our hope
that this review provides a comprehensive and systematic description of prompt
engineering methods based on large visual models, offering valuable insights
for future researchers in their exploration of this field.
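As a concrete illustration of the kind of prompt engineering the review surveys, the sketch below performs zero-shot image classification by wrapping class names in natural-language prompt templates and scoring them against an image with CLIP. This is a minimal sketch only; the model checkpoint, prompt template, class names, and image path are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of prompt-based zero-shot classification with CLIP.
# Model name, prompt template, class names, and image path are illustrative choices.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["cat", "dog", "bird"]
# Prompt engineering step: wrap each class name in a natural-language template.
prompts = [f"a photo of a {c}" for c in classes]

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity between the image and each text prompt.
probs = outputs.logits_per_image.softmax(dim=-1)
print({c: float(p) for c, p in zip(classes, probs[0])})
```

Changing only the prompt template (for example, "a blurry photo of a {c}" versus "a photo of a {c}") can shift the predicted probabilities, which is the core lever that prompt engineering methods tune.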
Related papers
- Visual Knowledge in the Big Model Era: Retrospect and Prospect [63.282425615863]
Visual knowledge is a new form of knowledge representation that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner.
As the knowledge about the visual world has been identified as an indispensable component of human cognition and intelligence, visual knowledge is poised to have a pivotal role in establishing machine intelligence.
arXiv Detail & Related papers (2024-04-05T07:31:24Z)
- A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications [11.568575664316143]
This paper provides a structured overview of recent advancements in prompt engineering, categorized by application area.
We provide a summary detailing the prompting methodology, its applications, the models involved, and the datasets utilized.
This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.
arXiv Detail & Related papers (2024-02-05T19:49:13Z)
- State of the Art on Diffusion Models for Visual Computing [191.6168813012954]
This report introduces the basic mathematical concepts of diffusion models, along with implementation details and design choices of the popular Stable Diffusion model.
We also give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing.
We discuss available datasets, metrics, open challenges, and social implications.
arXiv Detail & Related papers (2023-10-11T05:32:29Z)
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models [43.35892536887404]
Prompt engineering involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks.
This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models.
arXiv Detail & Related papers (2023-07-24T17:58:06Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNN and ViT learn enhanced representations.
We feed detailed descriptions into a pre-trained text encoder to extract text embeddings with rich semantic information that encodes the content of images (a minimal sketch of this embedding step appears after this list).
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- Deep Learning to See: Towards New Foundations of Computer Vision [88.69805848302266]
This book criticizes the supposed scientific progress in the field of computer vision.
It proposes the investigation of vision within the framework of information-based laws of nature.
arXiv Detail & Related papers (2022-06-30T15:20:36Z)
- Searching the Search Space of Vision Transformer [98.96601221383209]
Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection.
We propose to use neural architecture search to automate this process, by searching not only the architecture but also the search space.
We provide design guidelines of general vision transformers with extensive analysis according to the space searching process.
arXiv Detail & Related papers (2021-11-29T17:26:07Z)
- Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects [7.949330621850412]
Visual sensation and perception refer to the process of sensing, organizing, identifying, and interpreting visual information for environmental awareness and understanding.
Computational models inspired by visual perception are complex and diverse, as they draw on many disciplines such as cognitive science, information science, and artificial intelligence.
arXiv Detail & Related papers (2021-09-08T01:51:24Z)
- Attention mechanisms and deep learning for machine vision: A survey of the state of the art [0.0]
Vision transformers (ViTs) pose a serious challenge to established deep-learning-based machine vision techniques.
Some recent works suggest that combining these two fields can yield systems that enjoy the advantages of both.
arXiv Detail & Related papers (2021-06-03T10:23:32Z)
- Deep learning for scene recognition from visual data: a survey [2.580765958706854]
This work reviews the state of the art in scene recognition with deep learning models from visual data.
Scene recognition is still an emerging field in computer vision and has been addressed from both single-image and dynamic-image perspectives.
arXiv Detail & Related papers (2020-07-03T16:53:18Z)
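The GPT4Image entry above describes extracting text embeddings from detailed image descriptions as auxiliary knowledge for vision models. The sketch below illustrates that general idea under stated assumptions: the encoder choice, the example caption, and the cosine-similarity alignment loss are hypothetical and are not details drawn from the paper.

```python
# Minimal sketch: encode an image description with a pre-trained text encoder to
# obtain a semantic embedding that a vision model (CNN/ViT) could learn to match.
# Encoder, caption, and loss are illustrative assumptions, not the paper's method.
import torch
import torch.nn.functional as F
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

description = "A brown dog catching a red frisbee on a sunny beach."  # hypothetical caption
tokens = tokenizer(description, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    text_embedding = text_encoder(**tokens).pooler_output  # shape: (1, hidden_size)

# A vision model's image feature (placeholder here) could be pulled toward this
# embedding with a cosine-similarity loss as auxiliary supervision.
image_feature = torch.randn_like(text_embedding)  # stand-in for a CNN/ViT feature
aux_loss = 1.0 - F.cosine_similarity(image_feature, text_embedding).mean()
print(float(aux_loss))
```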