Vision + Language Applications: A Survey
- URL: http://arxiv.org/abs/2305.14598v1
- Date: Wed, 24 May 2023 00:42:06 GMT
- Title: Vision + Language Applications: A Survey
- Authors: Yutong Zhou and Nobutaka Shimada
- Abstract summary: This paper explores a relevant research track within multimodal applications, including text, vision, audio, and others.
In addition to the studies discussed in this paper, we are also committed to continually updating the latest relevant papers, datasets, application projects and corresponding information.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generation has attracted significant interest from researchers
and practitioners in recent years due to its widespread and diverse
applications across various industries. Despite the progress made in the domain
of vision and language research, the existing literature remains relatively
limited, particularly with regard to advancements and applications in this
field. This paper explores a relevant research track within multimodal
applications, including text, vision, audio, and others. In addition to the
studies discussed in this paper, we are also committed to continually updating
the latest relevant papers, datasets, application projects and corresponding
information at https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image
Related papers
- Towards Visual Grounding: A Survey [87.37662490666098]
Since 2021, visual grounding has witnessed significant advancements, with emerging new concepts such as grounded pre-training.
This survey is designed to be suitable for both beginners and experienced researchers, serving as an invaluable resource for understanding key concepts and tracking the latest research developments.
arXiv Detail & Related papers (2024-12-28T16:34:35Z)
- Applications and Advances of Artificial Intelligence in Music Generation: A Review [0.04551615447454769]
This paper provides a systematic review of the latest research advancements in AI music generation.
It covers key technologies, models, datasets, evaluation methods, and their practical applications across various fields.
arXiv Detail & Related papers (2024-09-03T13:50:55Z)
- Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing [4.057550183467041]
The field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models.
We present a comprehensive, multi-perspective analysis of recent advancements in this field.
arXiv Detail & Related papers (2024-02-05T15:13:20Z)
- Literature Review: Computer Vision Applications in Transportation Logistics and Warehousing [58.720142291102135]
Computer vision applications in transportation logistics and warehousing have a huge potential for process automation.
We present a structured literature review on research in the field to help leverage this potential.
arXiv Detail & Related papers (2023-04-12T17:33:41Z)
- The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces [54.2590226904332]
We describe the Semantic Reader Project, an effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers.
Ten prototype interfaces have been developed and more than 300 participants and real-world users have shown improved reading experiences.
We structure this paper around challenges scholars and the public face when reading research papers.
arXiv Detail & Related papers (2023-03-25T02:47:09Z)
- 3D Object Detection from Images for Autonomous Driving: A Survey [68.33502122185813]
3D object detection from images is one of the fundamental and challenging problems in autonomous driving.
More than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications.
We provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection.
arXiv Detail & Related papers (2022-02-07T07:12:24Z)
- From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence.
Research in image captioning has not reached a conclusive answer yet.
This work aims at providing a comprehensive overview and categorization of image captioning approaches.
arXiv Detail & Related papers (2021-07-14T18:00:54Z)
- A Survey of Deep Learning Approaches for OCR and Document Understanding [68.65995739708525]
We review different techniques for document understanding for documents written in English.
We consolidate methodologies present in literature to act as a jumping-off point for researchers exploring this area.
arXiv Detail & Related papers (2020-11-27T03:05:59Z)
- Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [41.07256031348454]
We present a detailed overview of the latest trends in research pertaining to visual and language modalities.
We look at its applications in their task formulations and how to solve various problems related to semantic perception and content generation.
We shed some light on multi-disciplinary patterns and insights that have emerged in the recent past, directing this field towards more modular and transparent intelligent systems.
arXiv Detail & Related papers (2020-10-19T13:55:10Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, translation, etc.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- Text Recognition in the Wild: A Survey [33.22076515689926]
This literature review attempts to present the entire picture of the field of scene text recognition.
It provides a comprehensive reference for people entering this field, and could be helpful to inspire future research.
arXiv Detail & Related papers (2020-05-07T13:57:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.