Large Language Models Meet Computer Vision: A Brief Survey
- URL: http://arxiv.org/abs/2311.16673v1
- Date: Tue, 28 Nov 2023 10:39:19 GMT
- Title: Large Language Models Meet Computer Vision: A Brief Survey
- Authors: Raby Hamadi
- Abstract summary: The intersection of Large Language Models (LLMs) and Computer Vision (CV) has emerged as a pivotal area of research, driving significant advancements in the field of Artificial Intelligence (AI).
This survey paper delves into the latest progress in the domain of transformers, emphasizing their potential to revolutionize Vision Transformers (ViTs) and LLMs.
The survey concludes by highlighting open directions in the field, suggesting potential avenues for future research and development.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, the intersection of Large Language Models (LLMs) and Computer
Vision (CV) has emerged as a pivotal area of research, driving significant
advancements in the field of Artificial Intelligence (AI). As transformers have
become the backbone of many state-of-the-art models in both Natural Language
Processing (NLP) and CV, understanding their evolution and potential
enhancements is crucial. This survey paper delves into the latest progress in
the domain of transformers and their successors, emphasizing their potential
to revolutionize Vision Transformers (ViTs) and LLMs. This survey also
presents a comparative analysis juxtaposing the performance metrics of several
leading paid and open-source LLMs, shedding light on their strengths and areas
for improvement, along with a literature review of how LLMs are being used to
tackle vision-related tasks. Furthermore, the survey presents
a comprehensive collection of datasets employed to train LLMs, offering
insights into the diverse data available to achieve high performance in various
pre-training and downstream tasks of LLMs. The survey concludes by
highlighting open directions in the field, suggesting potential avenues for
future research and development. This survey aims to underscore the profound
impact of LLMs on CV, heralding a new era of integrated and advanced AI
models.
Related papers
- From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
Deploying Large Language Models (LLMs) to tackle CMR tasks has recently become the mainstream approach to enhancing their effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs [56.391404083287235]
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations.
We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes.
arXiv Detail & Related papers (2024-06-24T17:59:42Z)
- LLMs Meet Multimodal Generation and Editing: A Survey [89.76691959033323]
This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio.
We summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods.
We dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction.
arXiv Detail & Related papers (2024-05-29T17:59:20Z)
- A Survey on Self-Evolution of Large Language Models [116.54238664264928]
Large language models (LLMs) have advanced significantly in various fields and intelligent agent applications.
Self-evolution approaches, which enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself, are growing rapidly.
arXiv Detail & Related papers (2024-04-22T17:43:23Z)
- ChatGPT Alternative Solutions: Large Language Models Survey [0.0]
Large Language Models (LLMs) have ignited a surge of research contributions in the domain of generative AI.
Recent years have witnessed a dynamic synergy between academia and industry, propelling the field of LLM research to new heights.
This survey furnishes a well-rounded perspective on the current state of generative AI, shedding light on opportunities for further exploration, enhancement, and innovation.
arXiv Detail & Related papers (2024-03-21T15:16:50Z)
- Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges [47.45993726498343]
Data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection.
This survey explores the transformative impact of large language models (LLMs) on DA, particularly addressing the unique challenges and opportunities they present in the context of natural language processing (NLP) and beyond.
arXiv Detail & Related papers (2024-03-05T14:11:54Z)
- Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions [11.786387517781328]
Vision-Language Models (VLMs) are advanced models that can tackle more intricate tasks such as image captioning and visual question answering.
Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs, and models that both accept and produce multimodal inputs and outputs.
We meticulously dissect each model, offering an extensive analysis of its foundational architecture and training data sources, as well as its strengths and limitations where possible.
arXiv Detail & Related papers (2024-02-20T18:57:34Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be communicated to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey [18.930417261395906]
Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents.
This article offers a survey of recent advancements in Transformer-based LLM architectures aimed at enhancing the long-context capabilities of LLMs.
arXiv Detail & Related papers (2023-11-21T04:59:17Z)
- Advances in Embodied Navigation Using Large Language Models: A Survey [16.8165925743264]
The article offers an exhaustive summary of the symbiosis between Large Language Models and Embodied Intelligence.
It reviews state-of-the-art models, research methodologies, and assesses the advantages and disadvantages of existing embodied navigation models and datasets.
Finally, the article elucidates the role of LLMs in embodied intelligence, based on current research, and forecasts future directions in the field.
arXiv Detail & Related papers (2023-11-01T14:08:56Z)
- A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks.
This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)