Multimodal Research in Vision and Language: A Review of Current and
Emerging Trends
- URL: http://arxiv.org/abs/2010.09522v2
- Date: Tue, 22 Dec 2020 04:43:20 GMT
- Title: Multimodal Research in Vision and Language: A Review of Current and
Emerging Trends
- Authors: Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar,
Soujanya Poria, Roger Zimmermann, and Amir Zadeh
- Score: 41.07256031348454
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep learning and its applications have spurred impactful research and
development across the diverse range of modalities present in real-world data.
More recently, this has heightened research interest in the intersection of
vision and language, an arena with numerous applications and fast-paced growth.
In this paper, we present a detailed overview of the latest trends in research
pertaining to visual and language modalities. We examine their applications,
task formulations, and approaches to solving various problems related to
semantic perception and content generation. We also address task-specific
trends, along with their evaluation strategies and upcoming challenges.
Moreover, we shed some light on multi-disciplinary patterns and insights that
have emerged in the recent past, directing this field towards more modular and
transparent intelligent systems. This survey identifies the key trends shaping
recent VisLang literature and attempts to unearth the directions in which the
field is heading.
Related papers
- Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models [29.086329448754412]
  We present a comprehensive review by categorizing current studies into three research problems: self-assessment, exhibition, and recognition.
  Our paper is the first comprehensive survey of up-to-date literature on personality in large language models.
  arXiv Detail & Related papers (2024-06-25T15:08:44Z)
- Large Language Models for Education: A Survey and Outlook [69.02214694865229]
  We systematically review the technological advancements in each perspective, organize related datasets and benchmarks, and identify the risks and challenges associated with deploying LLMs in education.
  Our survey aims to provide a comprehensive technological picture for educators, researchers, and policymakers to harness the power of LLMs to revolutionize educational practices and foster a more effective personalized learning environment.
  arXiv Detail & Related papers (2024-03-26T21:04:29Z)
- A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond [84.95530356322621]
  This survey presents a systematic review of the advancements in code intelligence.
  It covers over 50 representative models and their variants, more than 20 categories of tasks, and over 680 related works.
  Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence.
  arXiv Detail & Related papers (2024-03-21T08:54:56Z)
- Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey [17.19337964440007]
  There is currently a lack of a comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
  This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
  It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing insights for future research directions in this vital and rapidly evolving field.
  arXiv Detail & Related papers (2024-02-27T23:59:01Z)
- Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
  This comprehensive survey delves into the recent strides in HS moderation.
  We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs).
  We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
  arXiv Detail & Related papers (2024-01-30T03:51:44Z)
- Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications [41.24492058141363]
  Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations.
  We propose a review to discuss the trends in integration of knowledge and large language models, including a taxonomy of methods, benchmarks, and applications.
  arXiv Detail & Related papers (2023-11-10T05:24:04Z)
- A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning [58.107474025048866]
  Forgetting refers to the loss or deterioration of previously acquired knowledge.
  Forgetting is a prevalent phenomenon observed in various other research domains within deep learning.
  arXiv Detail & Related papers (2023-07-16T16:27:58Z)
- Parsing Objects at a Finer Granularity: A Survey [54.72819146263311]
  Fine-grained visual parsing is important in many real-world applications, e.g., agriculture, remote sensing, and space technologies.
  Predominant research efforts tackle these fine-grained sub-tasks following different paradigms.
  We conduct an in-depth study of the advanced work from a new perspective of learning the part relationship.
  arXiv Detail & Related papers (2022-12-28T04:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.