Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue?
- URL: http://arxiv.org/abs/2210.12997v1
- Date: Mon, 24 Oct 2022 07:34:39 GMT
- Title: Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue?
- Authors: Amit Kumar Chaudhary, Alex J. Lucassen, Ioanna Tsani, Alberto Testoni
- Abstract summary: We compare different decoding strategies in a Visual Dialogue referential game.
None of them successfully balance lexical richness, accuracy in the task, and visual grounding.
We believe our findings and suggestions may serve as a starting point for designing more effective decoding algorithms.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoding strategies play a crucial role in natural language generation
systems. They are usually designed and evaluated in open-ended text-only tasks,
and it is not clear how different strategies handle the numerous challenges
that goal-oriented multimodal systems face (such as grounding and
informativeness). To answer this question, we compare a wide variety of
different decoding strategies and hyper-parameter configurations in a Visual
Dialogue referential game. Although none of them successfully balance lexical
richness, accuracy in the task, and visual grounding, our in-depth analysis
allows us to highlight the strengths and weaknesses of each decoding strategy.
We believe our findings and suggestions may serve as a starting point for
designing more effective decoding algorithms that handle the challenges of
Visual Dialogue tasks.
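The abstract does not list which decoding strategies or hyper-parameters are compared. As a rough illustration of what such "decoding strategies and hyper-parameter configurations" usually look like, here is a minimal sketch over a toy next-token distribution, assuming the common family of greedy decoding, temperature scaling, top-k filtering, and nucleus (top-p) filtering followed by sampling. The vocabulary, probabilities, and function names are hypothetical; this is not the paper's code, and whether the paper uses exactly these strategies is an assumption.

```python
import random
from typing import Dict

# A next-token distribution: token -> probability (toy values, not from the paper).
Dist = Dict[str, float]


def _normalise(weights: Dict[str, float]) -> Dist:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def greedy(dist: Dist) -> str:
    """Greedy decoding: pick the single most probable token."""
    return max(dist, key=dist.get)


def apply_temperature(dist: Dist, temperature: float) -> Dist:
    """Temperature scaling, equivalent to softmax(logits / T).
    T < 1 sharpens the distribution, T > 1 flattens it."""
    return _normalise({tok: p ** (1.0 / temperature) for tok, p in dist.items()})


def top_k_filter(dist: Dist, k: int) -> Dist:
    """Top-k: keep only the k most probable tokens, then renormalise."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return _normalise(dict(ranked[:k]))


def nucleus_filter(dist: Dist, top_p: float) -> Dist:
    """Nucleus (top-p): keep the smallest set of most probable tokens whose
    cumulative probability reaches top_p, then renormalise."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(dist.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    return _normalise(kept)


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a (possibly filtered) distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    dist = {"is": 0.5, "it": 0.3, "the": 0.15, "a": 0.05}
    rng = random.Random(0)
    print("greedy:", greedy(dist))
    print("top-k (k=2):", sample(top_k_filter(dist, 2), rng))
    print("nucleus (p=0.9):", sample(nucleus_filter(dist, 0.9), rng))
    print("sampling (T=1.5):", sample(apply_temperature(dist, 1.5), rng))
```

Smaller `k`, smaller `top_p`, or a lower `temperature` push decoding toward greedy behaviour, while larger values admit more low-probability tokens; this knob is one plausible source of the tension between lexical richness and task accuracy that the abstract describes.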
Related papers
- Visual AI and Linguistic Intelligence Through Steerability and Composability (2023-11-18T22:01:33Z)
This study explores the capabilities of multimodal large language models (LLMs) in handling challenging multistep tasks that integrate language and vision.
The research presents a series of 14 creatively and constructively diverse tasks, ranging from AI Lego Designing to AI Satellite Image Analysis.
- Multi-level Contrastive Learning for Script-based Character Understanding (2023-10-20T02:40:52Z)
We tackle the task of understanding characters in scripts, which aims to learn the characters' personalities and identities from their utterances.
We propose a multi-level contrastive learning framework to capture characters' global information in a fine-grained manner.
- Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog (2023-10-11T07:37:13Z)
Video-grounded dialog requires a deep understanding of both the dialog history and the video content to generate accurate responses.
We present an iterative search and reasoning framework, which consists of a textual encoder, a visual encoder, and a generator.
- Language Model Decoding as Likelihood-Utility Alignment (2022-10-13T17:55:51Z)
We introduce a taxonomy that groups decoding strategies based on their implicit assumptions about how well the model's likelihood is aligned with the task-specific notion of utility.
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide the first empirical evidence supporting the proposed taxonomy.
- Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning (2022-10-09T12:23:47Z)
We propose MultiESC, a novel system for providing emotional support.
For strategy planning, we propose lookaheads that estimate future user feedback after a particular strategy is used.
For user state modeling, MultiESC focuses on capturing users' subtle emotional expressions and understanding the causes of their emotions.
- Deep Learning for Visual Speech Analysis: A Survey (2022-05-22T14:44:53Z)
This paper reviews recent progress in deep learning methods for visual speech analysis.
We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance.
- On Decoding Strategies for Neural Text Generators (2022-03-29T16:25:30Z)
We study the interaction between language generation tasks and decoding strategies.
We measure changes in attributes of generated text as a function of both decoding strategy and task.
Our results reveal both previously observed and surprising findings.
- Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy (2021-09-11T16:28:58Z)
State-of-the-art systems are shown to generate questions that, although grammatically correct, often lack an effective strategy and sound unnatural to humans.
We design Confirm-it, a model based on a beam search re-ranking algorithm that guides an effective goal-oriented strategy.
We show that dialogues generated by Confirm-it are more natural and effective than beam search decoding without re-ranking.
- From Show to Tell: A Survey on Image Captioning (2021-07-14T18:00:54Z)
Connecting vision and language plays an essential role in generative intelligence.
Research in image captioning has not yet reached a conclusive answer.
This work aims to provide a comprehensive overview and categorization of image captioning approaches.
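The Confirm-it entry describes re-ranking beam search candidates to guide a goal-oriented questioning strategy. Its actual re-ranking criterion is not described in this listing, so the following is only a generic sketch of beam search with a re-ranking hook. The `toy_model`, its probabilities, and the `utility` function passed to `rerank` are hypothetical placeholders, not Confirm-it's scoring.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
# Hypothetical model interface: prefix -> next-token distribution.
Model = Callable[[Seq], Dict[str, float]]
Beam = List[Tuple[Seq, float]]  # (token sequence, total log-probability)

EOS = "<eos>"


def beam_search(model: Model, beam_size: int, max_len: int) -> Beam:
    """Keep the beam_size partial sequences with the highest total
    log-probability at every step; return the final beam, best first."""
    beam: Beam = [((), 0.0)]
    for _ in range(max_len):
        candidates: Beam = []
        for seq, score in beam:
            if seq and seq[-1] == EOS:
                candidates.append((seq, score))  # finished hypotheses carry over
                continue
            for tok, p in model(seq).items():
                if p > 0.0:
                    candidates.append((seq + (tok,), score + math.log(p)))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == EOS for seq, _ in beam):
            break
    return beam


def rerank(beam: Beam, utility: Callable[[Seq], float], weight: float) -> Beam:
    """Re-order the final beam by log-probability plus weight * utility."""
    return sorted(beam, key=lambda c: c[1] + weight * utility(c[0]), reverse=True)


# Toy model with made-up probabilities: unknown prefixes always end the question.
TABLE: Dict[Seq, Dict[str, float]] = {
    (): {"is": 0.6, "does": 0.4},
    ("is",): {"it": 0.5, "the": 0.5},
    ("does",): {"it": 1.0},
}


def toy_model(seq: Seq) -> Dict[str, float]:
    return TABLE.get(seq, {EOS: 1.0})
```

With `beam_size=2` and `max_len=3`, beam search ranks `does it` first (total probability 0.4) even though a width-1 search commits to `is` (0.6) at the first step. Re-ranking with the hypothetical utility `lambda seq: 1.0 if seq[0] == "is" else 0.0` and `weight=1.0` moves `is it` back to the top, showing how an external task signal can override pure likelihood.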