Enabling Harmonious Human-Machine Interaction with Visual-Context
Augmented Dialogue System: A Review
- URL: http://arxiv.org/abs/2207.00782v1
- Date: Sat, 2 Jul 2022 09:31:37 GMT
- Title: Enabling Harmonious Human-Machine Interaction with Visual-Context
Augmented Dialogue System: A Review
- Authors: Hao Wang, Bin Guo, Yating Zeng, Yasan Ding, Chen Qiu, Ying Zhang, Lina
Yao, Zhiwen Yu
- Abstract summary: Visual Context Augmented Dialogue System (VAD) has the potential to communicate with humans by perceiving and understanding multimodal information, and to generate engaging and context-aware responses.
- Score: 40.49926141538684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The intelligent dialogue system, which aims to communicate with humans
harmoniously in natural language, is a key driver of progress in human-machine
interaction in the era of artificial intelligence. As human-computer interaction
requirements grow increasingly complex (e.g., multimodal inputs, time
sensitivity), it is difficult for traditional text-based dialogue systems to
meet the demand for more vivid and convenient interaction.
Consequently, Visual Context Augmented Dialogue System (VAD), which has the
potential to communicate with humans by perceiving and understanding multimodal
information (i.e., visual context in images or videos, textual dialogue
history), has become a predominant research paradigm. Benefiting from the
consistency and complementarity between visual and textual context, VAD
possesses the potential to generate engaging and context-aware responses. For
depicting the development of VAD, we first characterize the concepts and unique
features of VAD, and then present its generic system architecture to illustrate
the system workflow. Subsequently, several research challenges and
representative works are investigated in detail, followed by a summary of
authoritative benchmarks. We conclude this paper by putting forward some open
issues and promising research trends for VAD, e.g., the cognitive mechanisms of
human-machine dialogue under cross-modal dialogue context, and
knowledge-enhanced cross-modal semantic interaction.
Related papers
- Human-Robot Dialogue Annotation for Multi-Modal Common Ground [4.665414514091581]
We describe the development of symbolic representations annotated on human-robot dialogue data to make dimensions of meaning accessible to autonomous systems participating in collaborative, natural language dialogue, and to enable common ground with human partners.
A particular challenge for establishing common ground arises in remote dialogue, where a human and robot are engaged in a joint navigation and exploration task of an unfamiliar environment, but where the robot cannot immediately share high quality visual information due to limited communication constraints.
Within this paradigm, we capture propositional semantics and the illocutionary force of a single utterance within the dialogue through our Dialogue-AMR annotation, an augmentation of Abstract Meaning Representation (AMR).
arXiv Detail & Related papers (2024-11-19T19:33:54Z) - WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z) - I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in
Social Robots [0.040792653193642496]
This paper presents an initial implementation of a dialogue manager that enhances the traditional text-based prompts with real-time visual input.
The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency.
arXiv Detail & Related papers (2023-11-15T13:47:00Z) - Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z) - A Review of Dialogue Systems: From Trained Monkeys to Stochastic Parrots [0.0]
We aim to deploy artificial intelligence to build automated dialogue agents that can converse with humans.
We present a broad overview of methods developed to build dialogue systems over the years.
arXiv Detail & Related papers (2021-11-02T08:07:55Z) - Advances in Multi-turn Dialogue Comprehension: A Survey [51.215629336320305]
Training machines to understand natural language and interact with humans is an elusive and essential task of artificial intelligence.
This paper reviews the previous methods from the technical perspective of dialogue modeling for the dialogue comprehension task.
In addition, we categorize dialogue-related pre-training techniques which are employed to enhance PrLMs in dialogue scenarios.
arXiv Detail & Related papers (2021-10-11T03:52:37Z) - Exploring Recurrent, Memory and Attention Based Architectures for
Scoring Interactional Aspects of Human-Machine Text Dialog [9.209192502526285]
This paper builds on previous work in this direction to investigate multiple neural architectures.
We conduct experiments on a conversational database of text dialogs from human learners interacting with a cloud-based dialog system.
We find that fusion of multiple architectures performs competently on our automated scoring task relative to expert inter-rater agreements.
arXiv Detail & Related papers (2020-05-20T03:23:00Z) - You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.