Emergent Communication in Interactive Sketch Question Answering
- URL: http://arxiv.org/abs/2310.15597v1
- Date: Tue, 24 Oct 2023 08:00:20 GMT
- Title: Emergent Communication in Interactive Sketch Question Answering
- Authors: Zixing Lei, Yiming Zhang, Yuxin Xiong and Siheng Chen
- Abstract summary: Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication.
We first introduce a novel Interactive Sketch Question Answering (ISQA) task, in which two collaborative players interact through sketches over multiple rounds to answer a question about an image.
Our experimental results, including human evaluation, demonstrate that the multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability.
- Score: 38.38087954142305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-based emergent communication (EC) aims to learn to communicate through
sketches and demystify the evolution of human communication. Ironically,
previous works neglect multi-round interaction, which is indispensable in human
communication. To fill this gap, we first introduce a novel Interactive Sketch
Question Answering (ISQA) task, where two collaborative players interact
through sketches to answer a question about an image in a multi-round manner.
To accomplish this task, we design a new and efficient interactive EC system,
which can achieve an effective balance among three evaluation factors,
including the question answering accuracy, drawing complexity and human
interpretability. Our experimental results, including human evaluation,
demonstrate that the multi-round interactive mechanism facilitates targeted and
efficient communication between intelligent agents with decent human
interpretability.
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models on objective metrics and in visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
- Automatic Context-Driven Inference of Engagement in HMI: A Survey [6.479224589451863]
This paper presents a survey on engagement inference for human-machine interaction.
It entails interdisciplinary definition, engagement components and factors, publicly available datasets, ground truth assessment, and most commonly used features and methods.
It serves as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability.
arXiv Detail & Related papers (2022-09-30T10:46:13Z)
- Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review [40.49926141538684]
Visual Context Augmented Dialogue System (VAD) has the potential to communicate with humans by perceiving and understanding multimodal information.
VAD possesses the potential to generate engaging and context-aware responses.
arXiv Detail & Related papers (2022-07-02T09:31:37Z)
- Transferable Interactiveness Knowledge for Human-Object Interaction Detection [46.89715038756862]
We explore interactiveness knowledge which indicates whether a human and an object interact with each other or not.
We found that interactiveness knowledge can be learned across HOI datasets and bridge the gap between diverse HOI category settings.
Our core idea is to exploit an interactiveness network to learn the general interactiveness knowledge from multiple HOI datasets.
arXiv Detail & Related papers (2021-01-25T18:21:07Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.