VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
- URL: http://arxiv.org/abs/2205.14693v1
- Date: Sun, 29 May 2022 15:29:50 GMT
- Authors: Xintong Yu, Hongming Zhang, Ruixin Hong, Yangqiu Song, Changshui Zhang
- Abstract summary: The visual dialog task requires an AI agent to interact with humans in multi-round dialogs based on a visual environment.
We propose VD-PCR, a novel framework to improve Visual Dialog understanding with Pronoun Coreference Resolution.
With the proposed implicit and explicit methods, VD-PCR achieves state-of-the-art experimental results on the VisDial dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The visual dialog task requires an AI agent to interact with humans in
multi-round dialogs based on a visual environment. As a common linguistic
phenomenon, pronouns are often used in dialogs to improve communication
efficiency. As a result, resolving pronouns (i.e., grounding pronouns to the
noun phrases they refer to) is an essential step towards understanding dialogs.
In this paper, we propose VD-PCR, a novel framework to improve Visual Dialog
understanding with Pronoun Coreference Resolution in both implicit and explicit
ways. First, to implicitly help models understand pronouns, we design novel
methods to perform the joint training of the pronoun coreference resolution and
visual dialog tasks. Second, after observing that the coreference relationship
of pronouns and their referents indicates the relevance between dialog rounds,
we propose to explicitly prune the irrelevant history rounds in visual dialog
models' input. With pruned input, the models can focus on relevant dialog
history and ignore distractions from the irrelevant rounds. With the proposed
implicit and explicit methods, VD-PCR achieves state-of-the-art experimental
results on the VisDial dataset.
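The explicit pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `prune_history` function, the coreference-link input format, and the example dialog are all hypothetical, and real coreference links would come from a trained PCR model rather than being given directly.

```python
# Hypothetical sketch of explicit history pruning: keep only the earlier
# dialog rounds that pronouns in the current question corefer into, and
# drop the rest so the model is not distracted by irrelevant history.

def prune_history(history, current_round, coref_links):
    """history: list of (question, answer) rounds, indexed from 0.
    coref_links: indices of earlier rounds that pronouns in the current
    round's question refer back to (assumed given by a PCR model)."""
    kept = set(coref_links)
    kept.add(current_round)  # always keep the round being answered
    return [r for i, r in enumerate(history) if i in kept]

rounds = [("Is there a dog?", "Yes"),
          ("What color is it?", "Brown"),
          ("Any people nearby?", "No"),
          ("Is it on a leash?", "")]
# "it" in round 3 corefers with "a dog" introduced in round 0,
# so only round 0 is relevant history for answering round 3.
pruned = prune_history(rounds, current_round=3, coref_links={0})
```

Here round 2 ("Any people nearby?") is dropped because nothing in the current question refers back to it, which is exactly the relevance signal the paper derives from pronoun coreference.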
Related papers
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
- Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from PrLMs.
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines in four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z)
- Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning [24.673262969986993]
We analyze the cross-modal understanding in visual dialog based on the vision-language pre-training model VD-BERT.
We propose a novel approach to improve the cross-modal understanding for visual dialog, named ICMU.
arXiv Detail & Related papers (2022-04-15T02:36:52Z)
- Modeling Coreference Relations in Visual Dialog [18.926582410644375]
The occurrence of coreference relations in the dialog makes it a more challenging task than visual question answering.
We propose two soft constraints that can improve the model's ability of resolving coreferences in dialog in an unsupervised way.
arXiv Detail & Related papers (2022-03-06T15:22:24Z)
- Exophoric Pronoun Resolution in Dialogues with Topic Regularization [84.23706744602217]
Resolving pronouns to their referents has long been studied as a fundamental natural language understanding problem.
Previous works on pronoun coreference resolution (PCR) mostly focus on resolving pronouns to mentions in text while ignoring the exophoric scenario.
We propose to jointly leverage the local context and global topics of dialogues to solve the out-of-text PCR problem.
arXiv Detail & Related papers (2021-09-10T11:08:31Z)
- Graph Based Network with Contextualized Representations of Turns in Dialogue [0.0]
Dialogue-based relation extraction (RE) aims to extract relation(s) between two arguments that appear in a dialogue.
We propose the TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN) modeled by paying attention to the way people understand dialogues.
arXiv Detail & Related papers (2021-09-09T03:09:08Z)
- Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues [73.04906599884868]
We propose a novel framework of Reasoning Paths in Dialogue Context (PDC).
The PDC model discovers information flows among dialogue turns through a semantic graph constructed from lexical components in each question and answer.
Our model sequentially processes both visual and textual information through this reasoning path and the propagated features are used to generate the answer.
arXiv Detail & Related papers (2021-03-01T07:39:26Z)
- VD-BERT: A Unified Vision and Dialog Transformer with BERT [161.0016161052714]
We propose VD-BERT, a simple yet effective framework of unified vision-dialog Transformer.
We adapt BERT for the effective fusion of vision and dialog contents via visually grounded training.
Our model yields new state of the art, achieving the top position in both single-model and ensemble settings.
arXiv Detail & Related papers (2020-04-28T04:08:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.