Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
- URL: http://arxiv.org/abs/2409.05721v1
- Date: Mon, 9 Sep 2024 15:33:07 GMT
- Title: Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
- Authors: Bram Willemsen, Gabriel Skantze
- Abstract summary: We propose an approach to referring expression generation (REG) that is meant to produce referring expressions (REs) that are both discriminative and discourse-appropriate.
Results from our human evaluation indicate that our proposed two-stage approach is effective in producing discriminative REs.
- Score: 3.8673630752805446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an approach to referring expression generation (REG) in visually grounded dialogue that is meant to produce referring expressions (REs) that are both discriminative and discourse-appropriate. Our method constitutes a two-stage process. First, we model REG as a text- and image-conditioned next-token prediction task. REs are autoregressively generated based on their preceding linguistic context and a visual representation of the referent. Second, we propose the use of discourse-aware comprehension guiding as part of a generate-and-rerank strategy through which candidate REs generated with our REG model are reranked based on their discourse-dependent discriminatory power. Results from our human evaluation indicate that our proposed two-stage approach is effective in producing discriminative REs, with higher performance in terms of text-image retrieval accuracy for reranked REs compared to those generated using greedy decoding.
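The generate-and-rerank step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration. The candidate REs and their log-probabilities stand in for samples from the REG model. The hypothetical `comprehension_probs` function replaces the paper's neural text-image retrieval model with a toy word-overlap scorer over hypothetical image attribute sets. Only the control flow mirrors the described strategy: score each candidate by how well a comprehension model that conditions on the dialogue context retrieves the target image, then keep the best candidate.

```python
import math
from typing import Dict, List, Set, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def comprehension_probs(context: List[str], re_text: str,
                        images: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical discourse-aware comprehension model (toy stand-in):
    scores each candidate image by word overlap with the RE plus the
    preceding dialogue turns, then normalizes with a softmax."""
    words = set(re_text.lower().split())
    for turn in context:
        words |= set(turn.lower().split())
    ids = list(images)
    scores = [float(len(words & images[i])) for i in ids]
    return dict(zip(ids, softmax(scores)))


def rerank(candidates: List[Tuple[str, float]], context: List[str],
           images: Dict[str, Set[str]], target: str) -> str:
    """Pick the candidate RE under which the target image gets the highest
    retrieval probability; generator log-probability breaks ties."""
    def key(cand: Tuple[str, float]) -> Tuple[float, float]:
        text, logp = cand
        return (comprehension_probs(context, text, images)[target], logp)
    return max(candidates, key=key)[0]


# Hypothetical scene, dialogue context, and sampled candidate REs.
images = {
    "img1": {"red", "cup", "left"},
    "img2": {"red", "cup", "right"},
    "img3": {"blue", "plate"},
}
context = ["we already talked about the red cup"]
candidates = [("the red cup", -1.0), ("the cup on the left", -2.5)]

# Rough stand-in for the greedy baseline: the most probable candidate.
greedy_like = max(candidates, key=lambda c: c[1])[0]
best = rerank(candidates, context, images, target="img1")
print(greedy_like)
print(best)
```

In this toy scene the most probable candidate, "the red cup", leaves the two red cups tied for the comprehension model. Reranking instead selects "the cup on the left", which the model resolves to the target once the earlier mention of "red" in the dialogue is taken into account.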
Related papers
- Intrinsic Task-based Evaluation for Referring Expression Generation [9.322715583523928]
In a prior human evaluation study, Referring Expressions (REs) generated by state-of-the-art neural models were indistinguishable not only from the REs in WebNLG but also from the REs generated by a simple rule-based system.
Here, we argue that this limitation could stem from the use of a purely ratings-based human evaluation.
We propose an intrinsic task-based evaluation for REG models, in which, in addition to rating the quality of REs, participants were asked to accomplish two meta-level tasks.
Date: 2024-02-12T06:21:35Z
- Whether you can locate or not? Interactive Referring Expression Generation [12.148963878497243]
We propose an Interactive REG (IREG) model that can interact with a real REC model.
IREG outperforms previous state-of-the-art methods on popular evaluation metrics.
Date: 2023-08-19T10:53:32Z
- Towards Improved Room Impulse Response Estimation for Speech Recognition [53.04440557465013]
We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of far-field automatic speech recognition (ASR).
We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators.
We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features.
Date: 2022-11-08T00:40:27Z
- Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations [59.10748929158525]
Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation.
Our proposed model, AMR-enhanced Paraphrase Generator (AMRPG), encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings.
Experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, compared to the existing unsupervised approaches.
Date: 2022-11-02T04:58:38Z
- Dialogue Meaning Representation for Task-Oriented Dialogue Systems [51.91615150842267]
We propose Dialogue Meaning Representation (DMR), a flexible and easily extendable representation for task-oriented dialogue.
Our representation contains a set of nodes and edges with an inheritance hierarchy to represent rich semantics, covering both compositional semantics and task-specific concepts.
We propose two evaluation tasks for assessing different machine-learning-based dialogue models, and further propose a novel coreference resolution model, GNNCoref, for the graph-based coreference resolution task.
Date: 2022-04-23T04:17:55Z
- Local Explanation of Dialogue Response Generation [77.68077106724522]
Local explanation of response generation (LERG) is proposed to gain insights into the reasoning process of a generation model.
LERG views the sequence prediction as uncertainty estimation of a human response and then creates explanations by perturbing the input and calculating the certainty change over the human response.
Our results show that our method consistently improves over other widely used methods by 4.4-12.8% on the proposed automatic and human evaluation metrics for this new task.
Date: 2021-06-11T17:58:36Z
- Semantic Representation for Dialogue Modeling [22.80679759491184]
We exploit Abstract Meaning Representation (AMR) to help dialogue modeling.
Compared with the textual input, AMR explicitly provides core semantic knowledge.
We are the first to incorporate a formal semantic representation into neural dialogue modeling.
Date: 2021-05-21T07:55:07Z
- PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension [31.39505099600821]
We propose a novel phrase-guided proposal generation network (PPGN).
PPGN refines visual features using the text and generates proposals through regression.
Experiments show that the method is effective and achieves state-of-the-art performance on benchmark datasets.
Date: 2020-12-20T11:21:06Z
- Controlling Dialogue Generation with Semantic Exemplars [55.460082747572734]
We present an Exemplar-based Dialogue Generation model, EDGE, that uses the semantic frames present in exemplar responses to guide generation.
We show that controlling dialogue generation based on the semantic frames of exemplars, rather than words in the exemplar itself, improves the coherence of generated responses.
Date: 2020-08-20T17:02:37Z
- Dialogue-Based Relation Extraction [53.2896545819799]
We present DialogRE, the first human-annotated dialogue-based relation extraction (RE) dataset.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
Date: 2020-04-17T03:51:57Z
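The LERG entry in this list describes explanations obtained by perturbing the input and measuring how the model's certainty about the human response changes. The following is a minimal sketch of that idea. It uses simple leave-one-out perturbation and does not reproduce LERG's own estimators. The hypothetical `toy_log_prob` is an assumed stand-in for a dialogue model: it scores each response token with a smoothed unigram estimate taken from the input context.

```python
import math
from typing import Callable, List

# (response tokens, context tokens) -> log P(response | context)
LogProbFn = Callable[[List[str], List[str]], float]


def toy_log_prob(response: List[str], context: List[str],
                 vocab_size: int = 1000, alpha: float = 1.0) -> float:
    """Hypothetical stand-in for log P(response | context): each response
    token is scored by an add-alpha unigram estimate from the context."""
    denom = len(context) + alpha * vocab_size
    return sum(math.log((context.count(tok) + alpha) / denom) for tok in response)


def leave_one_out(context: List[str], response: List[str],
                  log_prob: LogProbFn = toy_log_prob) -> List[float]:
    """Attribution for each context token: the drop in log-probability of
    the fixed human response when that token is removed from the input."""
    base = log_prob(response, context)
    return [base - log_prob(response, context[:i] + context[i + 1:])
            for i in range(len(context))]


context = ["do", "you", "like", "pizza"]
response = ["i", "love", "pizza"]
scores = leave_one_out(context, response)
for tok, score in zip(context, scores):
    print(f"{tok:>6}: {score:+.3f}")
```

Here only "pizza" receives a positive score, because removing it lowers the probability of the human response. Removing any other token slightly raises that probability, since the toy estimate renormalizes over a shorter context.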