A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial
Expressions
- URL: http://arxiv.org/abs/2010.03127v1
- Date: Wed, 7 Oct 2020 02:50:38 GMT
- Title: A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial
Expressions
- Authors: Takuma Udagawa, Takato Yamazaki, Akiko Aizawa
- Abstract summary: We propose a framework for investigating fine-grained language understanding in visually grounded dialogues.
We focus on OneCommon Corpus (Udagawa and Aizawa, 2019, 2020), a simple yet challenging common grounding dataset.
We analyze its linguistic structures based on spatial expressions and provide comprehensive and reliable annotation for 600 dialogues.
- Score: 35.24301299033675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent models achieve promising results in visually grounded dialogues.
However, existing datasets often contain undesirable biases and lack
sophisticated linguistic analyses, which make it difficult to understand how
well current models recognize their precise linguistic structures. To address
this problem, we make two design choices: first, we focus on OneCommon Corpus
\citep{udagawa2019natural,udagawa2020annotated}, a simple yet challenging
common grounding dataset which contains minimal bias by design. Second, we
analyze their linguistic structures based on \textit{spatial expressions} and
provide comprehensive and reliable annotation for 600 dialogues. We show that
our annotation captures important linguistic structures including
predicate-argument structure, modification and ellipsis. In our experiments, we
assess the model's understanding of these structures through reference
resolution. We demonstrate that our annotation can reveal both the strengths
and weaknesses of baseline models in essential levels of detail. Overall, we
propose a novel framework and resource for investigating fine-grained language
understanding in visually grounded dialogues.
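As a rough illustration (not the authors' actual model), the reference resolution task used in the experiments can be framed as scoring candidate entities (dots) against a referring expression and picking the best match. The feature vectors and the mention encoding below are toy stand-ins:

```python
# Hypothetical sketch: reference resolution as candidate scoring.
# Entity features and the mention vector are illustrative stand-ins,
# not the paper's model or data.

def score(mention_vec, entity_vec):
    """Dot-product similarity between a mention and a candidate entity."""
    return sum(m * e for m, e in zip(mention_vec, entity_vec))

def resolve(mention_vec, entities):
    """Return the id of the highest-scoring candidate entity."""
    return max(entities, key=lambda eid: score(mention_vec, entities[eid]))

# Toy example: three dots described by (size, darkness) features.
dots = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5]}
mention = [1.0, 0.0]           # e.g. a mention like "the large light dot"
print(resolve(mention, dots))  # -> "a"
```

Annotation of predicate-argument structure, modification, and ellipsis then makes it possible to break resolution accuracy down by linguistic structure rather than reporting a single aggregate score.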
Related papers
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z)
- Is Argument Structure of Learner Chinese Understandable: A Corpus-Based Analysis [8.883799596036484]
This paper presents a corpus-based analysis of argument structure errors in learner Chinese.
The data for analysis includes sentences produced by language learners as well as their corrections by native speakers.
We couple the data with semantic role labeling annotations that are manually created by two senior students.
arXiv Detail & Related papers (2023-08-17T21:10:04Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- A Knowledge-Enhanced Adversarial Model for Cross-lingual Structured Sentiment Analysis [31.05169054736711]
The cross-lingual structured sentiment analysis task aims to transfer knowledge from a source language to a target one.
We propose a Knowledge-Enhanced Adversarial Model (KEAM) with both implicit distributed and explicit structural knowledge.
We conduct experiments on five datasets and compare KEAM with both supervised and unsupervised methods.
arXiv Detail & Related papers (2022-05-31T03:07:51Z)
- Robustness Testing of Language Understanding in Dialog Systems [33.30143655553583]
We conduct comprehensive evaluation and analysis with respect to the robustness of natural language understanding models.
We introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation.
We propose a model-agnostic toolkit LAUG to approximate natural perturbation for testing the robustness issues in dialog systems.
arXiv Detail & Related papers (2020-12-30T18:18:47Z)
- Structured Attention for Unsupervised Dialogue Structure Induction [110.12561786644122]
We propose to incorporate structured attention layers into a Variational Recurrent Neural Network (VRNN) model with discrete latent states to learn dialogue structure in an unsupervised fashion.
Compared to a vanilla VRNN, structured attention enables a model to focus on different parts of the source sentence embeddings while enforcing a structural inductive bias.
arXiv Detail & Related papers (2020-09-17T23:07:03Z)
- Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are at distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly under-perform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z)
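The retrieval setup in the probing paper above can be caricatured as nearest-neighbor matching between a text embedding and candidate image-patch embeddings. The vectors and labels below are toy stand-ins, not the paper's representations:

```python
import math

# Hypothetical sketch of a matching probe: given a text representation,
# retrieve the image patch with the highest cosine similarity.
# All embeddings here are illustrative toy values.

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(text_vec, patches):
    """Return the id of the patch embedding closest to the text embedding."""
    return max(patches, key=lambda pid: cosine(text_vec, patches[pid]))

patches = {"cat": [0.9, 0.1, 0.0], "car": [0.0, 0.2, 0.9]}
text = [1.0, 0.0, 0.1]          # toy embedding of the word "cat"
print(retrieve(text, patches))  # -> "cat"
```

A probe like this measures only how much visual category information the text representations already carry; it does not train the underlying language model.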
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.