A Formal Analysis of Multimodal Referring Strategies Under Common Ground
- URL: http://arxiv.org/abs/2003.07385v1
- Date: Mon, 16 Mar 2020 18:08:52 GMT
- Title: A Formal Analysis of Multimodal Referring Strategies Under Common Ground
- Authors: Nikhil Krishnaswamy and James Pustejovsky
- Abstract summary: We expose some striking formal semantic properties of the interactions between gesture and language.
We show how these formal features can contribute to training better models to predict viewer judgment of referring expressions.
- Score: 11.495268947367979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present an analysis of computationally generated
mixed-modality definite referring expressions using combinations of gesture and
linguistic descriptions. In doing so, we expose some striking formal semantic
properties of the interactions between gesture and language, conditioned on the
introduction of content into the common ground between the (computational)
speaker and (human) viewer, and demonstrate how these formal features can
contribute to training better models to predict viewer judgment of referring
expressions, and potentially to the generation of more natural and informative
referring expressions.
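The abstract's claim that formal features of mixed-modality referring expressions can help predict viewer judgments can be made concrete with a small sketch. The classifier, the three features (gesture use, language use, common-ground status), and the toy data below are illustrative assumptions, not the paper's actual model, feature set, or data.
```python
# Illustrative sketch only: a toy classifier over hypothetical formal features
# of mixed-modality referring expressions. The features and data are invented
# for illustration and are not the paper's actual features or data.
from dataclasses import dataclass
from sklearn.linear_model import LogisticRegression

@dataclass
class ReferringExpression:
    uses_gesture: bool      # e.g., deictic pointing toward the referent
    uses_language: bool     # e.g., a definite description ("the red block")
    in_common_ground: bool  # has the referent already been introduced?

    def features(self) -> list[float]:
        return [float(self.uses_gesture),
                float(self.uses_language),
                float(self.in_common_ground)]

# Toy training pairs: (expression, did the viewer judge it acceptable?)
train = [
    (ReferringExpression(True,  True,  True),  1),
    (ReferringExpression(True,  False, False), 1),
    (ReferringExpression(False, True,  False), 0),
    (ReferringExpression(False, True,  True),  1),
    (ReferringExpression(True,  False, True),  0),
    (ReferringExpression(False, False, False), 0),
]
X = [ex.features() for ex, _ in train]
y = [label for _, label in train]

clf = LogisticRegression().fit(X, y)

# Estimate the probability that a new mixed-modality expression is judged acceptable.
probe = ReferringExpression(uses_gesture=True, uses_language=True, in_common_ground=False)
print(clf.predict_proba([probe.features()])[0, 1])
```
The paper's contribution lies in identifying which formal properties of the gesture-language interaction, conditioned on common ground, are informative for this kind of prediction; any real feature set would come from that analysis rather than the stand-ins above.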
Related papers
- Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation [4.216085185442862]
In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors.
How can we learn meaningful gesture representations given gestures' variability and their relationship with speech?
This paper employs self-supervised contrastive learning techniques to learn gesture representations from skeletal and speech information (a minimal sketch of one such contrastive objective appears after this list).
arXiv Detail & Related papers (2024-08-31T08:53:18Z)
- A Grammatical Compositional Model for Video Action Detection [24.546886938243393]
We present a novel Grammatical Compositional Model (GCM) for action detection based on typical And-Or graphs.
Our model exploits the intrinsic structures and latent relationships of actions in a hierarchical manner to harness both the compositionality of grammar models and the capacity of DNNs to express rich features.
arXiv Detail & Related papers (2023-10-04T15:24:00Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- Learnable Visual Words for Interpretable Image Recognition [70.85686267987744]
We propose the Learnable Visual Words (LVW) to interpret the model prediction behaviors with two novel modules.
The semantic visual words learning module relaxes the category-specific constraint, enabling general visual words to be shared across different categories.
Our experiments on six visual benchmarks demonstrate the superior effectiveness of our proposed LVW in both accuracy and model interpretation.
arXiv Detail & Related papers (2022-05-22T03:24:45Z)
- Improve Discourse Dependency Parsing with Contextualized Representations [28.916249926065273]
We propose to take advantage of transformers to encode contextualized representations of discourse units at different levels.
Motivated by the observation of writing patterns commonly shared across articles, we propose a novel method that treats discourse relation identification as a sequence labelling task.
arXiv Detail & Related papers (2022-05-04T14:35:38Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances comparable to those achieved by a structured distributional model (SDM).
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- Analysing Lexical Semantic Change with Contextualised Word Representations [7.071298726856781]
We propose a novel method that exploits the BERT neural language model to obtain representations of word usages.
We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements.
arXiv Detail & Related papers (2020-04-29T12:18:14Z)
- Structural Inductive Biases in Emergent Communication [36.26083882473554]
We investigate the impact of representation learning in artificial agents by developing graph referential games.
We show that agents parametrized by graph neural networks develop a more compositional language compared to bag-of-words and sequence models.
arXiv Detail & Related papers (2020-02-04T14:59:08Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
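As flagged in the co-speech gesture entry above, a contrastive objective over paired skeletal and speech features can be sketched as follows. This is a generic InfoNCE-style example under assumed encoder shapes and feature dimensions, not the cited paper's actual architecture, loss, or data.
```python
# Minimal InfoNCE-style sketch for paired skeletal/speech gesture embeddings.
# Encoders, dimensions, and data are placeholders, not the cited paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairedGestureEncoder(nn.Module):
    def __init__(self, skel_dim=75, speech_dim=40, embed_dim=128):
        super().__init__()
        self.skel_enc = nn.Sequential(nn.Linear(skel_dim, 256), nn.ReLU(),
                                      nn.Linear(256, embed_dim))
        self.speech_enc = nn.Sequential(nn.Linear(speech_dim, 256), nn.ReLU(),
                                        nn.Linear(256, embed_dim))

    def forward(self, skel, speech):
        # L2-normalise so the dot product below is a cosine similarity.
        return (F.normalize(self.skel_enc(skel), dim=-1),
                F.normalize(self.speech_enc(speech), dim=-1))

def info_nce(z_skel, z_speech, temperature=0.07):
    # Similarity matrix: entry (i, j) compares gesture i's skeleton with
    # utterance j's speech; the diagonal holds the true (positive) pairs.
    logits = z_skel @ z_speech.t() / temperature
    targets = torch.arange(z_skel.size(0))
    # Symmetric cross-entropy over both retrieval directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 8 gestures, each with a pooled skeletal and speech feature vector.
model = PairedGestureEncoder()
skel = torch.randn(8, 75)      # e.g., flattened joint coordinates
speech = torch.randn(8, 40)    # e.g., pooled acoustic features
loss = info_nce(*model(skel, speech))
loss.backward()
```
Positive pairs are the two views (skeleton and speech) of the same gesture, while the other items in the batch act as negatives, which is the standard contrastive setup that entry alludes to.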
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.