Grounded and Well-rounded: A Methodological Approach to the Study of
Cross-modal and Cross-lingual Grounding
- URL: http://arxiv.org/abs/2310.11938v1
- Date: Wed, 18 Oct 2023 13:05:50 GMT
- Title: Grounded and Well-rounded: A Methodological Approach to the Study of
Cross-modal and Cross-lingual Grounding
- Authors: Timothee Mickus and Elaine Zosa and Denis Paperno
- Abstract summary: Grounding has been argued to be a crucial component towards the development of more complete and truly semantically competent artificial intelligence systems.
We study what the effects are - if any - of providing models with richer input sources than text-only.
Experiments using this framework reveal qualitative differences in model behavior between cross-modally grounded, cross-lingually grounded, and ungrounded models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Grounding has been argued to be a crucial component towards the development
of more complete and truly semantically competent artificial intelligence
systems. The literature is divided into two camps: while some argue that grounding
allows for qualitatively different generalizations, others believe its effects can be
compensated for by mono-modal data quantity. Limited empirical evidence has emerged
for or against either position, which we argue is due to the methodological
challenges that come with studying grounding and its effects on NLP systems.
In this paper, we establish a methodological framework for studying what the
effects are - if any - of providing models with richer input sources than
text-only. The crux of it lies in the construction of comparable samples of
populations of models trained on different input modalities, so that we can
tease apart the qualitative effects of different input sources from
quantifiable model performances. Experiments using this framework reveal
qualitative differences in model behavior between cross-modally grounded,
cross-lingually grounded, and ungrounded models, which we measure both at a
global dataset level as well as for specific word representations, depending on
how concrete their semantics is.
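The crux of the framework, constructing comparable samples of model populations so that qualitative effects can be teased apart from quantitative performance, can be illustrated with a toy sketch. Everything below is hypothetical: the "models" are stand-in task scores and the condition names are illustrative, not the authors' actual models or code. The sketch only shows the matching idea of holding measured performance fixed across input conditions before comparing behavior.

```python
import random

random.seed(0)

# Toy stand-in for trained models: each "model" is reduced to its task score.
# In the paper's setup these would be real models trained on different input
# sources (text-only, cross-lingual, cross-modal); the names are illustrative.
populations = {
    "text_only":     [random.gauss(0.80, 0.03) for _ in range(20)],
    "cross_lingual": [random.gauss(0.81, 0.03) for _ in range(20)],
    "cross_modal":   [random.gauss(0.82, 0.03) for _ in range(20)],
}

def matched_sample(populations, tolerance=0.01):
    """Greedily group one model per condition whose task scores differ from
    the anchor's by at most `tolerance`, so that later behavioral comparisons
    are not confounded by raw performance differences."""
    pools = {k: sorted(v) for k, v in populations.items()}
    anchor_key = next(iter(pools))
    matched = []
    for score in pools[anchor_key]:
        group = {anchor_key: score}
        for k in pools:
            if k == anchor_key:
                continue
            # closest remaining score in this condition
            best = min(pools[k], key=lambda s: abs(s - score))
            if abs(best - score) <= tolerance:
                group[k] = best
        if len(group) == len(pools):
            # consume the matched models so they are not reused
            for k in group:
                if k != anchor_key:
                    pools[k].remove(group[k])
            matched.append(group)
    return matched

groups = matched_sample(populations)
# Within each matched group, scores are near-identical by construction, so any
# remaining behavioral differences can be attributed to the input source.
```

Any downstream comparison (e.g., per-word representation analyses split by concreteness, as in the paper) would then be run only within such matched groups.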
Related papers
- A Study on Bias Detection and Classification in Natural Language Processing [2.908482270923597]
The aim of our work is to determine how to better combine publicly-available datasets to train models in the task of hate speech detection and classification.
We discuss these issues in tandem with the development of our experiments, in which we show that the combinations of different datasets greatly impact the models' performance.
arXiv Detail & Related papers (2024-08-14T11:49:24Z)
- Analyzing Persuasive Strategies in Meme Texts: A Fusion of Language Models with Paraphrase Enrichment [0.23020018305241333]
This paper describes our approach to hierarchical multi-label detection of persuasion techniques in meme texts.
The scope of the study encompasses enhancing model performance through innovative training techniques and data augmentation strategies.
arXiv Detail & Related papers (2024-07-01T20:25:20Z)
- Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts [8.298173603769063]
We examine the stability of models based on foundation models under distribution shift.
We focus on confounding by provenance, a form of distribution shift that emerges in the context of multi-institutional datasets.
Results indicate that while foundation models do show some out-of-the-box robustness to confounding-by-provenance related distribution shifts, this can be improved through adjustment.
arXiv Detail & Related papers (2023-12-09T02:02:45Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Testing Pre-trained Language Models' Understanding of Distributivity via Causal Mediation Analysis [13.07356367140208]
We introduce DistNLI, a new diagnostic dataset for natural language inference.
We find that the extent of models' understanding is associated with model size and vocabulary size.
arXiv Detail & Related papers (2022-09-11T00:33:28Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with translated source sentences, yet translates natural source sentences at inference time.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses pseudo-parallel data (natural source, translated target) to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- An Empirical Study on Measuring the Similarity of Sentential Arguments with Language Model Domain Adaptation [0.0]
Annotating such a dataset requires expertise in a variety of topics, making supervised learning with labeled data expensive.
We first adapted a pretrained language model to a domain of interest using self-supervised learning.
We fine-tuned the model to a task of measuring the similarity between sentences taken from different domains.
arXiv Detail & Related papers (2021-02-19T08:05:46Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding framework for semantic parsing and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.