What is Learned in Visually Grounded Neural Syntax Acquisition
- URL: http://arxiv.org/abs/2005.01678v2
- Date: Mon, 18 May 2020 19:19:57 GMT
- Title: What is Learned in Visually Grounded Neural Syntax Acquisition
- Authors: Noriyuki Kojima, Hadar Averbuch-Elor, Alexander M. Rush, Yoav Artzi
- Abstract summary: We consider the case study of the Visually Grounded Neural Syntax Learner.
By constructing simplified versions of the model, we isolate the core factors that yield the model's strong performance.
We find that a simple lexical signal of noun concreteness plays the main role in the model's predictions.
- Score: 118.6461386981381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual features are a promising signal for learning bootstrap textual models.
However, blackbox learning models make it difficult to isolate the specific
contribution of visual components. In this analysis, we consider the case study
of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent
approach for learning syntax from a visual training signal. By constructing
simplified versions of the model, we isolate the core factors that yield the
model's strong performance. Contrary to what the model might be capable of
learning, we find significantly less expressive versions produce similar
predictions and perform just as well, or even better. We also find that a
simple lexical signal of noun concreteness plays the main role in the model's
predictions as opposed to more complex syntactic reasoning.
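To make the finding concrete, the following is a minimal, illustrative Python sketch of a purely lexical baseline in the spirit of the paper's simplified models: it brackets a sentence bottom-up using only per-word concreteness scores. The greedy merge rule, the `CONCRETENESS` values, and the example sentence are assumptions made here for illustration, not the paper's exact procedure or data.
```python
# Illustrative sketch only: a toy parser that brackets a sentence using
# per-word "concreteness" scores, loosely mirroring the finding that a
# simple lexical concreteness signal can drive constituency predictions.
# The merge rule and the scores below are assumptions for illustration.

# Hypothetical concreteness scores (higher = more concrete).
CONCRETENESS = {"the": 0.1, "a": 0.1, "cat": 0.9, "mat": 0.9,
                "sat": 0.4, "on": 0.2, "small": 0.3}


def greedy_parse(tokens, scores=CONCRETENESS):
    """Greedily merge adjacent spans, always attaching to the span whose
    rightmost word is most concrete."""
    spans = [[t] for t in tokens]   # flat word lists, used for scoring
    trees = list(tokens)            # nested tuples, the bracketing built so far
    while len(trees) > 1:
        # Score each adjacent pair by the concreteness of the right span's last word.
        best = max(range(len(trees) - 1),
                   key=lambda i: scores.get(spans[i + 1][-1], 0.0))
        spans[best:best + 2] = [spans[best] + spans[best + 1]]
        trees[best:best + 2] = [(trees[best], trees[best + 1])]
    return trees[0]


if __name__ == "__main__":
    print(greedy_parse("the cat sat on the mat".split()))
```
On "the cat sat on the mat" this sketch produces ((the, cat), (sat, (on, (the, mat)))), the kind of noun-phrase-centered bracketing that the analysis attributes to the concreteness signal rather than to syntactic reasoning.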
Related papers
- On the Foundations of Shortcut Learning [20.53986437152018]
We study how predictivity and availability interact to shape models' feature use.
We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias.
arXiv Detail & Related papers (2023-10-24T22:54:05Z) - Specify Robust Causal Representation from Mixed Observations [35.387451486213344]
Learning representations purely from observations concerns the problem of learning a low-dimensional, compact representation which is beneficial to prediction models.
We develop a learning method to learn such representation from observational data by regularizing the learning procedure with mutual information measures.
We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts.
arXiv Detail & Related papers (2023-10-21T02:18:35Z) - Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z) - Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z) - Localization vs. Semantics: Visual Representations in Unimodal and
Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z) - Knowledge Transfer For On-Device Speech Emotion Recognition with Neural
Structured Learning [19.220263739291685]
Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI).
We propose a neural structured learning (NSL) framework through building synthesized graphs.
Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance.
arXiv Detail & Related papers (2022-10-26T18:38:42Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z) - Visually Grounded Compound PCFGs [65.04669567781634]
Exploiting visual groundings for language understanding has recently been drawing much attention.
We study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual captions.
arXiv Detail & Related papers (2020-09-25T19:07:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.