Presentation and Analysis of a Multimodal Dataset for Grounded Language
Learning
- URL: http://arxiv.org/abs/2007.14987v4
- Date: Mon, 28 Sep 2020 16:47:50 GMT
- Title: Presentation and Analysis of a Multimodal Dataset for Grounded Language
Learning
- Authors: Patrick Jenkins, Rishabh Sachdeva, Gaoussou Youssouf Kebe, Padraig
Higgins, Kasra Darvish, Edward Raff, Don Engel, John Winder, Francis Ferraro,
Cynthia Matuszek
- Abstract summary: Grounded language acquisition involves learning how language-based interactions refer to the world around them.
In practice the data used for learning tends to be cleaner, clearer, and more grammatical than actual human interactions.
We present a dataset of common household objects described by people using either spoken or written language.
- Score: 32.28310581819443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grounded language acquisition -- learning how language-based interactions
refer to the world around them -- is a major area of research in robotics, NLP,
and HCI. In practice the data used for learning consists almost entirely of
textual descriptions, which tend to be cleaner, clearer, and more grammatical
than actual human interactions. In this work, we present the Grounded Language
Dataset (GoLD), a multimodal dataset of common household objects described by
people using either spoken or written language. We analyze the differences and
present an experiment showing how the different modalities affect language
learning from human input. This will enable researchers studying the
intersection of robotics, NLP, and HCI to better investigate how the multiple
modalities of image, text, and speech interact, as well as how differences in
the vernacular of these modalities impact results.
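To make the dataset's structure concrete, the sketch below shows one way a GoLD-style record could be laid out in Python. The field names, file paths, and example values are hypothetical placeholders, not the dataset's actual schema.

# A minimal sketch of one multimodal record: an image of a household object
# paired with written and spoken descriptions. All names here are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class GroundedObjectRecord:
    object_class: str              # e.g. "mug", one of the household categories
    rgb_image_path: str            # photo of the object
    written_texts: List[str]       # typed descriptions from annotators
    speech_audio_paths: List[str]  # recorded spoken descriptions
    speech_transcripts: List[str]  # transcriptions of the audio

# Illustrative record; values are invented, not drawn from GoLD.
record = GroundedObjectRecord(
    object_class="mug",
    rgb_image_path="gold/images/mug_001.png",
    written_texts=["a white coffee mug with a handle"],
    speech_audio_paths=["gold/audio/mug_001_speaker3.wav"],
    speech_transcripts=["uh this is like a white cup you'd drink coffee out of"],
)
print(record.object_class, len(record.written_texts), len(record.speech_transcripts))

The contrast the abstract describes would then show up directly in the data: the written descriptions tend toward clean, grammatical phrases, while the speech transcripts keep disfluencies and looser phrasing.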
Related papers
- Divergences between Language Models and Human Brains [63.405788999891335]
Recent research has hinted that brain signals can be effectively predicted using internal representations of language models (LMs).
We show that there are clear differences in how LMs and humans represent and use language.
We identify two domains that are not captured well by LMs: social/emotional intelligence and physical commonsense.
arXiv Detail & Related papers (2023-11-15T19:02:40Z) - Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z) - Multimodal Modeling For Spoken Language Identification [57.94119986116947]
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
arXiv Detail & Related papers (2023-09-19T12:21:39Z) - Human Inspired Progressive Alignment and Comparative Learning for
Grounded Word Acquisition [6.47452771256903]
We take inspiration from how human babies acquire their first language and develop a computational process for word acquisition through comparative learning.
Motivated by cognitive findings, we generated a small dataset that enables computational models to compare the similarities and differences of various attributes.
We frame word acquisition not only as an information filtration process, but also as representation-symbol mapping.
arXiv Detail & Related papers (2023-07-05T19:38:04Z) - A Linguistic Investigation of Machine Learning based Contradiction
Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z) - Unravelling Interlanguage Facts via Explainable Machine Learning [10.71581852108984]
We focus on the internals of an NLI classifier trained by an explainable machine learning algorithm.
We use this perspective in order to tackle both NLI and a companion task, guessing whether a text has been written by a native or a non-native speaker.
We investigate which kinds of linguistic traits are most effective for solving our two tasks, namely, which are most indicative of a speaker's L1.
arXiv Detail & Related papers (2022-08-02T14:05:15Z) - Neural Variational Learning for Grounded Language Acquisition [14.567067583556714]
- Neural Variational Learning for Grounded Language Acquisition [14.567067583556714]
We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms.
We show that this generative approach exhibits promising results in language grounding without pre-specifying visual categories under low resource settings.
arXiv Detail & Related papers (2021-07-20T20:55:02Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z) - Experience Grounds Language [185.73483760454454]
- Experience Grounds Language [185.73483760454454]
Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.
Despite the incredible effectiveness of language processing models at tackling tasks after being trained on text alone, successful linguistic communication relies on a shared experience of the world.
arXiv Detail & Related papers (2020-04-21T16:56:27Z) - Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)