Neural Variational Learning for Grounded Language Acquisition
- URL: http://arxiv.org/abs/2107.14593v1
- Date: Tue, 20 Jul 2021 20:55:02 GMT
- Title: Neural Variational Learning for Grounded Language Acquisition
- Authors: Nisha Pillai, Cynthia Matuszek, Francis Ferraro
- Abstract summary: We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms.
We show that this generative approach exhibits promising results in language grounding without pre-specifying visual categories under low resource settings.
- Score: 14.567067583556714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms. We present a unified generative method to acquire a shared semantic/visual embedding that enables the learning of language about a wide range of real-world objects. We evaluate the efficacy of this learning by predicting the semantics of objects and comparing the performance with neural and non-neural inputs. We show that this generative approach exhibits promising results in language grounding without pre-specifying visual categories under low resource settings. Our experiments demonstrate that this approach is generalizable to multilingual, highly varied datasets.
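The core idea of a shared variational embedding can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder/decoder weights, dimensions, and names below are hypothetical stand-ins. It shows a Gaussian encoder mapping a visual percept into a shared latent space via the reparameterization trick, with a decoder producing a distribution over descriptive words; a real system would train these weights by maximizing the evidence lower bound (ELBO).

```python
import numpy as np

rng = np.random.default_rng(0)

VISUAL_DIM, LATENT_DIM, VOCAB = 64, 8, 100   # illustrative sizes

# Hypothetical encoder parameters: percept -> latent mean / log-variance.
W_mu = rng.normal(scale=0.1, size=(VISUAL_DIM, LATENT_DIM))
W_logvar = rng.normal(scale=0.1, size=(VISUAL_DIM, LATENT_DIM))
# Hypothetical decoder: shared latent -> distribution over descriptive words.
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM, VOCAB))

def encode(x):
    """Map a visual feature vector to a latent sample and its KL term."""
    mu, logvar = x @ W_mu, x @ W_logvar
    eps = rng.standard_normal(LATENT_DIM)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    # KL divergence from N(mu, sigma^2) to the standard-normal prior.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return z, kl

def word_probs(z):
    """Decode a point in the shared embedding into word probabilities."""
    logits = z @ W_dec
    e = np.exp(logits - logits.max())            # stable softmax
    return e / e.sum()

percept = rng.standard_normal(VISUAL_DIM)        # stand-in visual features
z, kl = encode(percept)
p = word_probs(z)
```

Because the latent space is generative rather than tied to pre-defined visual categories, the same `z` can in principle ground words from any vocabulary, including multilingual ones.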
Related papers
- Enhancing Context Through Contrast [0.4068270792140993]
We propose a novel Context Enhancement step to improve performance in neural machine translation.
Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations.
Our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings.
arXiv Detail & Related papers (2024-01-06T22:13:51Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition [6.47452771256903]
We take inspiration from how human babies acquire their first language and develop a computational process for word acquisition through comparative learning.
Motivated by cognitive findings, we generated a small dataset that enables computational models to compare the similarities and differences of various attributes.
We frame word acquisition not only as an information-filtration process, but also as representation-symbol mapping.
arXiv Detail & Related papers (2023-07-05T19:38:04Z) - Improving Policy Learning via Language Dynamics Distillation [87.27583619910338]
We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions.
We show that language descriptions in demonstrations improve sample-efficiency and generalization across environments.
arXiv Detail & Related papers (2022-09-30T19:56:04Z) - Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech [26.076534338576234]
Learning to understand grounded language, which connects natural language to percepts, is a critical research area.
In this work we demonstrate the feasibility of performing grounded language acquisition on paired visual percepts and raw speech inputs.
arXiv Detail & Related papers (2021-12-27T16:12:30Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer a prior distribution over languages from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z) - Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction [4.137464623395377]
We show how a grounding domain, a denotation function and a composition function are learned from user data only.
We benchmark our grounded semantics on compositionality and zero-shot inference tasks.
arXiv Detail & Related papers (2021-04-18T15:03:16Z) - A Visuospatial Dataset for Naturalistic Verb Learning [18.654373173232205]
We introduce a new dataset for training and evaluating grounded language models.
Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access.
We use the collected data to compare several distributional semantics models for verb learning.
arXiv Detail & Related papers (2020-10-28T20:47:13Z) - Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
This "vokenization" process is trained on relatively small image captioning datasets; we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
arXiv Detail & Related papers (2020-10-14T02:11:51Z) - Visual Grounding in Video for Unsupervised Word Translation [91.47607488740647]
We use visual grounding to improve unsupervised word mapping between languages.
We learn embeddings from unpaired instructional videos narrated in the native language.
We apply these methods to translate words from English to French, Korean, and Japanese.
arXiv Detail & Related papers (2020-03-11T02:03:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.