Related papers: Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal

Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal

URL: http://arxiv.org/abs/2105.04633v1
Date: Mon, 10 May 2021 19:40:17 GMT
Title: Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal
Authors: Casey Kennington
Abstract summary: We review the literature on the role of embodiment and emotion in the interactive setting of spoken dialogue as necessary prerequisites for language learning for human children. We sketch a model of semantics that leverages current transformer-based models and a word-level grounded model, then explain the robot-dialogue system that will make use of our semantic model.
Score: 2.639737913330821
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Humans' experience of the world is profoundly multimodal from the beginning, so why do existing state-of-the-art language models only use text as a modality to learn and represent semantic meaning? In this paper we review the literature on the role of embodiment and emotion in the interactive setting of spoken dialogue as necessary prerequisites for language learning for human children, including how words in child vocabularies are largely concrete, then shift to become more abstract as the children get older. We sketch a model of semantics that leverages current transformer-based models and a word-level grounded model, then explain the robot-dialogue system that will make use of our semantic model, the setting for the system to learn language, and existing benchmarks for evaluation.

Related papers

Could the Road to Grounded, Neuro-symbolic AI be Paved with Words-as-Classifiers? [16.12935949253909]
We make the case that one potential path forward in unifying all three semantic fields is paved with the words-as-classifier model.<n>We review that literature, motivate the words-as-classifiers model with an appeal to recent work in cognitive science, and describe a small experiment.
arXiv Detail & Related papers (2025-07-08T18:44:34Z)
Visually Grounded Language Learning: a review of language games, datasets, tasks, and models [60.2604624857992]
Many Vision+Language (V+L) tasks have been defined with the aim of creating models that can ground symbols in the visual modality. In this work, we provide a systematic literature review of several tasks and models proposed in the V+L field.
arXiv Detail & Related papers (2023-12-05T02:17:29Z)
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension. But to achieve these results, LMs must be trained in distinctly un-human-like ways. Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning? We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z)
Learning to Model the World with Language [100.76069091703505]
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future. We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations.
arXiv Detail & Related papers (2023-07-31T17:57:49Z)
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels. We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels. We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z)
Language-Driven Representation Learning for Robotics [115.93273609767145]
Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks. We introduce a framework for language-driven representation learning from human videos and captions. We find that Voltron's language-driven learning outperform the prior-of-the-art, especially on targeted problems requiring higher-level control.
arXiv Detail & Related papers (2023-02-24T17:29:31Z)
What Artificial Neural Networks Can Tell Us About Human Language Acquisition [47.761188531404066]
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language. To increase the relevance of learnability results from computational models, we need to train model learners without significant advantages over humans.
arXiv Detail & Related papers (2022-08-17T00:12:37Z)
Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning [3.441021278275805]
We design a two-stream model for grounding language learning in vision. The model first learns to align visual and language representations with the MS COCO dataset. After training, the language stream of this model is a stand-alone language model capable of embedding concepts in a visually grounded semantic space.
arXiv Detail & Related papers (2021-11-13T19:54:15Z)
Word Acquisition in Neural Language Models [0.38073142980733]
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words. We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models.
arXiv Detail & Related papers (2021-10-05T23:26:16Z)
A Visuospatial Dataset for Naturalistic Verb Learning [18.654373173232205]
We introduce a new dataset for training and evaluating grounded language models. Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access. We use the collected data to compare several distributional semantics models for verb learning.
arXiv Detail & Related papers (2020-10-28T20:47:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.