MEWL: Few-shot multimodal word learning with referential uncertainty
- URL: http://arxiv.org/abs/2306.00503v1
- Date: Thu, 1 Jun 2023 09:54:31 GMT
- Title: MEWL: Few-shot multimodal word learning with referential uncertainty
- Authors: Guangyuan Jiang, Manjie Xu, Shiji Xin, Wei Liang, Yujia Peng, Chi Zhang, Yixin Zhu
- Abstract summary: We introduce the MachinE Word Learning benchmark to assess how machines learn word meaning in grounded visual scenes.
MEWL covers humans' core cognitive toolkits in word learning: cross-situational reasoning, bootstrapping, and pragmatic learning.
By evaluating multimodal and unimodal agents' performance with a comparative analysis of human performance, we notice a sharp divergence in human and machine word learning.
- Score: 24.94171567232573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Without explicit feedback, humans can rapidly learn the meaning of words.
Children can acquire a new word after just a few passive exposures, a process
known as fast mapping. This word learning capability is believed to be the most
fundamental building block of multimodal understanding and reasoning. Despite
recent advancements in multimodal learning, a systematic and rigorous
evaluation is still missing for human-like word learning in machines. To fill
in this gap, we introduce the MachinE Word Learning (MEWL) benchmark to assess
how machines learn word meaning in grounded visual scenes. MEWL covers humans'
core cognitive toolkits in word learning: cross-situational reasoning,
bootstrapping, and pragmatic learning. Specifically, MEWL is a few-shot
benchmark suite consisting of nine tasks for probing various word learning
capabilities. These tasks are carefully designed to align with children's core
abilities in word learning and to echo theories in the
developmental literature. By evaluating multimodal and unimodal agents'
performance with a comparative analysis of human performance, we notice a sharp
divergence in human and machine word learning. We further discuss these
differences between humans and machines and call for human-like few-shot word
learning in machines.
Related papers
- Exploring Automated Keyword Mnemonics Generation with Large Language Models via Overgenerate-and-Rank [4.383205675898942]
Keyword mnemonics are a technique for memorizing vocabulary through memorable associations with a target word via a verbal cue.
We propose a novel overgenerate-and-rank method via prompting large language models to generate verbal cues.
Results show that LLM-generated mnemonics are comparable to human-generated ones in terms of imageability, coherence, and perceived usefulness.
arXiv Detail & Related papers (2024-09-21T00:00:18Z)
- Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z)
- Storyfier: Exploring Vocabulary Learning Support with Text Generation Models [52.58844741797822]
We develop Storyfier to provide a coherent context for any target words of interest to learners.
Learners generally favor the generated stories for connecting target words and the writing assistance for easing their learning workload.
In read-cloze-write learning sessions, participants using Storyfier perform worse in recalling and using target words than those learning with a baseline tool without our AI features.
arXiv Detail & Related papers (2023-08-07T18:25:00Z)
- Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition [6.47452771256903]
We take inspiration from how human babies acquire their first language and develop a computational process for word acquisition through comparative learning.
Motivated by cognitive findings, we generate a small dataset that enables computational models to compare the similarities and differences of various attributes.
We frame the acquisition of words not only as an information filtration process but also as representation-symbol mapping.
arXiv Detail & Related papers (2023-07-05T19:38:04Z)
- Computational Language Acquisition with Theory of Mind [84.2267302901888]
We build language-learning agents equipped with Theory of Mind (ToM) and measure its effects on the learning process.
We find that training speakers with a highly weighted ToM listener component leads to performance gains in our image referential game setting.
arXiv Detail & Related papers (2023-03-02T18:59:46Z)
- What Artificial Neural Networks Can Tell Us About Human Language Acquisition [47.761188531404066]
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language.
To increase the relevance of learnability results from computational models, we need to train model learners without significant advantages over humans.
arXiv Detail & Related papers (2022-08-17T00:12:37Z)
- Predicting Word Learning in Children from the Performance of Computer Vision Systems [24.49899952381515]
We show that the age at which children acquire different categories of words is correlated with the performance of visual classification and captioning systems.
The performance of the computer vision systems is correlated with human judgments of the concreteness of words, which are in turn a predictor of children's word learning.
arXiv Detail & Related papers (2022-07-07T22:49:32Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world-knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
- Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning [78.13740873213223]
Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems.
We propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning.
arXiv Detail & Related papers (2020-10-02T03:19:46Z)
- Using Known Words to Learn More Words: A Distributional Analysis of Child Vocabulary Development [0.0]
We investigated item-based variability in vocabulary development using lexical properties of distributional statistics.
We predicted word trajectories cross-sectionally, shedding light on trends in vocabulary development that may not have been evident at a single time point.
We also show that whether one looks at a single age group or across ages as a whole, the best distributional predictor of whether a child knows a word is the number of other known words with which that word tends to co-occur.
arXiv Detail & Related papers (2020-09-15T01:18:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.