Blackbird's language matrices (BLMs): a new benchmark to investigate
disentangled generalisation in neural networks
- URL: http://arxiv.org/abs/2205.10866v1
- Date: Sun, 22 May 2022 16:51:24 GMT
- Authors: Paola Merlo, Aixiu An and Maria A. Rodriguez
- Abstract summary: We illustrate Blackbird's language matrices (BLMs), a novel grammatical dataset developed to test a linguistic variant of Raven's progressive matrices.
The dataset consists of 44800 sentences, generatively constructed to support investigations of current models' linguistic mastery of grammatical agreement rules.
We show that this language task and the data that instantiate it provide a new challenging testbed to understand generalisation and abstraction.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current successes of machine learning architectures are based on
computationally expensive algorithms and prohibitively large amounts of data.
We need to develop tasks and data to train networks to reach more complex and
more compositional skills. In this paper, we illustrate Blackbird's language
matrices (BLMs), a novel grammatical dataset developed to test a linguistic
variant of Raven's progressive matrices, an intelligence test usually based on
visual stimuli. The dataset consists of 44800 sentences, generatively
constructed to support investigations of current models' linguistic mastery of
grammatical agreement rules and their ability to generalise them. We present
the logic of the dataset, the method to automatically construct data on a large
scale and the architecture to learn them. Through error analysis and several
experiments on variations of the dataset, we demonstrate that this language
task and the data that instantiate it provide a new challenging testbed to
understand generalisation and abstraction.
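To make the task concrete, the following is a minimal, purely illustrative sketch of how a Raven's-style sequence of agreement-pattern sentences might be generated. The vocabulary, templates, and function names are invented for illustration; the actual BLM dataset is built from its own generation rules and is far larger (44800 sentences).

```python
# Hypothetical sketch of a BLM-style item: a context of sentences follows
# a progression of grammatical patterns (here, subject number alternating
# while the number of intervening "attractor" phrases grows), and the task
# is to identify the sentence that correctly continues the pattern.
NOUNS = {"sg": "computer", "pl": "computers"}
PP_NOUNS = {"sg": "with the program", "pl": "with the programs"}
VERBS = {"sg": "crashes", "pl": "crash"}

def make_sentence(subj_num: str, pp_nums: list) -> str:
    """Build 'The computer(s) [with the program(s)]* crash(es).'
    The verb agrees with the subject, not with any attractor phrase."""
    parts = ["The", NOUNS[subj_num]]
    parts += [PP_NOUNS[n] for n in pp_nums]
    parts.append(VERBS[subj_num])
    return " ".join(parts) + "."

def make_matrix_context() -> list:
    """A progression: alternate subject number while growing the
    number of intervening prepositional phrases (0, 1, 2)."""
    context = []
    for depth in range(3):
        for subj in ("sg", "pl"):
            context.append(make_sentence(subj, ["sg"] * depth))
    return context

if __name__ == "__main__":
    for s in make_matrix_context():
        print(s)
```

Crossing the two factors (subject number and attractor depth) yields the kind of structured progression a matrix-completion task needs, which is why generative construction scales the dataset easily.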
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
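A minimal sketch of the "network as recognizer" setup described above: build a binary classification dataset of strings labeled by membership in a formal language, here the context-free language a^n b^n. The helper names and sampling scheme are illustrative assumptions, not the paper's actual pipeline; only the dataset construction is shown, with the model left open.

```python
import random

def in_anbn(s: str) -> bool:
    """True iff s is a^n b^n for some n >= 1."""
    n = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s == "a" * n + "b" * n

def sample_dataset(max_len: int, size: int, seed: int = 0):
    """Mix guaranteed positives with random a/b strings (mostly
    negatives) and label each string by true membership."""
    rng = random.Random(seed)
    data = []
    for _ in range(size):
        if rng.random() < 0.5:
            n = rng.randint(1, max_len // 2)
            s = "a" * n + "b" * n
        else:
            s = "".join(rng.choice("ab") for _ in range(rng.randint(1, max_len)))
        data.append((s, in_anbn(s)))
    return data
```

Training any sequence model on such (string, label) pairs makes it literally a recognizer in the formal-language-theoretic sense, rather than a proxy task.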
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
- Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
- Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution [7.681258910515419]
Tabular data presents unique challenges due to its heterogeneous nature and complex structural relationships.
High predictive performance and robustness in tabular data analysis hold significant promise for numerous applications.
The recent advent of large language models, such as GPT and LLaMA, has further revolutionized the field, facilitating more advanced and diverse applications with minimal fine-tuning.
arXiv Detail & Related papers (2024-08-20T04:59:19Z)
- DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries [0.0]
We evaluate OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS).
The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards.
arXiv Detail & Related papers (2024-03-29T22:59:34Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese [4.941630596191806]
We propose a set of intrinsic evaluation tasks that inspect the linguistic information encoded in models developed for Brazilian Portuguese.
These tasks are designed to evaluate how different language models generalise information related to grammatical structures and multiword expressions.
arXiv Detail & Related papers (2023-05-23T13:49:14Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-train data.
Based on experimental results, neural semantic parsers that leverage the GAP model obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)
- Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures [0.0]
We provide a novel reference dataset for Russian language modeling.
We experiment with popular modern methods for text generation, namely variational autoencoders and generative adversarial networks.
We evaluate the generated text regarding metrics such as perplexity, grammatical correctness and lexical diversity.
arXiv Detail & Related papers (2020-05-05T20:20:25Z)
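Perplexity, one of the evaluation metrics mentioned in the entry above, follows directly from per-token log-probabilities: PPL = exp(-(1/N) * sum(log p(w_i | context))). A minimal sketch (the function name is illustrative):

```python
import math

def perplexity(token_logprobs):
    """Compute perplexity from a list of natural-log token
    probabilities assigned by a language model."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

Lower perplexity means the model assigns higher average probability to the observed tokens; a uniform probability of 0.5 per token gives a perplexity of exactly 2.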
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.