COGS: A Compositional Generalization Challenge Based on Semantic
Interpretation
- URL: http://arxiv.org/abs/2010.05465v1
- Date: Mon, 12 Oct 2020 05:45:44 GMT
- Title: COGS: A Compositional Generalization Challenge Based on Semantic
Interpretation
- Authors: Najoung Kim and Tal Linzen
- Abstract summary: We introduce COGS, a semantic parsing dataset based on a fragment of English.
The evaluation portion of COGS contains multiple systematic gaps that can only be addressed by compositional generalization.
In experiments with Transformers and LSTMs, we found that in-distribution accuracy on the COGS test set was near-perfect (96--99%), but generalization accuracy was substantially lower (16--35%) and showed high sensitivity to random seed.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language is characterized by compositionality: the meaning of a
complex expression is constructed from the meanings of its constituent parts.
To facilitate the evaluation of the compositional abilities of language
processing architectures, we introduce COGS, a semantic parsing dataset based
on a fragment of English. The evaluation portion of COGS contains multiple
systematic gaps that can only be addressed by compositional generalization;
these include new combinations of familiar syntactic structures, or new
combinations of familiar words and familiar structures. In experiments with
Transformers and LSTMs, we found that in-distribution accuracy on the COGS test
set was near-perfect (96--99%), but generalization accuracy was substantially
lower (16--35%) and showed high sensitivity to random seed ($\pm$6--8%). These
findings indicate that contemporary standard NLP models are limited in their
compositional generalization capacity, and position COGS as a good way to
measure progress.
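The accuracies reported above are exact-match: a prediction counts as correct only if its output logical form matches the gold form exactly. A minimal sketch of that metric (the toy sentence/logical-form pairs below are illustrative, not actual COGS entries):

```python
# Sketch of COGS-style exact-match evaluation: a prediction is correct
# only if it reproduces the gold logical form string exactly.

def exact_match_accuracy(golds, predictions):
    """Fraction of predictions that exactly match the gold logical form."""
    assert len(golds) == len(predictions)
    if not golds:
        return 0.0
    correct = sum(g.strip() == p.strip() for g, p in zip(golds, predictions))
    return correct / len(golds)

# Toy COGS-like sentence -> logical form pairs (illustrative only).
golds = [
    "cat ( x _ 1 ) AND sleep . agent ( x _ 2 , x _ 1 )",
    "dog ( x _ 1 ) AND run . agent ( x _ 2 , x _ 1 )",
]
preds = [
    "cat ( x _ 1 ) AND sleep . agent ( x _ 2 , x _ 1 )",
    "dog ( x _ 1 ) AND sleep . agent ( x _ 2 , x _ 1 )",  # wrong predicate
]
print(exact_match_accuracy(golds, preds))  # 0.5
```

Under this metric, the gap between in-distribution accuracy (96--99%) and generalization accuracy (16--35%) is measured on identical output formats, differing only in whether the test items fall inside the training distribution.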
Related papers
- Consistency of Compositional Generalization across Multiple Levels [31.77432446850103]
We propose a meta-learning based framework for achieving consistent compositional generalization across multiple levels.
We build a GQA-CCG dataset to quantitatively evaluate the consistency.
arXiv Detail & Related papers (2024-12-18T09:09:41Z)
- On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation [10.840893953881652]
It is important to develop benchmarks to assess compositional generalisation in real-world natural language tasks.
This is done by splitting the Europarl translation corpus into a training and a test set in such a way that the test set requires compositional generalisation capacity.
This is a fully-automated procedure to create natural language compositionality benchmarks, making it simple and inexpensive to apply it further to other datasets and languages.
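Distribution-based compositionality assessment constructs such splits by maximizing the divergence between the compound (structure) distributions of train and test while keeping the atom (word) distributions similar; the divergence is defined via a Chernoff coefficient. A minimal sketch of that divergence (function names are illustrative; the full procedure also involves a greedy search over candidate splits):

```python
from collections import Counter

def chernoff_divergence(train_counts, test_counts, alpha=0.1):
    """Divergence between two frequency distributions, defined as
    1 minus the Chernoff coefficient sum_k p_k^alpha * q_k^(1-alpha).
    The DBCA framework uses alpha=0.1 for compounds and alpha=0.5 for atoms."""
    keys = set(train_counts) | set(test_counts)
    p_total = sum(train_counts.values())
    q_total = sum(test_counts.values())
    coeff = 0.0
    for k in keys:
        p = train_counts.get(k, 0) / p_total
        q = test_counts.get(k, 0) / q_total
        coeff += (p ** alpha) * (q ** (1 - alpha))
    return 1.0 - coeff

# Identical distributions give divergence 0; disjoint ones give divergence 1.
same = Counter({"compound_a": 2, "compound_b": 2})
disjoint = Counter({"compound_c": 4})
print(round(chernoff_divergence(same, same), 6))      # 0.0
print(round(chernoff_divergence(same, disjoint), 6))  # 1.0
```

A split with high compound divergence but low atom divergence forces the test set to recombine familiar words into unfamiliar structures, which is exactly the compositional generalization capacity being probed.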
arXiv Detail & Related papers (2023-11-14T15:37:19Z)
- SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions.
Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training, while structural generalization is often underrepresented.
We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z)
- Categorizing Semantic Representations for Neural Machine Translation [53.88794787958174]
We introduce categorization to the source contextualized representations.
The main idea is to enhance generalization by reducing sparsity and overfitting.
Experiments on a dedicated MT dataset show that our method reduces compositional generalization error rates by 24%.
arXiv Detail & Related papers (2022-10-13T04:07:08Z)
- Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- UPB at SemEval-2021 Task 1: Combining Deep Learning and Hand-Crafted Features for Lexical Complexity Prediction [0.7197592390105455]
We describe our approach for the SemEval-2021 Task 1: Lexical Complexity Prediction competition.
Our results are just 5.46% and 6.5% below the top scores obtained in the competition on the first and second subtasks, respectively.
arXiv Detail & Related papers (2021-04-14T17:05:46Z)
- Hierarchical Poset Decoding for Compositional Generalization in Language [52.13611501363484]
We formalize human language understanding as a structured prediction task where the output is a partially ordered set (poset).
Current encoder-decoder architectures do not take the poset structure of semantics into account properly.
We propose a novel hierarchical poset decoding paradigm for compositional generalization in language.
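The poset view captures the fact that the conjuncts of a logical form are unordered, whereas a string-based exact-match metric penalizes a model for emitting the same conjuncts in a different order. A minimal sketch of an order-insensitive comparison (splitting on "AND" is a hypothetical format assumption, and this simplification ignores the partial orders among variable bindings that full poset decoding handles):

```python
def conjuncts(logical_form):
    """Split a flat conjunctive logical form into its set of conjuncts."""
    return frozenset(part.strip() for part in logical_form.split("AND"))

def order_insensitive_match(gold, pred):
    """True if the two forms contain the same conjuncts, ignoring order.
    A simplification of poset-based comparison: it handles free reordering
    of conjuncts but not partial orders over variable bindings."""
    return conjuncts(gold) == conjuncts(pred)

gold = "cat ( x _ 1 ) AND sleep . agent ( x _ 2 , x _ 1 )"
pred = "sleep . agent ( x _ 2 , x _ 1 ) AND cat ( x _ 1 )"
print(gold == pred)                        # False: string match fails
print(order_insensitive_match(gold, pred)) # True: same conjuncts
```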
arXiv Detail & Related papers (2020-10-15T14:34:26Z)
- Improving Compositional Generalization in Semantic Parsing [54.4720965813889]
Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently.
We investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization.
arXiv Detail & Related papers (2020-10-12T12:34:58Z)
- A Benchmark for Systematic Generalization in Grounded Language Understanding [61.432407738682635]
Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts.
Modern neural networks, by contrast, struggle to interpret novel compositions.
We introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding.
arXiv Detail & Related papers (2020-03-11T08:40:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.