On Evaluating Multilingual Compositional Generalization with Translated Datasets
- URL: http://arxiv.org/abs/2306.11420v1
- Date: Tue, 20 Jun 2023 10:03:57 GMT
- Title: On Evaluating Multilingual Compositional Generalization with Translated Datasets
- Authors: Zi Wang and Daniel Hershcovich
- Abstract summary: We show that compositional generalization abilities differ across languages.
We craft a faithful rule-based translation of the MCWQ dataset from English to Chinese and Japanese.
Even with the resulting robust benchmark, which we call MCWQ-R, we show that the distribution of compositions still suffers due to linguistic divergences.
- Score: 34.51457321680049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositional generalization allows efficient learning and human-like
inductive biases. Since most research investigating compositional
generalization in NLP is done on English, important questions remain
underexplored. Do the necessary compositional generalization abilities differ
across languages? Can models compositionally generalize cross-lingually? As a
first step to answering these questions, recent work used neural machine
translation to translate datasets for evaluating compositional generalization
in semantic parsing. However, we show that this entails critical semantic
distortion. To address this limitation, we craft a faithful rule-based
translation of the MCWQ dataset from English to Chinese and Japanese. Even with
the resulting robust benchmark, which we call MCWQ-R, we show that the
distribution of compositions still suffers due to linguistic divergences, and
that multilingual models still struggle with cross-lingual compositional
generalization. Our dataset and methodology will be useful resources for the
study of cross-lingual compositional generalization in other tasks.
Related papers
- Evaluating Structural Generalization in Neural Machine Translation [13.880151307013318]
We construct SGET, a dataset covering various types of compositional generalization with control of words and sentence structures.
We show that neural machine translation models struggle more in structural generalization than in lexical generalization.
We also find different performance trends in semantic parsing and machine translation, which indicates the importance of evaluations across various tasks.
arXiv Detail & Related papers (2024-06-19T09:09:11Z)
- DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization [30.05945103235578]
We propose the DIsh NamE Recognition (DiNeR) task and create a large realistic Chinese dataset.
Given a recipe instruction, models are required to recognize the dish name composed of diverse combinations of food, actions, and flavors.
Our dataset consists of 3,811 dishes and 228,114 recipes, and covers a wide range of linguistic phenomena such as anaphora, omission, and ambiguity.
arXiv Detail & Related papers (2024-06-07T06:35:21Z)
- On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation [10.840893953881652]
It is important to develop benchmarks to assess compositional generalisation in real-world natural language tasks.
The authors do this by splitting the Europarl translation corpus into a training and a test set in such a way that the test set requires compositional generalisation capacity.
The procedure for creating natural language compositionality benchmarks is fully automated, making it simple and inexpensive to apply to other datasets and languages.
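The splitting idea can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes sentences are reduced to "atoms" (unigrams) and "compounds" (bigrams), and it measures how a candidate split scores under the Chernoff-coefficient divergence used in distribution-based compositionality assessment (DBCA); the toy sentences and the alpha values (0.5 for atoms, 0.1 for compounds) follow the common DBCA convention but are illustrative here.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def chernoff_divergence(p: Counter, q: Counter, alpha: float) -> float:
    """1 - sum_k p_k^alpha * q_k^(1-alpha), over normalized counts.

    0 means identical distributions; 1 means disjoint support."""
    ptot, qtot = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    coeff = sum((p[k] / ptot) ** alpha * (q[k] / qtot) ** (1 - alpha)
                for k in keys if p[k] and q[k])
    return 1.0 - coeff

def split_divergences(train_sents, test_sents):
    """Atom divergence (unigrams, alpha=0.5) and compound
    divergence (bigrams, alpha=0.1) of a train/test split."""
    def dist(sents, n):
        c = Counter()
        for s in sents:
            c.update(ngrams(s.split(), n))
        return c
    atom = chernoff_divergence(dist(train_sents, 1), dist(test_sents, 1), 0.5)
    comp = chernoff_divergence(dist(train_sents, 2), dist(test_sents, 2), 0.1)
    return atom, comp

# Toy split: the test sentence reuses training words (low atom divergence)
# but combines them in new ways (higher compound divergence) -- the kind
# of split a compositional-generalisation benchmark aims for.
train = ["the cat sees the dog", "the dog sees the bird"]
test = ["the bird sees the cat"]
atom_div, comp_div = split_divergences(train, test)
```

A benchmark-construction procedure would search over many candidate splits, keeping atom divergence low while maximizing compound divergence.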
arXiv Detail & Related papers (2023-11-14T15:37:19Z)
- How Do In-Context Examples Affect Compositional Generalization? [86.57079616209474]
In this paper, we present CoFe, a test suite to investigate in-context compositional generalization.
We find that the compositional generalization performance can be easily affected by the selection of in-context examples.
Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple.
arXiv Detail & Related papers (2023-05-08T16:32:18Z)
- It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning [4.200736775540874]
We design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features.
The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning.
Most of the performance comes from the same small subset of attention heads across all studied languages.
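The attention-head baseline can be sketched roughly as follows. This is a toy illustration, not the paper's pipeline: the attention tensors are fabricated rather than taken from a real multilingual transformer, and the feature (per-head maximum attention weight) is a simplified stand-in for the paper's feature extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model shape: 12 layers x 12 heads, sequence length 16.
n_layers, n_heads, seq_len = 12, 12, 16

def attention_features(attn):
    """One scalar feature per head: the maximum attention weight in that
    head's (seq_len x seq_len) map, flattened to (n_layers * n_heads,)."""
    return attn.max(axis=(2, 3)).reshape(-1)

def make_example(label):
    """Fabricated attention tensor. Positive examples get boosted weights
    in a fixed small subset of heads, mimicking the finding that a small
    set of heads carries most of the commonsense signal."""
    attn = rng.uniform(0.0, 0.5, size=(n_layers, n_heads, seq_len, seq_len))
    if label == 1:
        attn[:2, :3] += 0.4  # the "informative" heads
    return attention_features(attn), label

data = [make_example(i % 2) for i in range(200)]
X = np.stack([x for x, _ in data])
y = np.array([label for _, label in data])

# Plain logistic regression by gradient descent (no external ML library),
# trained on standardized head features.
Xn = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))
    g = p - y
    w -= 0.1 * Xn.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = ((Xn @ w + b > 0) == (y == 1)).mean()
```

Inspecting the largest entries of `w` would then reveal which heads drive the decision, which is how a "small subset of heads" can be identified.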
arXiv Detail & Related papers (2021-06-22T21:25:43Z)
- Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, peak performance is not achieved with general-purpose multilingual text encoders used off the shelf, but rather with variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
- Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both? [27.590858384414567]
We ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization?
We propose new train and test splits of non-synthetic datasets to better assess this capability.
We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-24T00:38:27Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short of translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- A Benchmark for Systematic Generalization in Grounded Language Understanding [61.432407738682635]
Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts.
Modern neural networks, by contrast, struggle to interpret novel compositions.
We introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding.
arXiv Detail & Related papers (2020-03-11T08:40:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.