The Grammar of Emergent Languages
- URL: http://arxiv.org/abs/2010.02069v2
- Date: Fri, 9 Oct 2020 17:52:45 GMT
- Title: The Grammar of Emergent Languages
- Authors: Oskar van der Wal, Silvan de Boer, Elia Bruni and Dieuwke Hupkes
- Abstract summary: We show that UGI techniques are appropriate to analyse emergent languages.
We then study if the languages that emerge in a typical referential game setup exhibit syntactic structure.
Our experiments demonstrate that a certain message length and vocabulary size are required for structure to emerge.
- Score: 19.17358904009426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the syntactic properties of languages
that emerge in referential games, using unsupervised grammar induction (UGI)
techniques originally designed to analyse natural language. We show that the
considered UGI techniques are appropriate for analysing emergent languages,
and we then study whether the languages that emerge in a typical referential
game setup exhibit
syntactic structure, and to what extent this depends on the maximum message
length and number of symbols that the agents are allowed to use. Our
experiments demonstrate that a certain message length and vocabulary size are
required for structure to emerge, but they also illustrate that more
sophisticated game scenarios are required to obtain syntactic properties more
akin to those observed in human language. We argue that UGI techniques should
be part of the standard toolkit for analysing emergent languages and release a
comprehensive library to facilitate such analysis for future researchers.
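As a rough illustration of what grammar induction over emergent messages can look like, the sketch below estimates a toy PCFG from symbolic messages using NLTK. It is not the UGI pipeline or the library released with the paper: the message data and the trivial right-branching baseline analyses are assumptions made purely for illustration.

```python
# Toy sketch: estimate a PCFG over emergent-language messages with NLTK.
# NOT the paper's UGI pipeline or released library; the messages and the
# right-branching baseline analyses below are illustrative assumptions.
from nltk import Tree, Nonterminal, induce_pcfg

# Hypothetical emergent messages: sequences of discrete symbols.
messages = [
    ["3", "7", "7", "1"],
    ["3", "7", "2"],
    ["5", "1", "1"],
]

def right_branching(symbols):
    """Assign a trivial right-branching tree to a message (baseline analysis)."""
    if len(symbols) == 1:
        return Tree("X", [symbols[0]])
    return Tree("X", [symbols[0], right_branching(symbols[1:])])

# Collect productions from the baseline trees and estimate a PCFG from them.
productions = [p for msg in messages for p in right_branching(msg).productions()]
grammar = induce_pcfg(Nonterminal("X"), productions)

# Grammar size is one crude proxy for how much structure the messages share.
print(grammar)
print("number of productions:", len(grammar.productions()))
```

A baseline of this kind is only a comparison point; the paper's analyses rely on dedicated UGI methods applied to the full corpora of messages produced in the referential games.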
Related papers
- Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement [1.4335183427838039]
We take the approach of developing curated synthetic data on a large scale, with specific properties.
We use a new multiple-choice task and datasets, Blackbird Language Matrices, to focus on a specific grammatical structural phenomenon.
We show that despite having been trained on multilingual texts in a consistent manner, multilingual pretrained language models have language-specific differences.
arXiv Detail & Related papers (2024-09-10T14:58:55Z)
- ELCC: the Emergent Language Corpus Collection [1.6574413179773761]
The Emergent Language Corpus Collection (ELCC) is a collection of corpora collected from open source implementations of emergent communication systems.
Each corpus is annotated with metadata describing the characteristics of the source system as well as a suite of analyses of the corpus.
arXiv Detail & Related papers (2024-07-04T21:23:18Z)
- MACT: Model-Agnostic Cross-Lingual Training for Discourse Representation Structure Parsing [4.536003573070846]
We introduce a cross-lingual training strategy for semantic representation parsing models.
It exploits the alignments between languages encoded in pre-trained language models.
Experiments show significant improvements in DRS clause and graph parsing in English, German, Italian and Dutch.
arXiv Detail & Related papers (2024-06-03T07:02:57Z)
- SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
- An Empirical Revisiting of Linguistic Knowledge Fusion in Language Understanding Tasks [33.765874588342285]
Infusing language models with syntactic or semantic knowledge from structural linguistic priors has shown improvements on many language understanding tasks.
We conduct an empirical study of replacing parsed graphs or trees with trivial ones for tasks in the GLUE benchmark.
It reveals that the gains might not be significantly attributed to explicit linguistic priors but rather to more feature interactions brought by fusion layers.
arXiv Detail & Related papers (2022-10-24T07:47:32Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantic parsers.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
arXiv Detail & Related papers (2021-04-18T08:13:06Z)
- Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models [27.91397366776451]
Training LSTMs on latent structure (MIDI music or Java code) improves test performance on natural language.
Experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological similarity to the training language.
arXiv Detail & Related papers (2020-04-30T06:24:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.