Schrödinger's Tree -- On Syntax and Neural Language Models
- URL: http://arxiv.org/abs/2110.08887v1
- Date: Sun, 17 Oct 2021 18:25:23 GMT
- Title: Schrödinger's Tree -- On Syntax and Neural Language Models
- Authors: Artur Kulmizev, Joakim Nivre
- Abstract summary: Language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities.
We observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form.
We outline the implications of the different types of research questions exhibited in studies on syntax.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last half-decade, the field of natural language processing (NLP) has
undergone two major transitions: the switch to neural networks as the primary
modeling paradigm and the homogenization of the training regime (pre-train,
then fine-tune). Amidst this process, language models have emerged as NLP's
workhorse, displaying increasingly fluent generation capabilities and proving
to be an indispensable means of knowledge transfer downstream. Due to the
otherwise opaque, black-box nature of such models, researchers have employed
aspects of linguistic theory in order to characterize their behavior. Questions
central to syntax -- the study of the hierarchical structure of language --
have factored heavily into such work, shedding invaluable insights about
models' inherent biases and their ability to make human-like generalizations.
In this paper, we attempt to take stock of this growing body of literature. In
doing so, we observe a lack of clarity across numerous dimensions, which
influences the hypotheses that researchers form, as well as the conclusions
they draw from their findings. To remedy this, we urge researchers to make
careful considerations when investigating coding properties, selecting representations,
and evaluating via downstream tasks. Furthermore, we outline the implications
of the different types of research questions exhibited in studies on syntax, as
well as the inherent pitfalls of aggregate metrics. Ultimately, we hope that
our discussion adds nuance to the prospect of studying language models and
paves the way for a less monolithic perspective on syntax in this context.
Related papers
- Analysis of Argument Structure Constructions in a Deep Recurrent Language Model (arXiv, 2024-08-06)
  We explore the representation and processing of Argument Structure Constructions (ASCs) in a recurrent neural language model. Our results show that sentence representations form distinct clusters corresponding to the four ASCs across all hidden layers. This indicates that even a relatively simple, brain-constrained recurrent neural network can effectively differentiate between various construction types.
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models (arXiv, 2024-04-29)
  We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs). We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena. As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
- Agentività e telicità in GilBERTo: implicazioni cognitive [Agentivity and telicity in GilBERTo: cognitive implications] (arXiv, 2023-07-06)
  The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics. The semantic properties considered are telicity (also combined with definiteness) and agentivity.
- Feature Interactions Reveal Linguistic Structure in Language Models (arXiv, 2023-06-21)
  We study feature interactions in the context of feature attribution methods for post-hoc interpretability. We work out a grey-box methodology, in which we train models to perfection on a formal language classification task. We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
- Transparency Helps Reveal When Language Models Learn Meaning (arXiv, 2022-10-14)
  Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions. Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not adequately represent natural language semantics.
- Integrating Linguistic Theory and Neural Language Models (arXiv, 2022-07-20)
  I present several case studies to illustrate how theoretical linguistics and neural language models are still relevant to each other. This thesis contributes three studies that explore different aspects of the syntax-semantics interface in language models.
- A Latent-Variable Model for Intrinsic Probing (arXiv, 2022-01-20)
  We propose a novel latent-variable formulation for constructing intrinsic probes. We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
- The Rediscovery Hypothesis: Language Models Need to Meet Linguistics (arXiv, 2021-03-02)
  We study whether linguistic knowledge is a necessary condition for good performance of modern language models. We show that language models that are significantly compressed but perform well on their pretraining objectives retain good scores when probed for linguistic structures. This result supports the rediscovery hypothesis and leads to the second contribution of our paper: an information-theoretic framework that relates the language modeling objective to linguistic information.
- Infusing Finetuning with Semantic Dependencies (arXiv, 2020-12-10)
  We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
- Information-Theoretic Probing for Linguistic Structure (arXiv, 2020-04-07)
  We propose an information-theoretic operationalization of probing as estimating mutual information. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
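Several of the listed papers treat probing as mutual-information estimation between a model's representations and a linguistic property. As a minimal illustration of that idea only (not any paper's actual method, which involves learned probes over continuous representations), the sketch below computes a plug-in estimate of I(X; Y) between a hypothetical discretized representation feature and a part-of-speech label:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        p_indep = (px[x] / n) * (py[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

# Hypothetical toy data: a binarized feature of a word representation
# paired with a part-of-speech label. A feature that fully determines
# the label attains I(X; Y) = H(Y); an independent feature attains 0.
feature = [0, 0, 1, 1, 0, 1]
pos_tag = ["N", "N", "V", "V", "N", "V"]
print(round(mutual_information(feature, pos_tag), 3))  # 1.0
```

In real probing setups the plug-in estimator is replaced by a trained classifier whose cross-entropy bounds the mutual information, since the representations are continuous and high-dimensional.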
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.