On the Branching Bias of Syntax Extracted from Pre-trained Language
Models
- URL: http://arxiv.org/abs/2010.02448v1
- Date: Tue, 6 Oct 2020 03:09:14 GMT
- Title: On the Branching Bias of Syntax Extracted from Pre-trained Language
Models
- Authors: Huayang Li, Lemao Liu, Guoping Huang, Shuming Shi
- Abstract summary: We propose quantitatively measuring the branching bias by comparing the performance gap between a language and its reversed language.
We analyze the impacts of three factors on the branching bias, namely parsing algorithms, feature definitions, and language models.
- Score: 47.82102426290707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many efforts have been devoted to extracting constituency trees from
pre-trained language models, often proceeding in two stages: feature definition
and parsing. However, such methods may suffer from a branching bias, which
inflates performance on languages whose dominant branching direction matches
the bias. In this work, we propose quantitatively measuring the branching bias
by comparing the performance gap between a language and its reversed
counterpart, a measure that is agnostic to both the language model and the
extraction method. Furthermore, we analyze the impact of three factors on the
branching bias, namely the parsing algorithm, the feature definition, and the
language model. Experiments show that several existing works exhibit branching
biases, and that particular choices for each of these three factors can
introduce such a bias.
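To make the measurement concrete, the following is a minimal sketch, not the
authors' implementation: reverse every sentence, mirror the gold constituent
spans onto the reversed word order, and compare unlabeled bracketing F1 on the
two versions. The extract_tree callable is a hypothetical stand-in for any
feature-definition-plus-parsing pipeline over a pre-trained LM, and the
half-open (i, j) span convention is an assumption made here for illustration.

# Hypothetical harness for the bias measurement described in the abstract.
# extract_tree(tokens) -> set of half-open constituent spans (i, j); it is a
# placeholder for any LM-based extraction pipeline, not the paper's API.

def mirror_spans(spans, n):
    # A constituent (i, j) of an n-token sentence covers tokens
    # (n - j, n - i) once the word order is reversed.
    return {(n - j, n - i) for (i, j) in spans}

def unlabeled_f1(pred, gold):
    # Standard unlabeled bracketing F1 between two sets of spans.
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

def branching_bias(corpus, extract_tree):
    # corpus: list of (tokens, gold_spans) pairs. The returned gap is the
    # mean F1 on the original language minus the mean F1 on its reversed
    # counterpart; a pipeline with no directional preference should keep
    # this gap near zero.
    f1_orig, f1_rev = [], []
    for tokens, gold in corpus:
        n = len(tokens)
        f1_orig.append(unlabeled_f1(extract_tree(tokens), gold))
        f1_rev.append(unlabeled_f1(extract_tree(tokens[::-1]),
                                   mirror_spans(gold, n)))
    return sum(f1_orig) / len(f1_orig) - sum(f1_rev) / len(f1_rev)

Reversing a corpus turns right-branching structure into left-branching
structure while leaving everything else unchanged, so a persistent gap
isolates the extractor's directional preference rather than the language's
intrinsic difficulty.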
Related papers
- ASTE Transformer Modelling Dependencies in Aspect-Sentiment Triplet Extraction [2.07180164747172]
Aspect-Sentiment Triplet Extraction (ASTE) is a recently proposed task that consists of extracting (aspect phrase, opinion phrase, sentiment polarity) triples from a given sentence.
Recent state-of-the-art methods approach this task by first extracting all possible spans from a given sentence.
arXiv Detail & Related papers (2024-09-23T16:49:47Z)
- Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design [23.632204469647526]
We show that the task design can push annotators towards certain relations.
We conclude that this type of bias should be taken into account when training and testing models.
arXiv Detail & Related papers (2023-04-03T09:04:18Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can match or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning [4.200736775540874]
We design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features.
The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning.
Most of the performance comes from the same small subset of attention heads across all studied languages.
arXiv Detail & Related papers (2021-06-22T21:25:43Z)
- Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z)
- Language Models for Lexical Inference in Context [4.581468205348204]
Lexical inference in context (LIiC) is the task of recognizing textual entailment between two very similar sentences.
We formulate and evaluate the first approaches based on pretrained language models (LMs) for this task.
All our approaches outperform the previous state of the art, showing the potential of pretrained LMs for LIiC.
arXiv Detail & Related papers (2021-02-10T09:08:22Z)
- Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which improves the discriminativeness of the embeddings as a by-product (see the sketch after this entry); and (iii) increasing input similarity across languages by removing morphological contractions and by reordering sentences.
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
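A minimal sketch of approach (ii) above, per-language mean removal and
variance normalization; the function name and the exact recipe are
assumptions for illustration, not the paper's code.

import numpy as np

def remove_language_stats(embeddings_by_lang):
    # embeddings_by_lang: dict mapping a language code to an
    # (n_sentences, dim) array of sentence embeddings. Standardizing each
    # language's embeddings separately removes language-identity signal
    # from the shared space; the paper's exact procedure may differ.
    out = {}
    for lang, X in embeddings_by_lang.items():
        mu = X.mean(axis=0, keepdims=True)
        sigma = X.std(axis=0, keepdims=True) + 1e-8  # guard against zeros
        out[lang] = (X - mu) / sigma
    return out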
- Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences [18.19093600136057]
We propose a framework for extracting divergence patterns for any language pair from a parallel corpus.
We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation.
arXiv Detail & Related papers (2020-05-07T13:05:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.