Unsupervised Parsing via Constituency Tests
- URL: http://arxiv.org/abs/2010.03146v1
- Date: Wed, 7 Oct 2020 04:05:01 GMT
- Title: Unsupervised Parsing via Constituency Tests
- Authors: Steven Cao, Nikita Kitaev, Dan Klein
- Abstract summary: We propose a method for unsupervised parsing based on the linguistic notion of a constituency test.
To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score.
The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previous best published result.
- Score: 49.42244463346612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for unsupervised parsing based on the linguistic notion
of a constituency test. One type of constituency test involves modifying the
sentence via some transformation (e.g. replacing the span with a pronoun) and
then judging the result (e.g. checking if it is grammatical). Motivated by this
idea, we design an unsupervised parser by specifying a set of transformations
and using an unsupervised neural acceptability model to make grammaticality
decisions. To produce a tree given a sentence, we score each span by
aggregating its constituency test judgments, and we choose the binary tree with
the highest total score. While this approach already achieves performance in
the range of current methods, we further improve accuracy by fine-tuning the
grammaticality model through a refinement procedure, where we alternate between
improving the estimated trees and improving the grammaticality model. The
refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute
improvement of 7.6 points over the previous best published result.
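The span-scoring and tree-selection steps can be made concrete with a short sketch. The code below is a minimal illustration, assuming a grammaticality(sentence) -> float acceptability scorer; the single pronoun-substitution test and mean aggregation are simplifications of the paper's transformation set, and the refinement loop is omitted.

```python
# Minimal sketch: score spans via constituency tests, then pick the
# max-scoring binary tree with a CKY-style dynamic program.

def pronoun_substitution(words, i, j):
    # One constituency test: replace the span words[i:j] with a pronoun.
    return words[:i] + ["it"] + words[j:]

TESTS = [pronoun_substitution]  # the paper specifies several transformations

def span_score(words, i, j, grammaticality):
    # Aggregate the constituency-test judgments for span (i, j).
    return sum(grammaticality(" ".join(t(words, i, j))) for t in TESTS) / len(TESTS)

def best_tree(words, grammaticality):
    # CKY-style search for the binary tree with the highest total span score.
    n = len(words)
    best = {(i, i + 1): 0.0 for i in range(n)}  # single words are trivial spans
    split = {}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            k = max(range(i + 1, j), key=lambda m: best[(i, m)] + best[(m, j)])
            best[(i, j)] = span_score(words, i, j, grammaticality) + best[(i, k)] + best[(k, j)]
            split[(i, j)] = k

    def build(i, j):
        if j - i == 1:
            return words[i]
        k = split[(i, j)]
        return (build(i, k), build(k, j))

    return build(0, n)
```

Because the total score decomposes over spans, the CKY-style recurrence finds the exact highest-scoring binary tree.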
Related papers
- Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers [7.558415495951758]
We propose a novel method for extracting parse trees from masked language models (LMs).
Our method computes a score for each span based on the distortion of contextual representations resulting from linguistic perturbations.
Our method consistently outperforms previous state-of-the-art methods on English with masked LMs, and also demonstrates superior performance in a multilingual setting.
arXiv Detail & Related papers (2023-06-01T13:10:48Z)
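As a rough sketch of the distortion idea (an illustration under assumptions, not the authors' exact procedure): compare a masked LM's pooled representations before and after perturbing a candidate span. The bert-base-uncased model, mean pooling, and the deletion perturbation are assumptions chosen for concreteness.

```python
# Hedged sketch of distortion-based span scoring: how much does perturbing
# a span (here: deleting it) move a masked LM's pooled representation?
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def pooled_repr(words):
    enc = tokenizer(" ".join(words), return_tensors="pt")
    with torch.no_grad():
        return model(**enc).last_hidden_state.mean(dim=1)  # (1, hidden)

def distortion_score(words, i, j):
    # Higher distortion under perturbation suggests words[i:j] is a constituent.
    original = pooled_repr(words)
    perturbed = pooled_repr(words[:i] + words[j:])
    return 1.0 - torch.nn.functional.cosine_similarity(original, perturbed).item()
```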
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one by one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- On Parsing as Tagging [66.31276017088477]
We show how to reduce tetratagging, a state-of-the-art constituency tagger, to shift-reduce parsing.
We empirically evaluate our taxonomy of tagging pipelines with different choices of linearizers, learners, and decoders.
arXiv Detail & Related papers (2022-11-14T13:37:07Z)
- CPTAM: Constituency Parse Tree Aggregation Method [6.011216641982612]
This paper adopts the truth discovery idea to aggregate constituency parse trees from different parsers.
We formulate the constituency parse tree aggregation problem in two steps, structure aggregation and constituent label aggregation.
Experiments are conducted on benchmark datasets in different languages and domains.
arXiv Detail & Related papers (2022-01-19T23:05:37Z)
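A simplified sketch of the truth-discovery flavor of structure aggregation (not CPTAM's exact estimator): alternate between a reliability-weighted vote over candidate spans and re-estimating each parser's reliability from its agreement with the consensus.

```python
# Simplified truth-discovery aggregation over constituent spans.
def aggregate_spans(parser_spans, iterations=10):
    """parser_spans: one set of (i, j) spans per input parser."""
    weights = [1.0] * len(parser_spans)
    candidates = set().union(*parser_spans)
    consensus = set()
    for _ in range(iterations):
        total = sum(weights)
        # Step 1: reliability-weighted majority vote on each candidate span.
        consensus = {
            s for s in candidates
            if sum(w for w, spans in zip(weights, parser_spans) if s in spans)
            > 0.5 * total
        }
        # Step 2: re-estimate each parser's reliability as F1 vs. the consensus.
        for p, spans in enumerate(parser_spans):
            tp = len(spans & consensus)
            prec = tp / len(spans) if spans else 0.0
            rec = tp / len(consensus) if consensus else 0.0
            weights[p] = 2 * prec * rec / (prec + rec) if prec + rec else 1e-6
    return consensus
```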
- Focused Contrastive Training for Test-based Constituency Analysis [7.312581661832785]
We propose a scheme for self-training of grammaticality models for constituency analysis based on linguistic tests.
A pre-trained language model is fine-tuned by contrastive estimation, learning to separate grammatical corpus sentences from ungrammatical variants produced by syntactic-test perturbations.
arXiv Detail & Related papers (2021-09-30T14:22:15Z)
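The contrastive objective admits a short sketch; score_model is an assumed interface mapping a list of sentences to a tensor of scalar acceptability logits.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(score_model, sentence, perturbed_variants):
    # Train the model to rank the attested sentence (index 0) above its
    # syntactically perturbed, presumed-ungrammatical variants.
    scores = score_model([sentence] + perturbed_variants)  # shape: (1 + k,)
    return F.cross_entropy(scores.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```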
- Constructing Taxonomies from Pretrained Language Models [52.53846972667636]
We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models.
Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those predictions into trees.
We train our model on subtrees sampled from WordNet, and test on non-overlapping WordNet subtrees.
arXiv Detail & Related papers (2020-10-24T07:16:21Z)
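One plausible reconciliation strategy (a sketch, not necessarily the paper's module) is to treat predicted parenthood scores as edge weights and extract a maximum spanning arborescence; parent_score is an assumed helper, e.g. derived from language-model prompting.

```python
import networkx as nx

def reconcile(terms, parent_score):
    # Build a complete directed graph of candidate parent -> child edges,
    # then keep the highest-scoring directed tree over all terms.
    g = nx.DiGraph()
    for p in terms:
        for c in terms:
            if p != c:
                g.add_edge(p, c, weight=parent_score(p, c))
    return nx.maximum_spanning_arborescence(g)
```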
- Efficient Constituency Parsing by Pointing [21.395573911155495]
We propose a novel constituency parsing model that casts the parsing problem into a series of pointing tasks.
Our model supports efficient top-down decoding and our learning objective is able to enforce structural consistency without resorting to the expensive CKY inference.
arXiv Detail & Related papers (2020-06-24T08:29:09Z)
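A generic top-down decoder makes the efficiency point concrete; split_score(i, k, j) is an assumed scorer, and the paper's pointing parameterization differs in how these split decisions are made.

```python
def top_down_parse(words, split_score, i=0, j=None):
    # Recursively choose the best split point for span (i, j); each
    # decision is a single "pointing" step, so decoding avoids full
    # CKY-style chart inference.
    if j is None:
        j = len(words)
    if j - i == 1:
        return words[i]
    k = max(range(i + 1, j), key=lambda m: split_score(i, m, j))
    return (top_down_parse(words, split_score, i, k),
            top_down_parse(words, split_score, k, j))
```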
- Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers [59.345145623931636]
We argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS).
We present a proof-of-concept study focused on instance-level selection in the framework of delexicalized transfer.
arXiv Detail & Related papers (2020-04-16T13:18:55Z)
- Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)