Weakly Supervised Headline Dependency Parsing
- URL: http://arxiv.org/abs/2301.10371v1
- Date: Wed, 25 Jan 2023 01:00:16 GMT
- Title: Weakly Supervised Headline Dependency Parsing
- Authors: Adrian Benton, Tianze Shi, Ozan İrsoy, Igor Malioutov
- Abstract summary: English news headlines form a register with unique syntactic properties that have been documented in linguistics literature since the 1930s.
We aim to bridge this gap by providing the first news headline corpus of Universal Dependencies syntactic dependency trees.
- Score: 20.246696104447985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: English news headlines form a register with unique syntactic properties that
have been documented in linguistics literature since the 1930s. However,
headlines have received surprisingly little attention from the NLP syntactic
parsing community. We aim to bridge this gap by providing the first news
headline corpus of Universal Dependencies annotated syntactic dependency trees,
which enables us to evaluate existing state-of-the-art dependency parsers on
news headlines. To improve English news headline parsing accuracies, we develop
a projection method to bootstrap silver training data from unlabeled news
headline-article lead sentence pairs. Models trained on silver headline parses
demonstrate significant improvements in performance over models trained solely
on gold-annotated long-form texts. Ultimately, we find that, although projected
silver training data improves parser performance across different news outlets,
the improvement is moderated by constructions idiosyncratic to outlet.
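The abstract does not spell out the projection procedure, but a minimal sketch of one natural realization is to align headline tokens to tokens of the parsed article lead sentence and copy over each dependency arc whose head and dependent are both aligned. The Token structure, the form-based aligner, and all names below are illustrative assumptions of this sketch, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int      # position in the sentence
    form: str     # surface form
    head: int     # index of the syntactic head, -1 for the root
    deprel: str   # dependency relation label

def align_by_form(headline, lead):
    """Toy alignment: map each headline token to the first unused lead token
    with the same lowercased form (a stand-in for a real word aligner)."""
    alignment, used = {}, set()
    for h in headline:
        for l in lead:
            if l.idx not in used and l.form.lower() == h.form.lower():
                alignment[h.idx] = l.idx
                used.add(l.idx)
                break
    return alignment

def project_arcs(headline, lead, alignment):
    """Copy an arc onto the headline whenever both its dependent and its head
    in the parsed lead sentence align to headline tokens."""
    lead_heads = {t.idx: (t.head, t.deprel) for t in lead}
    inverse = {l: h for h, l in alignment.items()}
    silver = {}
    for h_idx, l_idx in alignment.items():
        l_head, deprel = lead_heads[l_idx]
        if l_head == -1:
            silver[h_idx] = (-1, "root")
        elif l_head in inverse:
            silver[h_idx] = (inverse[l_head], deprel)
    return silver  # partial silver tree; unaligned tokens remain unattached
```

Trees bootstrapped this way would serve only as additional silver training signal alongside gold-annotated long-form treebanks.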
Related papers
- Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective [0.0]
We use dependency parsing to analyze news articles in Urdu.
We achieve a best labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%.
arXiv Detail & Related papers (2024-06-13T19:30:32Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
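As a rough illustration of the kind of optimal-transport divergence such an approach might minimize between source- and target-language latent representations, the sketch below computes an entropy-regularized (Sinkhorn-style) transport cost in NumPy; the uniform marginals and hyperparameters are assumptions of this sketch, not the paper's formulation.

```python
import numpy as np

def sinkhorn_cost(x, y, epsilon=0.1, n_iters=100):
    """Entropy-regularized OT cost between point clouds x (n, d) and y (m, d)
    with uniform marginals; a simplified stand-in for a cross-lingual
    divergence term over latent representations."""
    n, m = x.shape[0], y.shape[0]
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** 2
    kernel = np.exp(-cost / epsilon)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):  # alternating Sinkhorn scaling updates
        u = a / (kernel @ v + 1e-9)
        v = b / (kernel.T @ u + 1e-9)
    transport_plan = u[:, None] * kernel * v[None, :]
    return float((transport_plan * cost).sum())
```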
- Training Naturalized Semantic Parsers with Very Little Data [10.709587018625275]
State-of-the-art (SOTA) semantic parsers are seq2seq architectures based on large language models that have been pretrained on vast amounts of text.
Recent work has explored a reformulation of semantic parsing whereby the output sequences are themselves natural language sentences.
We show that this method delivers new SOTA few-shot performance on the Overnight dataset.
arXiv Detail & Related papers (2022-04-29T17:14:54Z)
- Transcribing Natural Languages for The Deaf via Neural Editing Programs [84.0592111546958]
We study the task of glossification, the aim of which is to transcribe natural spoken language sentences into ordered sign language glosses for the Deaf (hard-of-hearing) community.
Previous sequence-to-sequence language models often fail to capture the rich connections between the two distinct languages, leading to unsatisfactory transcriptions.
We observe that despite different grammars, glosses effectively simplify sentences for the ease of deaf communication, while sharing a large portion of vocabulary with sentences.
arXiv Detail & Related papers (2021-12-17T16:21:49Z)
- To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP [0.0]
We investigate three categories of text augmentation methodologies which perform changes on the syntax.
We compare them on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families.
Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT.
arXiv Detail & Related papers (2021-11-18T10:52:48Z)
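One widely used syntactic augmentation of the kind surveyed there is sentence cropping over a dependency tree, keeping the predicate plus one argument subtree. The sketch below is a generic illustration with an assumed tuple representation of parses, not the paper's code.

```python
from typing import List, Tuple

# Each token: (index, form, head_index, deprel); the root's head_index is -1.
Sentence = List[Tuple[int, str, int, str]]

def descendants(sent: Sentence, root: int) -> set:
    """Collect all token indices in the subtree rooted at `root`."""
    children = {}
    for idx, _, head, _ in sent:
        children.setdefault(head, []).append(idx)
    stack, seen = [root], {root}
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def crop(sent: Sentence, keep_deprel: str = "nsubj") -> Sentence:
    """Sentence cropping: keep the root predicate and one argument subtree,
    yielding a shorter but still well-formed augmented training example."""
    root = next(idx for idx, _, head, _ in sent if head == -1)
    keep = {root}
    for idx, _, head, deprel in sent:
        if head == root and deprel == keep_deprel:
            keep |= descendants(sent, idx)
    return [tok for tok in sent if tok[0] in keep]
```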
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot learning by paraphrasing training examples of canonical utterances and programs from a grammar.
We propose bridging these gaps using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
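To make the grammar-based setup concrete, here is a toy synchronous grammar that pairs canonical utterances with programs and enumerates synthetic training examples; the nonterminals and predicates are invented for illustration and are not the benchmark grammars used in the paper.

```python
# Toy synchronous grammar: each rule pairs an utterance template with a
# program template (illustrative predicates, not the paper's grammar).
GRAMMAR = {
    "$entity": [("papers", "type.paper"), ("authors", "type.author")],
    "$filter": [("published in $year", "(year = $year)"),
                ("with more than $n citations", "(citations > $n)")],
    "$year": [("2019", "2019"), ("2020", "2020")],
    "$n": [("100", "100"), ("500", "500")],
}
ROOT = ("list $entity $filter", "(list $entity $filter)")

def expand(utterance, program):
    """Expand the first nonterminal found, yielding every
    (canonical utterance, program) pair the grammar licenses."""
    for nt, rules in GRAMMAR.items():
        if nt in utterance:
            for u_sub, p_sub in rules:
                yield from expand(utterance.replace(nt, u_sub, 1),
                                  program.replace(nt, p_sub, 1))
            return
    yield utterance, program

synthetic = list(expand(*ROOT))
# e.g. ('list papers published in 2019', '(list type.paper (year = 2019))')
```

In the zero-shot setting described above, canonical pairs like these would then be paraphrased into more natural utterances before training the parser.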
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- Cross-Register Projection for Headline Part of Speech Tagging [3.5455943749695034]
We train a multi-domain POS tagger on both long-form and headline text.
We show that our model yields a 23% relative error reduction per token and 19% per headline.
We make POSH, the POS-tagged Headline corpus, available to encourage research in improved NLP models for news headlines.
arXiv Detail & Related papers (2021-09-15T18:00:02Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
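The self-training recipe implied by that summary can be captured by a generic pseudo-labeling loop; `model.fit` and `model.predict_with_confidence` are assumed interfaces for this sketch, not APIs from the paper.

```python
def self_train(model, labeled, unlabeled, rounds=3, threshold=0.9):
    """Generic self-training: fit on gold data, pseudo-label unlabeled
    target-language examples, keep only confident ones, and retrain."""
    train_set = list(labeled)
    for _ in range(rounds):
        model.fit(train_set)
        confident = []
        for example in unlabeled:
            label, confidence = model.predict_with_confidence(example)
            if confidence >= threshold:
                confident.append((example, label))
        train_set = list(labeled) + confident  # gold data plus pseudo-labels
    return model
```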
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
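One way to realize an output layer whose size does not depend on the vocabulary is to compose each word's output embedding from hashed character trigrams, as in the sketch below; the hashing scheme and dimensions are assumptions of this illustration, not the paper's architecture.

```python
import zlib
import numpy as np

DIM, N_BUCKETS = 64, 10_000
rng = np.random.default_rng(0)
bucket_table = rng.normal(scale=0.1, size=(N_BUCKETS, DIM))  # fixed size, vocabulary-independent

def output_embedding(word: str) -> np.ndarray:
    """Compose a word's output embedding from hashed character trigrams,
    so output-layer parameters do not grow with the training vocabulary."""
    padded = f"#{word}#"
    trigrams = [padded[i:i + 3] for i in range(max(len(padded) - 2, 1))]
    rows = [bucket_table[zlib.crc32(t.encode("utf-8")) % N_BUCKETS] for t in trigrams]
    return np.mean(rows, axis=0)

def score_candidates(hidden_state: np.ndarray, candidates) -> dict:
    """Score candidate words by dot product with their composed embeddings."""
    return {w: float(hidden_state @ output_embedding(w)) for w in candidates}
```

Because embeddings are composed on the fly, even a word never seen in training (e.g. a novel headline coinage) receives an output score under this scheme.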
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
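A minimal sketch of the projection-and-averaging idea follows, assuming pre-trained embedding matrices whose rows are aligned on a shared seed vocabulary and using an orthogonal (Procrustes) map as the linear transformation; this is an illustration, not the paper's exact procedure.

```python
import numpy as np

def procrustes_map(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal least-squares map W such that source @ W.T approximates
    target (classic Procrustes solution via SVD), fit on shared seed rows."""
    u, _, vt = np.linalg.svd(target.T @ source)
    return u @ vt

def meta_embed(spaces):
    """Project every embedding space onto the first one and average the
    results to obtain meta-embeddings for the shared vocabulary."""
    mapped = [spaces[0]]
    for space in spaces[1:]:
        w = procrustes_map(space, spaces[0])
        mapped.append(space @ w.T)
    return np.mean(mapped, axis=0)
```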