ÆTHEL: Automatically Extracted Typelogical Derivations for Dutch
- URL: http://arxiv.org/abs/1912.12635v2
- Date: Fri, 6 Mar 2020 15:26:59 GMT
- Title: ÆTHEL: Automatically Extracted Typelogical Derivations for Dutch
- Authors: Konstantinos Kogkalidis and Michael Moortgat and Richard Moot
- Abstract summary: ÆTHEL is a semantic compositionality dataset for written Dutch.
ÆTHEL's types and derivations are obtained by means of an extraction algorithm applied to the syntactic analyses of LASSY Small.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ÆTHEL, a semantic compositionality dataset for written Dutch.
ÆTHEL consists of two parts. First, it contains a lexicon of supertags for
about 900 000 words in context. The supertags correspond to types of the simply
typed linear lambda-calculus, enhanced with dependency decorations that capture
grammatical roles supplementary to function-argument structures. On the basis
of these types, ÆTHEL further provides 72 192 validated derivations,
presented in four formats: natural-deduction and sequent-style proofs, linear
logic proofnets and the associated programs (lambda terms) for meaning
composition. ÆTHEL's types and derivations are obtained by means of an
extraction algorithm applied to the syntactic analyses of LASSY Small, the gold
standard corpus of written Dutch. We discuss the extraction algorithm and show
how 'virtual elements' in the original LASSY annotation of unbounded
dependencies and coordination phenomena give rise to higher-order types. We
suggest some example use cases highlighting the benefits of a type-driven
approach at the syntax-semantics interface. The following resources are
open-sourced with ÆTHEL: the lexical mappings between words and types, a
subset of the dataset consisting of 7 924 semantic parses, and the Python code
that implements the extraction algorithm.
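To make the supertag idea concrete, here is a minimal sketch (all names and lexicon entries are invented for illustration; this is not the ÆTHEL code or its actual type format) of how supertags as types of the simply typed linear lambda-calculus drive meaning composition: each word is assigned a type, and a derivation combines word types by function application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    """An atomic type, e.g. 'np' (noun phrase) or 's' (sentence)."""
    name: str
    def __str__(self) -> str:
        return self.name

@dataclass(frozen=True)
class Arrow:
    """A linear implication: arg -o res (consume one arg, produce res)."""
    arg: object
    res: object
    def __str__(self) -> str:
        return f"({self.arg} -o {self.res})"

def apply(fn_type, arg_type):
    """Function application: from (A -o B) and A, derive B; reject mismatches."""
    if isinstance(fn_type, Arrow) and fn_type.arg == arg_type:
        return fn_type.res
    raise TypeError(f"cannot apply {fn_type} to {arg_type}")

# Toy two-word lexicon in the spirit of a supertag lexicon (invented entries):
np, s = Atom("np"), Atom("s")
lexicon = {
    "Alice":  np,               # proper noun
    "slaapt": Arrow(np, s),     # intransitive verb: np -o s
}

# A one-step derivation of 'Alice slaapt': applying the verb's type to the
# subject's type yields the sentence type s.
result = apply(lexicon["slaapt"], lexicon["Alice"])
print(result)
```

The real dataset additionally decorates these types with dependency labels and provides full proofs in four formats, but the same application step is the core of the lambda terms it pairs with each derivation.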
Related papers
- Integrating Supertag Features into Neural Discontinuous Constituent Parsing
Traditional views of constituency demand that constituents consist of adjacent words, a requirement violated by the discontinuous constituents common in languages like German.
Transition-based parsing produces trees given raw text input using supervised learning on large annotated corpora.
arXiv Detail & Related papers (2024-10-11T12:28:26Z)
- Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective
We use dependency parsing to analyze news articles in Urdu.
We achieve a best labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%.
arXiv Detail & Related papers (2024-06-13T19:30:32Z)
- SPINDLE: Spinning Raw Text into Lambda Terms with Graph Attention
The module transforms raw text input to programs for meaning composition, expressed as lambda terms.
Its output consists of hi-res derivations of a multimodal type-logical grammar.
arXiv Detail & Related papers (2023-02-23T14:22:45Z)
- ImPaKT: A Dataset for Open-Schema Knowledge Base Construction
ImPaKT is a dataset for open-schema information extraction consisting of around 2500 text snippets from the C4 corpus, in the shopping domain (product buying guides).
We evaluate the power of this approach by fine-tuning the open source UL2 language model on a subset of the dataset, extracting a set of implication relations from a corpus of product buying guides, and conducting human evaluations of the resulting predictions.
arXiv Detail & Related papers (2022-12-21T05:02:49Z)
- Incorporating Constituent Syntax for Coreference Resolution
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Grammar-Based Grounded Lexicon Learning
G2L2 is a lexicalist approach toward learning a compositional and grounded meaning representation of language.
At the core of G2L2 is a collection of lexicon entries, which map each word to a syntactic type and a neuro-symbolic semantic program.
G2L2 can generalize from small amounts of data to novel compositions of words.
arXiv Detail & Related papers (2022-02-17T18:19:53Z)
- Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification
Funnelling (Fun) is a recently proposed method for cross-lingual text classification.
We describe Generalized Funnelling (gFun) as a generalization of Fun.
We show that gFun substantially improves over Fun and over state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-17T23:33:04Z)
- Syntactic representation learning for neural network based TTS with syntactic parse tree traversal
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived in the synthesized speech.
arXiv Detail & Related papers (2020-12-13T05:52:07Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Extractive Summarization as Text Matching
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.