Spanish Resource Grammar version 2023
- URL: http://arxiv.org/abs/2309.13318v2
- Date: Tue, 26 Mar 2024 11:26:04 GMT
- Title: Spanish Resource Grammar version 2023
- Authors: Olga Zamaraeva, Lorena S. Allegue, Carlos Gómez-Rodríguez,
- Abstract summary: We present the latest version of the Spanish Resource Grammar (SRG).
Such grammars encode a complex set of hypotheses about syntax, making them a resource for empirical testing of linguistic theory.
This version of the SRG uses the recent version of the Freeling morphological analyzer and is released along with an automatically created, manually verified treebank of 2,291 sentences.
- Score: 12.009437358109407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the latest version of the Spanish Resource Grammar (SRG), a grammar of Spanish implemented in the HPSG formalism. Such grammars encode a complex set of hypotheses about syntax, making them a resource for empirical testing of linguistic theory. They also encode a strict notion of grammaticality, which makes them a resource for natural language processing applications in computer-assisted language learning. This version of the SRG uses the recent version of the Freeling morphological analyzer and is released along with an automatically created, manually verified treebank of 2,291 sentences. We explain the treebanking process, emphasizing how it differs from treebanking with manual annotation and how it contributes to empirically driven development of syntactic theory. The treebanks' high level of consistency and detail makes them a resource for training high-quality semantic parsers and, more generally, systems that benefit from precise and detailed semantics. Finally, we present the grammar's coverage and overgeneration on 100 sentences from a learner corpus, a new research line related to developing methodologies for robust empirical evaluation of hypotheses in second language acquisition.
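The abstract reports the grammar's coverage and overgeneration on learner sentences. The sketch below shows how these two figures are commonly computed in grammar engineering, assuming coverage is the share of grammatical test items that receive at least one parse and overgeneration is the share of ungrammatical items that (wrongly) receive at least one parse. The `TestItem` type, the function names, and the four example sentences with their parse counts are hypothetical illustrations, not the paper's data or tooling.

```python
from dataclasses import dataclass


@dataclass
class TestItem:
    sentence: str
    grammatical: bool  # gold judgment (hypothetical annotation)
    num_parses: int    # number of analyses the grammar returned (hypothetical)


def coverage(items):
    """Share of grammatical items that receive at least one parse."""
    gold = [it for it in items if it.grammatical]
    if not gold:
        return 0.0
    return sum(it.num_parses > 0 for it in gold) / len(gold)


def overgeneration(items):
    """Share of ungrammatical items that nonetheless receive at least one parse."""
    bad = [it for it in items if not it.grammatical]
    if not bad:
        return 0.0
    return sum(it.num_parses > 0 for it in bad) / len(bad)


# Made-up items for illustration only; parse counts are invented.
ITEMS = [
    TestItem("El perro duerme.", True, 2),
    TestItem("La casa es grande.", True, 0),     # grammatical but unparsed: lowers coverage
    TestItem("Los perro duermen.", False, 0),    # correctly rejected
    TestItem("Yo es estudiante.", False, 1),     # wrongly accepted: overgeneration
]

print(f"coverage={coverage(ITEMS):.2f} overgeneration={overgeneration(ITEMS):.2f}")
```

Keeping the two denominators separate (grammatical vs. ungrammatical items) is the point of the design: a strict grammar for language learning should score high on the first and low on the second.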
Related papers
- Predictability and Causality in Spanish and English Natural Language Generation [6.817247544942709]
This paper compares causal and non-causal language modeling for English and Spanish.
According to this experiment, Spanish is more predictable than English given a non-causal context.
These insights support further research in NLG in Spanish using bidirectional transformer language models.
arXiv Detail & Related papers (2024-08-26T14:09:28Z)
- CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
- AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
arXiv Detail & Related papers (2022-03-25T20:37:30Z)
- Rule Augmented Unsupervised Constituency Parsing [11.775897250472116]
We propose an approach that leverages generic linguistic knowledge of the language, expressed in the form of syntactic rules.
We achieve new state-of-the-art results on two benchmark datasets, MNLI and WSJ.
arXiv Detail & Related papers (2021-05-21T08:06:11Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only paired English and logical-form data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
- Unsupervised Learning of Explainable Parse Trees for Improved Generalisation [15.576061447736057]
We propose an attention mechanism over Tree-LSTMs to learn more meaningful and explainable parse tree structures.
We also demonstrate the superior performance of our proposed model on natural language inference, semantic relatedness, and sentiment analysis tasks.
arXiv Detail & Related papers (2021-04-11T12:10:03Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z)
- Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
- LSCP: Enhanced Large Scale Colloquial Persian Language Understanding [2.7249643773851724]
"Large Scale Colloquial Persian dataset" aims to describe the colloquial language of low-resourced languages.
The proposed corpus consists of 120M sentences derived from 27M tweets, annotated with parse trees, part-of-speech tags, sentiment polarity and translations in five different languages.
arXiv Detail & Related papers (2020-03-13T22:24:14Z)
- A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning [0.0]
We propose two approaches to dependency parsing, especially for languages with a limited amount of training data.
Our first approach combines a state-of-the-art deep-learning-based parser with a rule-based approach, and the second one incorporates morphological information into the network.
The proposed methods are developed for Turkish, but can be adapted to other languages as well.
arXiv Detail & Related papers (2020-02-24T08:34:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.