Automatic Extraction of Rules Governing Morphological Agreement
- URL: http://arxiv.org/abs/2010.01160v2
- Date: Tue, 6 Oct 2020 03:30:27 GMT
- Title: Automatic Extraction of Rules Governing Morphological Agreement
- Authors: Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, Graham Neubig
- Abstract summary: We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
- Score: 103.78033184221373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating a descriptive grammar of a language is an indispensable step for
language documentation and preservation. However, at the same time it is a
tedious, time-consuming task. In this paper, we take steps towards automating
this process by devising an automated framework for extracting a first-pass
grammatical specification from raw text in a concise, human- and
machine-readable format. We focus on extracting rules describing agreement, a
morphosyntactic phenomenon at the core of the grammars of many of the world's
languages. We apply our framework to all languages included in the Universal
Dependencies project, with promising results. Using cross-lingual transfer,
even with no expert annotations in the language of interest, our framework
extracts a grammatical specification which is nearly equivalent to those
created with large amounts of gold-standard annotated data. We confirm this
finding with human expert evaluations of the rules that our framework produces,
which have an average accuracy of 78%. We release an interface demonstrating
the extracted rules at https://neulab.github.io/lase/.
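As a rough illustration of what an agreement rule over dependency relations can look like, the sketch below counts how often a dependent and its head share the value of a morphological feature, grouped by dependency relation, and keeps the relations where they almost always match. This is a simplified frequency threshold and is not presented as the framework's actual method; the toy sentences, the function names `agreement_rates` and `extract_rules`, and the 0.9 threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy treebank: each token is (id, form, head_id, deprel, feats).
# head_id 0 marks the root. The data is invented for illustration only.
SENTENCES = [
    [
        (1, "la", 2, "det", {"Gender": "Fem", "Number": "Sing"}),
        (2, "casa", 0, "root", {"Gender": "Fem", "Number": "Sing"}),
        (3, "blanca", 2, "amod", {"Gender": "Fem", "Number": "Sing"}),
    ],
    [
        (1, "ella", 2, "nsubj", {"Gender": "Fem", "Number": "Sing"}),
        (2, "come", 0, "root", {"Number": "Sing"}),
        (3, "manzanas", 2, "obj", {"Gender": "Fem", "Number": "Plur"}),
    ],
]


def agreement_rates(sentences, feature):
    """Return {deprel: fraction of head-dependent pairs sharing `feature`}.

    Pairs where either token lacks the feature are skipped.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for sent in sentences:
        by_id = {tok[0]: tok for tok in sent}
        for _tok_id, _form, head, deprel, feats in sent:
            if head == 0 or feature not in feats:
                continue
            head_feats = by_id[head][4]
            if feature not in head_feats:
                continue
            total[deprel] += 1
            agree[deprel] += int(feats[feature] == head_feats[feature])
    return {rel: agree[rel] / total[rel] for rel in total}


def extract_rules(sentences, feature, threshold=0.9):
    """List relations whose agreement rate meets the (assumed) threshold."""
    rates = agreement_rates(sentences, feature)
    return sorted(rel for rel, rate in rates.items() if rate >= threshold)


if __name__ == "__main__":
    for feat in ("Number", "Gender"):
        print(feat, extract_rules(SENTENCES, feat))
```

On this toy data, the object relation is excluded for Number because the plural object does not match its singular verb, while the subject relation is skipped for Gender because the verb carries no gender feature. A real treebank would need far more data and a statistical test rather than a fixed cutoff.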
Related papers
- Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks [6.390468088226495]
We propose a new method to extract and explore significant fine-grained grammar patterns from treebanks.
We extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order.
Our method captures both well-known and less well-known significant grammar rules in Spanish, French, and Wolof.
arXiv Detail & Related papers (2024-03-26T09:39:53Z)
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z)
- nl2spec: Interactively Translating Unstructured Natural Language to Temporal Logics with Large Language Models [3.1143846686797314]
We present nl2spec, a framework for applying Large Language Models (LLMs) to derive formal specifications from unstructured natural language.
We introduce a new methodology to detect and resolve the inherent ambiguity of system requirements in natural language.
Users iteratively add, delete, and edit these sub-translations to amend erroneous formalizations, which is easier than manually redrafting the entire formalization.
arXiv Detail & Related papers (2023-03-08T20:08:53Z)
- AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
arXiv Detail & Related papers (2022-03-25T20:37:30Z)
- Evaluating the Morphosyntactic Well-formedness of Generated Texts [88.20502652494521]
We propose L'AMBRE -- a metric to evaluate the morphosyntactic well-formedness of text.
We show the effectiveness of our metric on the task of machine translation through a diachronic study of systems translating into morphologically-rich languages.
arXiv Detail & Related papers (2021-03-30T18:02:58Z)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the Commongen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- Machine learning approach of Japanese composition scoring and writing aided system's design [0.0]
A composition scoring system can greatly assist language learners.
It can help language learners improve themselves in the process of producing output.
Especially for foreign language learners, lexical and syntactic content are usually what they are more concerned about.
arXiv Detail & Related papers (2020-08-26T11:01:13Z)