GrammarTagger: A Multilingual, Minimally-Supervised Grammar Profiler for
Language Education
- URL: http://arxiv.org/abs/2104.03190v1
- Date: Wed, 7 Apr 2021 15:31:20 GMT
- Authors: Masato Hagiwara, Joshua Tanner, Keisuke Sakaguchi
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present GrammarTagger, an open-source grammar profiler which, given an
input text, identifies grammatical features useful for language education. The
model architecture enables it to learn from a small number of texts annotated
with spans and their labels, which 1) enables easier and more intuitive
annotation, 2) supports overlapping spans, and 3) is less prone to error
propagation, compared to complex hand-crafted rules defined on
constituency/dependency parses. We show that we can bootstrap a grammar
profiler model with $F_1 \approx 0.6$ from only a couple hundred sentences in
both English and Chinese, and that performance can be further improved by
training a multilingual model. With GrammarTagger, we also build Octanove Learn, a search
engine of language learning materials indexed by their reading difficulty and
grammatical features. The code and pretrained models are publicly available at
\url{https://github.com/octanove/grammartagger}.
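The abstract evaluates the profiler with span-level $F_1$. A minimal sketch of how such a score can be computed over labeled spans — the (start, end, label) tuple format and the toy grammar labels below are illustrative assumptions, not the paper's actual annotation scheme:

```python
def span_f1(gold, pred):
    """Precision, recall, and F1 over sets of (start, end, label) spans.

    Because each labeled span is an independent set element, overlapping
    spans (a token covered by several gold spans) are handled naturally.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                        # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: tokens "have been studying" carry two overlapping features.
gold = [(0, 3, "present-perfect-progressive"), (0, 1, "auxiliary-have")]
pred = [(0, 3, "present-perfect-progressive"), (1, 3, "progressive")]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Treating each labeled span as a set element is what makes overlapping annotations cheap to support, in contrast to rule systems defined over a single parse tree.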
Related papers
- Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks [6.390468088226495]
We propose a new method to extract and explore significant fine-grained grammar patterns from treebanks.
We extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order.
Our method captures both well-known and less well-known significant grammar rules in Spanish, French, and Wolof.
arXiv Detail & Related papers (2024-03-26T09:39:53Z)
- Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models [23.62852626011989]
We show that grammatical structures in higher-resource languages bleed into lower-resource languages.
We show this bias via a novel method for comparing the fluency of multilingual models to the fluency of monolingual Spanish and Greek models.
arXiv Detail & Related papers (2022-10-11T17:06:38Z)
- Towards Lithuanian grammatical error correction [0.0]
We construct a grammatical error correction model for Lithuanian, a language rich in archaic features.
We compare subword and byte-level approaches and share our best trained model, which achieves $F_{0.5} = 0.92$, together with the accompanying code, in an online open-source repository.
arXiv Detail & Related papers (2022-03-18T13:59:02Z)
- Learning grammar with a divide-and-concur neural network [4.111899441919164]
We implement a divide-and-concur iterative projection approach to context-free grammar inference.
Our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable.
arXiv Detail & Related papers (2022-01-18T22:42:43Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z)
- VLGrammar: Grounded Grammar Induction of Vision and Language [86.88273769411428]
We study grounded grammar induction of vision and language in a joint learning framework.
We present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously.
arXiv Detail & Related papers (2021-03-24T04:05:08Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z)
- Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [73.65237422910738]
We present an easy and efficient method to extend existing sentence embedding models to new languages.
This makes it possible to create multilingual versions of previously monolingual models.
arXiv Detail & Related papers (2020-04-21T08:20:25Z)
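Several entries above report weighted F-scores, e.g. $F_{0.5} = 0.92$ for the Lithuanian grammatical error correction model. As a quick reminder (not tied to any specific paper's implementation), the general $F_\beta$ score combines precision and recall, with $\beta < 1$ favoring precision:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """General F-beta: beta < 1 weights precision more heavily,
    beta > 1 weights recall more heavily; beta = 1 is the usual F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# beta = 0.5 is conventional in grammatical error correction, where a
# false correction is considered worse than a missed error.
print(round(f_beta(0.9, 1.0, beta=0.5), 3))  # 0.918
```

With $\beta = 1$ the formula reduces to the harmonic mean of precision and recall, the $F_1$ used for the span-level evaluation above.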
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.