AUTOLEX: An Automatic Framework for Linguistic Exploration
- URL: http://arxiv.org/abs/2203.13901v1
- Date: Fri, 25 Mar 2022 20:37:30 GMT
- Title: AUTOLEX: An Automatic Framework for Linguistic Exploration
- Authors: Aditi Chaudhary, Zaid Sheikh, David R Mortensen, Antonios
Anastasopoulos, Graham Neubig
- Abstract summary: We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
- Score: 93.89709486642666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Each language has its own complex systems of word, phrase, and sentence
construction, the guiding principles of which are often summarized in grammar
descriptions for the consumption of linguists or language learners. However,
manual creation of such descriptions is a fraught process, as creating
descriptions which describe the language in "its own terms" without bias or
error requires both a deep understanding of the language at hand and
linguistics as a whole. We propose an automatic framework AutoLEX that aims to
ease linguists' discovery and extraction of concise descriptions of linguistic
phenomena. Specifically, we apply this framework to extract descriptions for
three phenomena: morphological agreement, case marking, and word order, across
several languages. We evaluate the descriptions with the help of language
experts and propose a method for automated evaluation when human evaluation is
infeasible.
Related papers
- Finding Structure in Language Models [3.882018118763685]
This thesis is about whether language models possess a deep understanding of grammatical structure similar to that of humans.
We will develop novel interpretability techniques that enhance our understanding of the complex nature of large-scale language models.
arXiv Detail & Related papers (2024-11-25T14:37:24Z) - Learning Language Structures through Grounding [8.437466837766895]
We consider a family of machine learning tasks that aim to learn language structures through grounding.
In Part I, we consider learning syntactic parses through visual grounding.
In Part II, we propose two execution-aware methods to map sentences into corresponding semantic structures.
In Part III, we propose methods that learn language structures from annotations in other languages.
arXiv Detail & Related papers (2024-06-14T02:21:53Z) - Teacher Perception of Automatically Extracted Grammar Concepts for L2
Language Learning [66.79173000135717]
We apply this work to teaching two Indian languages, Kannada and Marathi, which do not have well-developed resources for second language learning.
We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary).
We enlist the help of language educators from schools in North America to perform a manual evaluation, who find the materials have potential to be used for their lesson preparation and learner evaluation.
arXiv Detail & Related papers (2023-10-27T18:17:29Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised
spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - Teacher Perception of Automatically Extracted Grammar Concepts for L2
Language Learning [91.49622922938681]
We present an automatic framework that automatically discovers and visualizing descriptions of different aspects of grammar.
Specifically, we extract descriptions from a natural text corpus that answer questions about morphosyntax and semantics.
We apply this method for teaching the Indian languages, Kannada and Marathi, which, unlike English, do not have well-developed pedagogical resources.
arXiv Detail & Related papers (2022-06-10T14:52:22Z) - Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z) - Machine learning approach of Japanese composition scoring and writing
aided system's design [0.0]
A composition scoring system can greatly assist language learners.
It can make language leaner improve themselves in the process of output something.
Especially for foreign language learners, lexical and syntactic content are usually what they are more concerned about.
arXiv Detail & Related papers (2020-08-26T11:01:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.