Related papers: AUTOLEX: An Automatic Framework for Linguistic Exploration

AUTOLEX: An Automatic Framework for Linguistic Exploration

URL: http://arxiv.org/abs/2203.13901v1
Date: Fri, 25 Mar 2022 20:37:30 GMT
Title: AUTOLEX: An Automatic Framework for Linguistic Exploration
Authors: Aditi Chaudhary, Zaid Sheikh, David R Mortensen, Antonios Anastasopoulos, Graham Neubig
Abstract summary: We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena. Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order. We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
Score: 93.89709486642666
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Each language has its own complex systems of word, phrase, and sentence construction, the guiding principles of which are often summarized in grammar descriptions for the consumption of linguists or language learners. However, manual creation of such descriptions is a fraught process, as creating descriptions which describe the language in "its own terms" without bias or error requires both a deep understanding of the language at hand and linguistics as a whole. We propose an automatic framework AutoLEX that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena. Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order, across several languages. We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.

Related papers

Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis [0.5545791216381869]
We explore how agentic large language models (LLMs) can streamline the systematic analysis of annotated corpora.<n>We introduce an agentic framework for corpus-grounded grammatical analysis that integrates concepts such as natural-language task interpretation.<n>We test the system on multilingual grammatical tasks inspired by the World Atlas of Language Structures (WALS)
arXiv Detail & Related papers (2025-11-28T21:27:58Z)
Finding Structure in Language Models [3.882018118763685]
This thesis is about whether language models possess a deep understanding of grammatical structure similar to that of humans. We will develop novel interpretability techniques that enhance our understanding of the complex nature of large-scale language models.
arXiv Detail & Related papers (2024-11-25T14:37:24Z)
Learning Language Structures through Grounding [8.437466837766895]
We consider a family of machine learning tasks that aim to learn language structures through grounding. In Part I, we consider learning syntactic parses through visual grounding. In Part II, we propose two execution-aware methods to map sentences into corresponding semantic structures. In Part III, we propose methods that learn language structures from annotations in other languages.
arXiv Detail & Related papers (2024-06-14T02:21:53Z)
Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning [66.79173000135717]
We apply this work to teaching two Indian languages, Kannada and Marathi, which do not have well-developed resources for second language learning. We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary). We enlist the help of language educators from schools in North America to perform a manual evaluation, who find the materials have potential to be used for their lesson preparation and learner evaluation.
arXiv Detail & Related papers (2023-10-27T18:17:29Z)
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels. We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels. We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z)
Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks. We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking. We find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z)
Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning [91.49622922938681]
We present an automatic framework that automatically discovers and visualizing descriptions of different aspects of grammar. Specifically, we extract descriptions from a natural text corpus that answer questions about morphosyntax and semantics. We apply this method for teaching the Indian languages, Kannada and Marathi, which, unlike English, do not have well-developed pedagogical resources.
arXiv Detail & Related papers (2022-06-10T14:52:22Z)
Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text. We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages. We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z)
Machine learning approach of Japanese composition scoring and writing aided system's design [0.0]
A composition scoring system can greatly assist language learners. It can make language leaner improve themselves in the process of output something. Especially for foreign language learners, lexical and syntactic content are usually what they are more concerned about.
arXiv Detail & Related papers (2020-08-26T11:01:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.