When is Wall a Pared and when a Muro? -- Extracting Rules Governing
  Lexical Selection
- URL: http://arxiv.org/abs/2109.06014v1
- Date: Mon, 13 Sep 2021 14:49:00 GMT
- Title: When is Wall a Pared and when a Muro? -- Extracting Rules Governing
  Lexical Selection
- Authors: Aditi Chaudhary, Kayo Yin, Antonios Anastasopoulos, Graham Neubig
- Abstract summary: We present a method for automatically identifying fine-grained lexical distinctions.
We extract concise descriptions explaining these distinctions in a human- and machine-readable format.
We use these descriptions to teach non-native speakers when to translate a given ambiguous word into its different possible translations.
- Score: 85.0262994506624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Learning fine-grained distinctions between vocabulary items is a key
challenge in learning a new language. For example, the noun "wall" has
different lexical manifestations in Spanish -- "pared" refers to an indoor wall
while "muro" refers to an outside wall. However, this variety of lexical
distinction may not be obvious to non-native learners unless the distinction is
explained in such a way. In this work, we present a method for automatically
identifying fine-grained lexical distinctions, and extracting concise
descriptions explaining these distinctions in a human- and machine-readable
format. We confirm the quality of these extracted descriptions in a language
learning setup for two languages, Spanish and Greek, where we use them to teach
non-native speakers when to translate a given ambiguous word into its different
possible translations. Code and data are publicly released here
(https://github.com/Aditi138/LexSelection).
 
      
Related papers
- Using Information Theory to Characterize Prosodic Typology: The Case of Tone, Pitch-Accent and Stress-Accent [22.63155507847401]
 We predict that languages that use prosody to make lexical distinctions should exhibit a higher mutual information between word identity and prosody, compared to languages that don't.
We use a dataset of speakers reading sentences aloud in ten languages across five language families to estimate the mutual information between the text and their pitch curves.
 arXiv  Detail & Related papers  (2025-05-12T15:25:17Z)
- Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing [19.6191088446367]
 We evaluate how multilingual Large Language Models (LLMs) handle English-Spanish, English-French, and English-German cognates, non-cognates, and interlingual homographs.
We find that models adopt different strategies for understanding English and non-English homographs, which highlights the lack of a unified approach to handling cross-lingual ambiguities.
 arXiv  Detail & Related papers  (2025-01-15T20:22:35Z)
- Crowdsourcing Lexical Diversity [7.569845058082537]
 This paper proposes a novel crowdsourcing methodology for reducing bias in lexicons.
Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food.
We validated our method by applying it to two case studies focused on food-related terminology.
 arXiv  Detail & Related papers  (2024-10-30T15:45:09Z)
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
 This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
 arXiv  Detail & Related papers  (2024-07-12T22:10:16Z)
- Testing learning hypotheses using neural networks by manipulating learning data [20.525923251193472]
 We show that a neural network language model can learn restrictions to the passive that are similar to those displayed by humans.
We find that while the frequency with which a verb appears in the passive significantly affects its passivizability, the semantics of the verb does not.
 arXiv  Detail & Related papers  (2024-07-05T15:41:30Z)
- Teacher Perception of Automatically Extracted Grammar Concepts for L2
  Language Learning [91.49622922938681]
 We present an automatic framework that discovers and visualizes descriptions of different aspects of grammar.
Specifically, we extract descriptions from a natural text corpus that answer questions about morphosyntax and semantics.
We apply this method for teaching the Indian languages, Kannada and Marathi, which, unlike English, do not have well-developed pedagogical resources.
 arXiv  Detail & Related papers  (2022-06-10T14:52:22Z)
- UAlberta at SemEval 2022 Task 2: Leveraging Glosses and Translations for
  Multilingual Idiomaticity Detection [4.66831886752751]
 We describe the University of Alberta systems for the SemEval-2022 Task 2 on multilingual idiomaticity detection.
Under the assumption that idiomatic expressions are noncompositional, our first method integrates information on the meanings of the individual words of an expression into a binary classifier.
Our second method translates an expression in context, and uses a lexical knowledge base to determine if the translation is literal.
 arXiv  Detail & Related papers  (2022-05-27T16:35:00Z)
- AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
 We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
 arXiv  Detail & Related papers  (2022-03-25T20:37:30Z)
- Exploring the Representation of Word Meanings in Context: A Case Study
  on Homonymy and Synonymy [0.0]
 We assess the ability of both static and contextualized models to adequately represent different lexical-semantic relations.
Experiments are performed in Galician, Portuguese, English, and Spanish.
 arXiv  Detail & Related papers  (2021-06-25T10:54:23Z)
- Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
 We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
 arXiv  Detail & Related papers  (2020-10-12T14:24:01Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
 We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
 arXiv  Detail & Related papers  (2020-10-05T17:19:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     