A Curious Class of Adpositional Multiword Expressions in Korean
- URL: http://arxiv.org/abs/2602.16023v1
- Date: Tue, 17 Feb 2026 21:23:16 GMT
- Title: A Curious Class of Adpositional Multiword Expressions in Korean
- Authors: Junghyun Min, Na-Rae Han, Jena D. Hwang, Nathan Schneider
- Abstract summary: Multiword expressions (MWEs) have been widely studied in cross-lingual annotation frameworks such as PARSEME. In this paper, we study a class of Korean functional multiword expressions: postpositional verb-based constructions (PVCs).
- Score: 10.449742937121014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiword expressions (MWEs) have been widely studied in cross-lingual annotation frameworks such as PARSEME. However, Korean MWEs remain underrepresented in these efforts. In particular, Korean multiword adpositions lack systematic analysis, annotated resources, and integration into existing multilingual frameworks. In this paper, we study a class of Korean functional multiword expressions: postpositional verb-based constructions (PVCs). Using data from Korean Wikipedia, we survey and analyze several PVC expressions and contrast them with non-MWEs and light verb constructions (LVCs) with similar structure. Building on this analysis, we propose annotation guidelines designed to support future work in Korean multiword adpositions and facilitate alignment with cross-lingual frameworks.
Related papers
- KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models [36.90941464587649]
We introduce the Korean Instruction-following Task Evaluation (KITE), a benchmark designed to evaluate both general and Korean-specific instructions. Unlike existing Korean benchmarks that focus mainly on factual knowledge or multiple-choice testing, KITE directly targets diverse, open-ended instruction-following tasks.
Date: 2025-10-17T11:45:15Z
- EXECUTE: A Multilingual Benchmark for LLM Token Understanding [54.70665106141121]
Tests across multiple languages reveal that challenges in other languages are not always on the character level as in English. We also examine sub-character tasks in Chinese, Japanese, and Korean to assess LLMs' understanding of character components.
Date: 2025-05-23T11:56:48Z
- Parsing Through Boundaries in Chinese Word Segmentation [5.144001661743487]
Unlike English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer understanding of how different segmentation strategies shape dependency structures in Chinese.
Date: 2025-03-29T14:24:02Z
- K-UD: Revising Korean Universal Dependencies Guidelines [6.292929354303524]
We aim to refine the definition of syntactic dependency of UDs within the context of analyzing the Korean language. Our aim is not only to achieve a consensus within UDs but also to garner agreement beyond the UD framework for analyzing Korean sentences using dependency structure.
Date: 2024-12-01T15:41:05Z
- Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers [7.275938266030414]
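Syntactic elements, such as word order and case markers, are fundamental in natural language processing.
Unlike English, Korean permits varied word orders because case markers signal the grammatical functions of sentence components. This study explores whether Korean language models can accurately capture this flexibility.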
Date: 2024-07-12T11:33:41Z
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multiple concepts for multilingual semantic matching, freeing the model from reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
Date: 2024-03-05T13:55:16Z
- Word segmentation granularity in Korean [1.0619039878979954]
There are multiple possible levels of word segmentation granularity in Korean.
For specific language processing and corpus annotation tasks, several different granularity levels have been proposed and utilized.
Interestingly, the granularity that separates only functional morphemes yields the best performance for phrase structure parsing.
Date: 2023-09-07T13:42:05Z
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model, with an average improvement of 8 chrF++ points across the examined languages.
Date: 2023-05-22T14:52:47Z
- Multilingual Word Sense Disambiguation with Unified Sense Representation [55.3061179361177]
We propose building knowledge-based and supervised Multilingual Word Sense Disambiguation (MWSD) systems.
We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from high-resource languages to low-resource ones.
Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
Date: 2022-10-14T01:24:03Z
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
Date: 2020-10-27T22:24:01Z
- Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity [67.36239720463657]
Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 diverse languages.
Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs.
Owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets.
Date: 2020-03-10T17:17:01Z
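The count of 66 cross-lingual datasets in the Multi-SimLex entry is consistent with pairing each of the 12 languages with every other language exactly once, assuming each cross-lingual dataset covers one unordered language pair:

$$\binom{12}{2} = \frac{12 \times 11}{2} = 66$$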
This list is automatically generated from the titles and abstracts of the papers on this site.