Unlocking Korean Verbs: A User-Friendly Exploration into the Verb Lexicon
- URL: http://arxiv.org/abs/2410.01100v1
- Date: Tue, 1 Oct 2024 22:03:34 GMT
- Title: Unlocking Korean Verbs: A User-Friendly Exploration into the Verb Lexicon
- Authors: Seohyun Song, Eunkyul Leah Jo, Yige Chen, Jeen-Pyo Hong, Kyuwon Kim, Jin Wee, Miyoung Kang, KyungTae Lim, Jungyeul Park, Chulwoo Park
- Abstract summary: The Sejong dictionary dataset offers extensive coverage of morphology, syntax, and semantic representation.
The labeled linguistic structures within this dataset form the basis for uncovering relationships between words and phrases.
This paper introduces a user-friendly web interface designed for the collection and consolidation of verb-related information.
- Score: 5.358486800301437
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Sejong dictionary dataset offers a valuable resource, providing extensive coverage of morphology, syntax, and semantic representation. This dataset can be utilized to explore linguistic information in greater depth. The labeled linguistic structures within this dataset form the basis for uncovering relationships between words and phrases and their associations with target verbs. This paper introduces a user-friendly web interface designed for the collection and consolidation of verb-related information, with a particular focus on subcategorization frames. Additionally, it outlines our efforts in mapping this information by aligning subcategorization frames with corresponding illustrative sentence examples. Furthermore, we provide a Python library that would simplify syntactic parsing and semantic role labeling. These tools are intended to assist individuals interested in harnessing the Sejong dictionary dataset to develop applications for Korean language processing.
Related papers
- Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers [7.275938266030414]
Syntactic elements, such as word order and case markers, are fundamental in natural language processing.
This study explores whether Korean language models can accurately capture this flexibility.
arXiv Detail & Related papers (2024-07-12T11:33:41Z)
- Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation [0.21485350418225246]
We present an information retrieval based reverse dictionary system using modern pre-trained language models and approximate nearest neighbors search algorithms.
The proposed approach is applied to an existing Estonian language lexicon resource, Sonaveeb (word web), with the purpose of enhancing and enriching it by introducing cross-lingual reverse dictionary functionality powered by semantic search.
arXiv Detail & Related papers (2024-04-30T10:21:14Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- SpaDeLeF: A Dataset for Hierarchical Classification of Lexical Functions for Collocations in Spanish [6.9454683800956705]
We present a dataset of the most frequent Spanish verb-noun collocations and the sentences in which they occur.
Each collocation is assigned to one of 37 lexical functions defined as classes for a hierarchical classification task.
We combine the classes in a tree-based structure, and introduce classification objectives for each level of the structure.
arXiv Detail & Related papers (2023-11-07T18:32:34Z)
- Compositional Generalization in Grounded Language Learning via Induced Model Sparsity [81.38804205212425]
We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations.
We design an agent that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal.
Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations.
arXiv Detail & Related papers (2022-07-06T08:46:27Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization [80.94424037751243]
In zero-shot multilingual extractive text summarization, a model is typically trained on an English dataset and then applied to summarization datasets in other languages.
We propose NLS (Neural Label Search for Summarization), which jointly learns hierarchical weights for different sets of labels together with our summarization model.
We conduct multilingual zero-shot summarization experiments on MLSUM and WikiLingua datasets, and we achieve state-of-the-art results using both human and automatic evaluations.
arXiv Detail & Related papers (2022-04-28T14:02:16Z)
- CREER: A Large-Scale Corpus for Relation Extraction and Entity Recognition [9.54366784050374]
The CREER dataset uses the Stanford CoreNLP Annotator to capture rich language structures from Wikipedia plain text.
This dataset follows widely used linguistic and semantic annotations, so it can be used not only for most natural language processing tasks but also for scaling the dataset.
arXiv Detail & Related papers (2022-04-27T05:43:21Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)