UKTA: Unified Korean Text Analyzer
- URL: http://arxiv.org/abs/2502.09648v1
- Date: Tue, 11 Feb 2025 13:30:56 GMT
- Title: UKTA: Unified Korean Text Analyzer
- Authors: Seokho Ahn, Junhyung Park, Ganghee Go, Chulhui Kim, Jiho Jung, Myung Sun Shin, Do-Guk Kim, Young-Duk Seo
- Abstract summary: UKTA (Unified Korean Text Analyzer) is a comprehensive Korean text analysis and writing evaluation system. UKTA provides accurate low-level morpheme analysis, key lexical features for mid-level explainability, and transparent high-level rubric-based writing scores.
- Score: 7.342330109393445
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Evaluating writing quality is complex and time-consuming, often delaying feedback to learners. While automated writing evaluation tools are effective for English, Korean automated writing evaluation tools face challenges due to their inability to address multi-view analysis, error propagation, and evaluation explainability. To overcome these challenges, we introduce UKTA (Unified Korean Text Analyzer), a comprehensive Korean text analysis and writing evaluation system. UKTA provides accurate low-level morpheme analysis, key lexical features for mid-level explainability, and transparent high-level rubric-based writing scores. Our approach enhances accuracy and quadratic weighted kappa over existing baselines, positioning UKTA as a leading multi-perspective tool for Korean text analysis and writing evaluation.
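The abstract reports improvements in quadratic weighted kappa (QWK), the standard agreement metric for ordinal rubric scores. For reference, the following is a minimal sketch of how QWK is typically computed between human and automated scores; the 5-point scale and example values are hypothetical and not taken from the paper, and this is not UKTA's implementation.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, num_ratings):
    """Cohen's kappa with quadratic weights over ordinal ratings 0..num_ratings-1."""
    rater_a = np.asarray(rater_a)
    rater_b = np.asarray(rater_b)

    # Observed confusion matrix between the two raters.
    observed = np.zeros((num_ratings, num_ratings))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1

    # Expected matrix under independence (outer product of marginals, scaled to N).
    hist_a = np.bincount(rater_a, minlength=num_ratings)
    hist_b = np.bincount(rater_b, minlength=num_ratings)
    expected = np.outer(hist_a, hist_b) / len(rater_a)

    # Quadratic disagreement weights: larger penalty for scores further apart.
    idx = np.arange(num_ratings)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_ratings - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical example: human rubric scores vs. automated scores on a 5-point scale.
human = [0, 2, 4, 3, 1, 2]
model = [0, 2, 3, 3, 1, 2]
print(quadratic_weighted_kappa(human, model, num_ratings=5))
```

Unlike plain accuracy, QWK penalizes large score disagreements more than near misses, which is why it is commonly reported alongside accuracy for rubric-based writing evaluation.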
Related papers
- Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models [44.159383734605456]
We propose a specialized evaluation framework emphasizing the pivotal role of menu translation in cross-cultural communication.
MOTBench requires LVLMs to accurately recognize and translate each dish on a menu, along with its price and unit items.
Our benchmark comprises a collection of Chinese and English menus characterized by intricate layouts, a variety of fonts, and culturally specific elements across different languages, together with precise human annotations.
arXiv Detail & Related papers (2025-04-16T03:08:57Z) - Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering [68.3400058037817]
We introduce TREQA (Translation Evaluation via Question-Answering), a framework that extrinsically evaluates translation quality.
We show that TREQA is competitive with and, in some cases, outperforms state-of-the-art neural and LLM-based metrics in ranking alternative paragraph-level translations.
arXiv Detail & Related papers (2025-04-10T09:24:54Z) - Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework [61.38174427966444]
Large Language Models (LLMs) are increasingly used for automated evaluation across a wide range of scenarios.
Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models.
We propose a novel evaluation framework, ARJudge, that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses.
arXiv Detail & Related papers (2025-02-26T06:31:45Z) - LLM-as-a-Judge & Reward Model: What They Can and Cannot Do [2.2469442203227863]
We conduct a comprehensive analysis of automated evaluators, reporting several key findings on their behavior.
We discover that English evaluation capabilities significantly influence language-specific evaluation capabilities, enabling evaluators trained in English to easily transfer their skills to other languages.
We find that state-of-the-art evaluators struggle with challenging prompts, in either English or Korean, underscoring their limitations in assessing or generating complex reasoning questions.
arXiv Detail & Related papers (2024-09-17T14:40:02Z) - Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education [1.6340559025561785]
Large language model (LLM)-based evaluation pipelines have demonstrated their capability to robustly evaluate machine-generated text.
We investigate whether LLMs can effectively assess human-written text for educational purposes.
arXiv Detail & Related papers (2024-07-24T06:02:57Z) - Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries [56.31117605097345]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation. Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process. AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z) - Neural Automated Writing Evaluation with Corrective Feedback [4.0230668961961085]
We propose an integrated system for automated writing evaluation with corrective feedback.
This system enables language learners to simulate essay writing tests.
It would also alleviate the burden of manually correcting innumerable essays.
arXiv Detail & Related papers (2024-02-27T15:42:33Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Research in Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting fragmented information. This paper presents a thorough analysis of these literature reviews within the PAMI field. We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models [57.80514758695275]
Using large language models (LLMs) for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level.
We propose a new prompting method called Error Analysis Prompting (EAPrompt).
This technique emulates the commonly accepted human evaluation framework, Multidimensional Quality Metrics (MQM), and produces explainable and reliable MT evaluations at both the system and segment level.
arXiv Detail & Related papers (2023-03-24T05:05:03Z) - Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community for more careful consideration of how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z) - LXPER Index: a curriculum-specific text readability assessment model for EFL students in Korea [0.5076419064097734]
LXPER Index is a readability assessment model for non-native English readers in the ELT curriculum of Korea.
Our new model, trained with CoKEC-text, significantly improves the accuracy of automatic readability assessment for texts in the Korean ELT curriculum.
arXiv Detail & Related papers (2020-08-01T11:55:03Z) - Multilingual Alignment of Contextual Word Representations [49.42244463346612]
BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
arXiv Detail & Related papers (2020-02-10T03:27:21Z)