Fugu-MT Paper Translation (Abstract): Optimizing Korean-Centric LLMs via Token Pruning

Paper summary: Optimizing Korean-Centric LLMs via Token Pruning

arxiv url: http://arxiv.org/abs/2604.16235v1
Date: Fri, 17 Apr 2026 16:53:14 GMT
Status: Translation complete
System update date: 2026-04-20 22:00:20.012265
Title: Optimizing Korean-Centric LLMs via Token Pruning
Title (reference translation): Optimizing Korean-Centric LLMs via Token Pruning
Authors: Hoyeol Kim, Hyeonwoo Kim,
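Abstract summary: Token pruning is a compression technique that removes the tokens and embedding parameters corresponding to languages irrelevant to the target application. Architectures including Qwen3, Gemma-3, Llama-3, and Aya were evaluated across three vocabulary configurations.
Reference score (independently calculated attention score): 6.029880646740327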
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a systematic benchmark of state-of-the-art multilingual large language models (LLMs) adapted via token pruning - a compression technique that eliminates tokens and embedding parameters corresponding to languages irrelevant to the target application. Focusing on Korean-centric natural language processing (NLP) tasks, we evaluate architectures including Qwen3, Gemma-3, Llama-3, and Aya across three vocabulary configurations: Original, English-Korean (EnKo), and English-Korean-Chinese (EnKoZh). Performance is assessed using established benchmarks for general aptitude, cultural literacy, instruction following, and machine translation. Our findings indicate that token pruning significantly improves generation stability by eliminating language confusion, and in the case of machine translation, frequently enhances performance on Korean-specific tasks. While instruction-following capabilities display architecture-dependent variance linked to latent cross-lingual representations, the significant reduction in vocabulary size validates token pruning as a highly effective optimization strategy for memory-constrained, domain-specific deployments, despite modest gains in inference latency.
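Abstract (reference translation): This paper presents a systematic benchmark of multilingual large language models (LLMs) adapted via token pruning. Focusing on Korean-centric natural language processing (NLP) tasks, it evaluates architectures including Qwen3, Gemma-3, Llama-3, and Aya across three vocabulary configurations: Original, English-Korean (EnKo), and English-Korean-Chinese (EnKoZh). Performance is assessed with established benchmarks for general aptitude, cultural literacy, instruction following, and machine translation. The results indicate that token pruning significantly improves generation stability by eliminating language confusion and, in the case of machine translation, frequently improves performance on Korean-specific tasks. Instruction-following capability shows architecture-dependent variance linked to latent cross-lingual representations, but the large reduction in vocabulary size supports token pruning as a highly effective optimization strategy for memory-constrained, domain-specific deployments, even though the gains in inference latency are modest.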
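The abstract describes token pruning only at a high level: tokens for irrelevant languages are dropped together with their embedding rows. The following is a minimal sketch of that idea on a toy vocabulary, not the authors' procedure. The names `is_en_ko` and `prune_vocab`, the Unicode-range filter, and the toy data are all assumptions made for illustration; real tokenizers (for example byte-level BPE) store tokens in forms that would need decoding before a script check like this could be applied.

```python
from typing import Callable, Iterable


def is_en_ko(token: str) -> bool:
    """Hypothetical EnKo filter: keep tokens made only of ASCII or Hangul."""
    for ch in token:
        code = ord(ch)
        is_ascii = code < 0x80
        is_hangul = (
            0xAC00 <= code <= 0xD7A3      # Hangul Syllables
            or 0x1100 <= code <= 0x11FF   # Hangul Jamo
            or 0x3130 <= code <= 0x318F   # Hangul Compatibility Jamo
        )
        if not (is_ascii or is_hangul):
            return False
    return True


def prune_vocab(
    vocab: dict[str, int],
    embeddings: list[list[float]],
    keep: Callable[[str], bool],
    always_keep: Iterable[str] = (),
):
    """Drop tokens rejected by `keep` and the matching embedding rows.

    Returns the re-indexed vocabulary, the reduced embedding table, and a
    map from old token ids to new token ids.
    """
    protected = set(always_keep)  # e.g. special tokens that must survive
    # Walk tokens in old-id order so the relative order is preserved.
    kept = [
        (tok, old_id)
        for tok, old_id in sorted(vocab.items(), key=lambda kv: kv[1])
        if tok in protected or keep(tok)
    ]
    new_vocab = {tok: new_id for new_id, (tok, _) in enumerate(kept)}
    new_embeddings = [embeddings[old_id] for _, old_id in kept]
    old_to_new = {old_id: new_id for new_id, (_, old_id) in enumerate(kept)}
    return new_vocab, new_embeddings, old_to_new


# Toy data (assumed): six tokens with 2-dimensional embeddings.
vocab = {"<s>": 0, "hello": 1, "안녕": 2, "你好": 3, "こんにちは": 4, "world": 5}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]

new_vocab, new_embeddings, old_to_new = prune_vocab(
    vocab, embeddings, keep=is_en_ko, always_keep={"<s>"}
)
print(new_vocab)
print(f"rows: {len(embeddings)} -> {len(new_embeddings)}")
```

In a model with tied or separate input and output embeddings, the same row selection would have to be applied to the output projection as well, and any stored token ids would need remapping through `old_to_new`. An EnKoZh variant would only differ in the filter passed as `keep`.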
