Fugu-MT Paper Translation (Abstract): The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage

Paper abstract: The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage

arxiv url: http://arxiv.org/abs/2606.16560v1
Date: Mon, 15 Jun 2026 11:03:39 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:34.469224
Title: The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage
Authors: Afnan Aloraini, Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro
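Abstract summary: Two complementary benchmark datasets are introduced. The BD-LSC dataset captures sense gain, sense loss, and stability across three time periods. The ST-WSD dataset provides fine-grained, instance-level sense annotations for words that combine slang and standard usages.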
Reference score (independently computed attention score): 20.764165408679762
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatic semantic change detection aims to identify how word meanings shift over time, offering insights into both linguistic and societal change. Despite recent progress in computational lexical semantic change (LSC), existing benchmarks and methods struggle to capture bi-directional semantic change, particularly cases where words simultaneously gain and lose senses. This problem is especially challenging for words that have both slang and standard meanings. To address these gaps, we introduce two complementary benchmark datasets. The Bi-Directional Lexical Semantic Change (BD-LSC) dataset captures sense gain, sense loss, and stability across three time periods, enabling the study of complex semantic trajectories. The SlangTrack Word Sense Disambiguation (ST-WSD) dataset provides fine-grained, instance-level sense annotations for words combining slang and standard usages, supporting systematic benchmarking of WSD and semantic change detection models. Using these benchmarks, we systematically evaluate models across different methodological families: unsupervised clustering using contextualised embeddings, supervised machine learning, transformer-based models, and state-of-the-art large language models. Among the evaluated systems, the few-shot GPT-4o model achieved the strongest aggregate performance on Exact Sense Match (ESM) and multi-label accuracy; however, Macro-F1 scores near 0.5 across all systems show that rare slang senses remain difficult, which we identify as the central open challenge.
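The ideas in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: it assumes that each time period is represented as a set of attested sense labels, that an exact sense match means the predicted sense set equals the gold set, and that Macro-F1 is the usual unweighted mean of per-class F1. All function names, sense labels, and data are hypothetical.

```python
from typing import Dict, List, Set


def change_labels(earlier: Set[str], later: Set[str]) -> Dict[str, Set[str]]:
    """Compare the senses attested in two periods (hypothetical representation)."""
    return {
        "gained": later - earlier,   # senses that appear only in the later period
        "lost": earlier - later,     # senses that appear only in the earlier period
        "stable": earlier & later,   # senses attested in both periods
    }


def exact_match_rate(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of items whose predicted sense set equals the gold set exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical word observed in three periods.
periods = [
    {"standard_a", "standard_b"},
    {"standard_a", "slang_x"},
    {"standard_a", "slang_x"},
]
# Period 1 -> 2 gains and loses a sense at once (a bi-directional change).
first_shift = change_labels(periods[0], periods[1])
# Period 2 -> 3 shows no change.
second_shift = change_labels(periods[1], periods[2])

# Hypothetical instance-level labels: a system that never predicts the rare slang sense.
gold_senses = ["standard"] * 8 + ["slang"] * 2
pred_senses = ["standard"] * 10
accuracy = sum(g == p for g, p in zip(gold_senses, pred_senses)) / len(gold_senses)
macro = macro_f1(gold_senses, pred_senses)
```

In this toy data, accuracy is 0.8 while Macro-F1 is roughly 0.44, because the missed slang class contributes an F1 of zero. This is one way to read the abstract's point that strong aggregate scores can coexist with Macro-F1 near 0.5 when rare slang senses are hard.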


Related papers