Fugu-MT Paper Translation (Abstract): STELLA: Self-Reflective Terminology-Aware Framework for Building an Aerospace Information Retrieval Benchmark

Paper abstract: STELLA: Self-Reflective Terminology-Aware Framework for Building an Aerospace Information Retrieval Benchmark

arxiv url: http://arxiv.org/abs/2601.03496v1
Date: Wed, 07 Jan 2026 01:23:44 GMT
Status: Translation complete
Last updated in system: 2026-01-08 18:12:46.103993
Title: STELLA: Self-Reflective Terminology-Aware Framework for Building an Aerospace Information Retrieval Benchmark
Authors: Bongmin Kim,
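Abstract summary: The STELLA benchmark is an aerospace-specific IR evaluation set constructed from NASA Technical Reports Server (NTRS) documents. The framework generates two types of queries: the Terminology Concordant Query (TCQ) and the Terminology Agnostic Query (TAQ). An evaluation of seven embedding models on the STELLA benchmark shows that large decoder-based embedding models exhibit the strongest semantic understanding.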
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tasks in the aerospace industry rely heavily on searching and reusing large volumes of technical documents, yet there is no public information retrieval (IR) benchmark that reflects the terminology and query-intent characteristics of this domain. To address this gap, this paper proposes the STELLA (Self-Reflective TErminoLogy-Aware Framework for BuiLding an Aerospace Information Retrieval Benchmark) framework. Using this framework, we introduce the STELLA benchmark, an aerospace-specific IR evaluation set constructed from NASA Technical Reports Server (NTRS) documents via a systematic pipeline that comprises document layout detection, passage chunking, terminology dictionary construction, synthetic query generation, and cross-lingual extension. The framework generates two types of queries: the Terminology Concordant Query (TCQ), which includes the terminology verbatim to evaluate lexical matching, and the Terminology Agnostic Query (TAQ), which uses the terminology's description to assess semantic matching. This enables a disentangled evaluation of the lexical and semantic matching capabilities of embedding models. In addition, we combine Chain-of-Density (CoD) and the Self-Reflection method with query generation to improve quality, and we implement a hybrid cross-lingual extension that reflects real user querying practices. Evaluation of seven embedding models on the STELLA benchmark shows that large decoder-based embedding models exhibit the strongest semantic understanding, while lexical matching methods such as BM25 remain highly competitive in domains where exact lexical matching of technical terms is crucial. The STELLA benchmark provides a reproducible foundation for reliable performance evaluation and improvement of embedding models in aerospace-domain IR tasks. The STELLA benchmark is available at https://huggingface.co/datasets/telepix/STELLA.
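The TCQ/TAQ distinction described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `TermEntry` class, the fixed query templates, the example term, and the token-overlap score are hypothetical stand-ins and are not taken from the paper, which generates queries with an LLM-based pipeline and evaluates with BM25 and embedding models.

```python
import re
from dataclasses import dataclass


@dataclass
class TermEntry:
    """Hypothetical terminology dictionary entry (not the paper's schema)."""
    term: str          # the technical term as it appears in a passage
    description: str   # a plain-language description of that term


def make_tcq(entry: TermEntry) -> str:
    # Terminology Concordant Query: keeps the term verbatim.
    # A fixed template stands in for the paper's LLM-based generation.
    return f"What is the role of the {entry.term}?"


def make_taq(entry: TermEntry) -> str:
    # Terminology Agnostic Query: swaps the term for its description.
    return f"What is the role of the {entry.description}?"


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric tokens, returned as a set.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_overlap(query: str, passage: str) -> float:
    # Fraction of query tokens that also occur in the passage.
    # This is a crude proxy for lexical matching, not BM25.
    q = tokenize(query)
    return len(q & tokenize(passage)) / len(q) if q else 0.0


# Illustrative data, invented for this sketch.
entry = TermEntry(
    term="reaction wheel",
    description="spinning flywheel device that changes a spacecraft's orientation",
)
passage = "The reaction wheel provides fine attitude control for the satellite."

tcq = make_tcq(entry)
taq = make_taq(entry)
tcq_score = lexical_overlap(tcq, passage)
taq_score = lexical_overlap(taq, passage)
print(f"TCQ overlap: {tcq_score:.2f}")
print(f"TAQ overlap: {taq_score:.2f}")
```

In this toy setting the TCQ shares the term's tokens with the passage while the TAQ does not, so a purely lexical scorer favors the TCQ, and retrieving the passage from the TAQ requires matching on meaning. That is the separation between lexical and semantic matching that the two query types are intended to expose.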
