Fugu-MT paper translation (abstract): SteuerLLM: Local specialized large language model for German tax law analysis

Paper overview: SteuerLLM: Local specialized large language model for German tax law analysis

arxiv url: http://arxiv.org/abs/2602.11081v1
Date: Wed, 11 Feb 2026 17:46:01 GMT
Status: Translation complete
Last updated in system: 2026-02-12 21:44:02.264317
Title: SteuerLLM: Local specialized large language model for German tax law analysis
Title (reference translation): SteuerLLM: A locally specialized large language model for German tax law analysis
Authors: Sebastian Wind, Jeta Sopa, Laurin Schmid, Quirin Jackl, Sebastian Kiefer, Fei Wu, Martin Mayr, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh
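Abstract summary: Large language models (LLMs) show strong general reasoning and language understanding, but their performance degrades in domains governed by strict formal rules. We created SteuerEx, the first open benchmark derived from German university tax law examinations. We present SteuerLLM, a domain-adapted LLM for German tax law trained on a large-scale synthetic dataset.
Reference score (independently computed attention measure): 8.82402339973647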
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) demonstrate strong general reasoning and language understanding, yet their performance degrades in domains governed by strict formal rules, precise terminology, and legally binding structure. Tax law exemplifies these challenges, as correct answers require exact statutory citation, structured legal argumentation, and numerical accuracy under rigid grading schemes. We algorithmically generate SteuerEx, the first open benchmark derived from authentic German university tax law examinations. SteuerEx comprises 115 expert-validated examination questions spanning six core tax law domains and multiple academic levels, and employs a statement-level, partial-credit evaluation framework that closely mirrors real examination practice. We further present SteuerLLM, a domain-adapted LLM for German tax law trained on a large-scale synthetic dataset generated from authentic examination material using a controlled retrieval-augmented pipeline. SteuerLLM (28B parameters) consistently outperforms general-purpose instruction-tuned models of comparable size and, in several cases, substantially larger systems, demonstrating that domain-specific data and architectural adaptation are more decisive than parameter scale for performance on realistic legal reasoning tasks. All benchmark data, training datasets, model weights, and evaluation code are released openly to support reproducible research in domain-specific legal artificial intelligence. A web-based demo of SteuerLLM is available at https://steuerllm.i5.ai.fau.de.
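Abstract (reference translation): Large language models (LLMs) show strong general reasoning and language understanding, but their performance degrades in domains governed by strict formal rules, precise terminology, and legally binding structure. Tax law exemplifies these challenges because it requires exact statutory citation, structured legal argumentation, and numerical accuracy under rigid grading schemes. We algorithmically generate SteuerEx, the first open benchmark derived from German university tax law examinations. SteuerEx consists of 115 expert-validated examination questions spanning six core tax law domains and multiple academic levels, and uses a statement-level, partial-credit evaluation framework that closely mirrors real examination practice. We further describe SteuerLLM, a domain-adapted LLM for German tax law trained on a large-scale synthetic dataset generated from authentic examination material using a controlled retrieval-augmented pipeline. SteuerLLM (28B parameters) consistently outperforms general-purpose instruction-tuned models of comparable size and, in several cases, substantially larger systems, showing that domain-specific data and architectural adaptation matter more than parameter scale for performance on realistic legal reasoning tasks. All benchmark data, training datasets, model weights, and evaluation code are released openly to support reproducible research in domain-specific legal artificial intelligence. A web-based demo of SteuerLLM is available at https://steuerllm.i5.ai.fau.de.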


Related papers