Fugu-MT Paper Translation (Abstract): Exploring the Vertical-Domain Reasoning Capabilities of Large Language Models

Paper overview: Exploring the Vertical-Domain Reasoning Capabilities of Large Language Models

arxiv url: http://arxiv.org/abs/2512.22443v1
Date: Sat, 27 Dec 2025 02:39:34 GMT
Status: Translation complete
Last updated in system: 2025-12-30 22:37:30.057206
Title: Exploring the Vertical-Domain Reasoning Capabilities of Large Language Models
Title (reference translation): Exploring the Vertical-Domain Reasoning Capabilities of Large Language Models
Authors: Jie Zhou, Xin Chen, Jie Zhang, Zhe Li,
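Abstract summary: This study introduces the concept of vertical-domain accounting reasoning and establishes evaluation criteria. The paper evaluates representative models, including GLM-6B, GLM-130B, GLM-4, and OpenAI GPT-4, on a set of accounting reasoning tasks.
Reference score (independently computed attention score): 19.821219678322517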
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are reshaping learning paradigms, cognitive processes, and research methodologies across a wide range of domains. Integrating LLMs with professional fields and redefining the relationship between LLMs and domain-specific applications has become a critical challenge for promoting enterprise digital transformation and broader social development. To effectively integrate LLMs into the accounting domain, it is essential to understand their domain-specific reasoning capabilities. This study introduces the concept of vertical-domain accounting reasoning and establishes evaluation criteria by analyzing the training data characteristics of representative GLM-series models. These criteria provide a foundation for subsequent research on reasoning paradigms and offer benchmarks for improving accounting reasoning performance. Based on this framework, we evaluate several representative models, including GLM-6B, GLM-130B, GLM-4, and OpenAI GPT-4, on a set of accounting reasoning tasks. Experimental results show that different prompt engineering strategies lead to varying degrees of performance improvement across models, with GPT-4 achieving the strongest accounting reasoning capability. However, current LLMs still fall short of real-world application requirements. In particular, further optimization is needed for deployment in enterprise-level accounting scenarios to fully realize the potential value of LLMs in this domain.
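Abstract (reference translation): Large language models (LLMs) are reshaping learning paradigms, cognitive processes, and research methodologies across a wide range of domains. Integrating LLMs into professional fields and redefining the relationship between LLMs and domain-specific applications has become an important challenge for promoting enterprise digital transformation and broader social development. To integrate LLMs effectively into the accounting domain, it is essential to understand their domain-specific reasoning capabilities. This study introduces the concept of vertical-domain accounting reasoning and establishes evaluation criteria by analyzing the training data characteristics of representative GLM-series models. These criteria provide a foundation for subsequent research on reasoning paradigms and offer benchmarks for improving accounting reasoning performance. The paper evaluates representative models, including GLM-6B, GLM-130B, GLM-4, and OpenAI GPT-4, on a set of accounting reasoning tasks. The experimental results show that different prompt engineering strategies yield varying degrees of performance improvement across models, and that GPT-4 achieves the strongest accounting reasoning capability. However, current LLMs still fall short of real-world application requirements. In particular, further optimization is needed for deployment in enterprise-level accounting scenarios in order to fully realize the potential value of LLMs in this domain.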

Related papers