Fugu-MT Paper Translation (Abstract): PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks

Paper summary: PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks

arxiv url: http://arxiv.org/abs/2510.12409v1
Date: Tue, 14 Oct 2025 11:42:15 GMT
Status: Translation complete
Last updated in system: 2025-10-15 19:02:32.298859
Title: PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
Title (reference translation): PricingLogic: Evaluating LLMs on Complex Tourism Pricing Tasks
Authors: Yunuo Liu, Dawei Zhu, Zena Al-Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, Xiaoyu Shen
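Abstract summary: PricingLogic comprises 300 natural-language questions based on booking requests derived from 42 real-world pricing policies. Evaluations of a line of LLMs reveal a steep performance drop on the harder tier, exposing systematic failures in rule interpretation and arithmetic reasoning.
Reference score (independently calculated attention score): 28.577623054100616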
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present PricingLogic, the first benchmark that probes whether Large Language Models (LLMs) can reliably automate tourism-related prices when multiple, overlapping fare rules apply. Travel agencies are eager to offload this error-prone task onto AI systems; however, deploying LLMs without verified reliability could result in significant financial losses and erode customer trust. PricingLogic comprises 300 natural-language questions based on booking requests derived from 42 real-world pricing policies, spanning two levels of difficulty: (i) basic customer-type pricing and (ii) bundled-tour calculations involving interacting discounts. Evaluations of a line of LLMs reveal a steep performance drop on the harder tier, exposing systematic failures in rule interpretation and arithmetic reasoning. These results highlight that, despite their general capabilities, today's LLMs remain unreliable in revenue-critical applications without further safeguards or domain adaptation. Our code and dataset are available at https://github.com/EIT-NLP/PricingLogic.
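Abstract (reference translation): We present PricingLogic, the first benchmark that investigates whether Large Language Models (LLMs) can reliably automate tourism-related pricing when multiple, overlapping fare rules apply. Travel agencies are eager to offload this error-prone task onto AI systems. PricingLogic comprises 300 natural-language questions based on booking requests derived from 42 real-world pricing policies, spanning two levels of difficulty: (i) basic customer-type pricing and (ii) bundled-tour calculations involving interacting discounts. Evaluations of LLMs show a steep performance drop on the harder tier, demonstrating systematic failures in rule interpretation and arithmetic reasoning. These results highlight that, despite their general capabilities, today's LLMs remain unreliable in revenue-critical applications without further safeguards or domain adaptation. Our code and dataset are available at https://github.com/EIT-NLP/PricingLogic.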

Related papers