Fugu-MT Paper Translation (Abstract): Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models

Paper overview: Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models

arxiv url: http://arxiv.org/abs/2310.01290v1
Date: Mon, 2 Oct 2023 15:43:53 GMT
Status: Translation complete
Last updated in system: 2023-10-04 21:14:01.873817
Title: Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models
Title (reference translation): Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models
Authors: Wenxuan Ding, Shangbin Feng, Yuhan Liu, Zhaoxuan Tan, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
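Abstract summary: We propose geometric reasoning over structured knowledge, where pieces of knowledge are connected in a graph structure and models need to fill in the missing information. Such geometric knowledge reasoning requires the ability to handle structured knowledge, reason with uncertainty, verify facts, and backtrack when an error occurs. We propose Knowledge Crosswords, a multi-blank QA dataset consisting of natural language questions that represent the geometric constraints of an incomplete entity network.
Reference score (independently computed attention score): 51.35398315130094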
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are widely adopted in knowledge-intensive tasks and have achieved impressive performance thanks to their knowledge abilities. While LLMs have demonstrated outstanding performance on atomic or linear (multi-hop) QA tasks, whether they can reason in knowledge-rich scenarios with interweaving constraints remains an underexplored problem. In this work, we propose geometric reasoning over structured knowledge, where pieces of knowledge are connected in a graph structure and models need to fill in the missing information. Such geometric knowledge reasoning would require the ability to handle structured knowledge, reason with uncertainty, verify facts, and backtrack when an error occurs. We propose Knowledge Crosswords, a multi-blank QA dataset where each problem consists of a natural language question representing the geometric constraints of an incomplete entity network, where LLMs are tasked with working out the missing entities while meeting all factual constraints. Knowledge Crosswords contains 2,101 individual problems, covering various knowledge domains and further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLM prompting approaches on the Knowledge Crosswords benchmark. We additionally propose two new approaches, Staged Prompting and Verify-All, to augment LLMs' ability to backtrack and verify structured constraints. Our results demonstrate that while baseline approaches perform well on easier problems, they struggle with hard ones; our proposed Verify-All outperforms other methods by a large margin and is more robust with hard problems. Further analysis reveals that LLMs' ability to perform geometric reasoning over structured knowledge is still far from robust or perfect, and is susceptible to confounders such as the order of options, certain structural patterns, assumption of existence of correct answer, and more.
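Abstract (reference translation): Large language models (LLMs) are widely adopted in knowledge-intensive tasks and achieve strong performance thanks to their knowledge abilities. LLMs have shown remarkable performance on atomic or linear (multi-hop) QA tasks, but whether they can reason in knowledge-rich scenarios with interweaving constraints remains an open problem. In this work, we propose geometric reasoning over structured knowledge, in which pieces of knowledge are connected in a graph structure and the model must fill in the missing information. Such geometric knowledge reasoning requires the ability to handle structured knowledge, reason under uncertainty, verify facts, and backtrack when an error occurs. We propose Knowledge Crosswords, a multi-blank QA dataset in which each problem is a natural language question representing the geometric constraints of an incomplete entity network, and the LLM must work out the missing entities while satisfying all constraints. Knowledge Crosswords contains 2,101 individual problems, covers various knowledge domains, and is further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLM prompting approaches on the Knowledge Crosswords benchmark. We also propose two new approaches, Staged Prompting and Verify-All, to strengthen LLMs' ability to backtrack and verify structured constraints. Our results show that baseline approaches perform well on easier problems but struggle with hard ones, whereas the proposed Verify-All outperforms other methods by a large margin and is more robust on hard problems. Further analysis shows that LLMs' ability to perform geometric reasoning over structured knowledge is still far from robust or perfect, and is susceptible to confounders such as the order of options, certain structural patterns, and the assumption that a correct answer exists.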
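
The task described in the abstract, filling several blanks in an entity network so that every constraint holds, can be pictured as a small constraint-satisfaction problem. The following is a minimal sketch under stated assumptions, not the authors' method or data: the `FACTS` triples, the relation names, the blank names, and the helpers `consistent` and `solve` are all hypothetical, and a symbolic lookup stands in for whatever knowledge an LLM would supply. It only illustrates the two abilities the abstract names, verifying constraints and backtracking when a candidate fails.

```python
# Toy knowledge base of (head, relation, tail) triples.
# Assumption: invented for illustration, not taken from the dataset.
FACTS = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Madrid", "capital_of", "Spain"),
    ("France", "borders", "Germany"),
    ("Germany", "borders", "France"),
    ("France", "borders", "Spain"),
}


def consistent(constraints, assignment, blanks):
    """Verify every constraint whose blanks are all filled in."""
    for head, rel, tail in constraints:
        h = assignment.get(head, head)
        t = assignment.get(tail, tail)
        if h in blanks or t in blanks:
            continue  # at least one blank is still empty; cannot verify yet
        if (h, rel, t) not in FACTS:
            return False
    return True


def solve(constraints, options, assignment=None):
    """Fill blanks one at a time, backtracking when a constraint fails."""
    assignment = dict(assignment or {})
    unfilled = [b for b in options if b not in assignment]
    if not unfilled:
        # All blanks filled: accept only if every constraint verifies.
        return assignment if consistent(constraints, assignment, options) else None
    blank = unfilled[0]
    for candidate in options[blank]:
        assignment[blank] = candidate
        if consistent(constraints, assignment, options):
            result = solve(constraints, options, assignment)
            if result is not None:
                return result
        del assignment[blank]  # backtrack and try the next candidate
    return None


# Hypothetical two-blank problem: blank1 is the capital of blank2,
# and blank2 borders both Germany and Spain.
constraints = [
    ("blank1", "capital_of", "blank2"),
    ("blank2", "borders", "Germany"),
    ("blank2", "borders", "Spain"),
]
options = {
    "blank1": ["Berlin", "Madrid", "Paris"],
    "blank2": ["Germany", "Spain", "France"],
}

print(solve(constraints, options))
```

In this toy problem the first two candidates for `blank1` each lead to a dead end, so the search backs up before settling on an assignment that satisfies all three constraints together. If no combination of options works, `solve` returns `None`, which loosely mirrors the abstract's point that a correct answer should not be assumed to exist.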


Related papers