Fugu-MT Paper Translation (Abstract): Ai-Facilitated Analysis of Abstracts and Conclusions: Flagging Unsubstantiated Claims and Ambiguous Pronouns

Paper overview: Ai-Facilitated Analysis of Abstracts and Conclusions: Flagging Unsubstantiated Claims and Ambiguous Pronouns

arxiv url: http://arxiv.org/abs/2506.13172v1
Date: Mon, 16 Jun 2025 07:34:31 GMT
Status: Translation complete
Last updated in system: 2025-06-17 17:28:47.688941
Title: Ai-Facilitated Analysis of Abstracts and Conclusions: Flagging Unsubstantiated Claims and Ambiguous Pronouns
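Title (reference translation): Ai-Facilitated Analysis of Abstracts and Conclusions: Flagging Unsubstantiated Claims and Ambiguous Pronouns
Authors: Evgeny Markhasin
Abstract summary: We present and evaluate proof-of-concept prompts designed to elicit human-like hierarchical reasoning. The prompts target two non-trivial analytical tasks.
Reference score (independently computed attention score): 0.0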
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We present and evaluate a suite of proof-of-concept (PoC), structured workflow prompts designed to elicit human-like hierarchical reasoning while guiding Large Language Models (LLMs) in high-level semantic and linguistic analysis of scholarly manuscripts. The prompts target two non-trivial analytical tasks: identifying unsubstantiated claims in summaries (informational integrity) and flagging ambiguous pronoun references (linguistic clarity). We conducted a systematic, multi-run evaluation on two frontier models (Gemini Pro 2.5 Pro and ChatGPT Plus o3) under varied context conditions. Our results for the informational integrity task reveal a significant divergence in model performance: while both models successfully identified an unsubstantiated head of a noun phrase (95% success), ChatGPT consistently failed (0% success) to identify an unsubstantiated adjectival modifier that Gemini correctly flagged (95% success), raising a question regarding potential influence of the target's syntactic role. For the linguistic analysis task, both models performed well (80-90% success) with full manuscript context. In a summary-only setting, however, ChatGPT achieved a perfect (100%) success rate, while Gemini's performance was substantially degraded. Our findings suggest that structured prompting is a viable methodology for complex textual analysis but show that prompt performance may be highly dependent on the interplay between the model, task type, and context, highlighting the need for rigorous, model-specific testing.
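Abstract (reference translation): We present and evaluate a suite of proof-of-concept (PoC) structured workflow prompts designed to elicit human-like hierarchical reasoning while guiding Large Language Models (LLMs) in high-level semantic and linguistic analysis of scholarly manuscripts. The prompts target two non-trivial analytical tasks: identifying unsubstantiated claims in summaries (informational integrity) and flagging ambiguous pronoun references (linguistic clarity). We conducted a systematic, multi-run evaluation on two frontier models (Gemini Pro 2.5 Pro and ChatGPT Plus o3) under varied context conditions. Both models successfully identified an unsubstantiated head of a noun phrase (95% success), but ChatGPT consistently failed (0% success) to identify an unsubstantiated adjectival modifier that Gemini correctly flagged (95% success), raising a question about the potential influence of the target's syntactic role. For the linguistic analysis task, both models performed well (80-90% success) with full manuscript context. In a summary-only setting, however, ChatGPT achieved a perfect (100%) success rate, while Gemini's performance was substantially degraded. Our findings suggest that structured prompting is a viable methodology for complex textual analysis, but that prompt performance may be highly dependent on the interplay between the model, task type, and context, highlighting the need for rigorous, model-specific testing.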

Related papers

A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks [10.181408678232055]
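We propose an evaluation methodology for reading comprehension tasks, based on the intuition that certain examples consistently receive lower scores regardless of model size or architecture. We leverage semantic frame annotation to characterize this complexity and study seven complexity factors that may account for model difficulty.
Paper reference translation (metadata) (2025-01-29T11:05:20Z)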
Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
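We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation. We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation. We then implement several methods for automatically detecting localized factual inconsistencies.
Paper reference translation (metadata) (2024-10-09T22:53:48Z)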
Análise de ambiguidade linguística em modelos de linguagem de grande escala (LLMs) [0.35069196259739965]
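Linguistic ambiguity is a significant challenge for natural language processing (NLP) systems. Inspired by the recent success of instructional models such as ChatGPT and Gemini, this study aims to analyze and discuss linguistic ambiguity in these models.
Paper reference translation (metadata) (2024-04-25T14:45:07Z)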
How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
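The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored. Our results show that existing text embedding models do not adequately address these syntactic understanding challenges. We propose strategies to improve the generalization ability of text embedding models in diverse syntactic scenarios.
Paper reference translation (metadata) (2023-11-14T08:51:00Z)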
SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
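We propose a new task called Sentiment and Opinion Understanding of Language (SOUL). SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG).
Paper reference translation (metadata) (2023-10-27T06:48:48Z)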
"You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
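We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analyzing sentence meaning structure. The models can reliably reproduce the basic format of AMR and can often capture core event, argument, and modifier structure. Overall, these models can capture aspects of semantic structure, but key limitations remain in their ability to support fully accurate semantic analyses or parses.
Paper reference translation (metadata) (2023-10-26T21:47:59Z)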
Towards Improving Faithfulness in Abstractive Summarization [37.19777407790153]
We propose a Faithfulness Enhanced Summarization model (FES) to improve faithfulness in abstractive summarization. Our model outperforms strong baselines in experiments on CNN/DM and XSum.
Paper reference translation (metadata) (2022-10-04T19:52:09Z)
Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge [60.616313552585645]
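We present models for effective ambiguity detection and coreference resolution in conversational AI. Specifically, we use models based on TOD-BERT and LXMERT, compare them with a number of baselines, and run ablation experiments. Our results show that (1) language models can exploit correlations in the data to detect ambiguity, and (2) language models can avoid the need for a vision component.
Paper reference translation (metadata) (2022-02-25T12:10:02Z)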
Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
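We tested Transformer Language Models (TLMs) on benchmarks for the dynamic estimation of thematic fit. Our results show that TLMs can reach performance comparable to that of SDM. However, further analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
Paper reference translation (metadata) (2021-07-22T20:52:26Z)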

The related papers list is generated automatically from the titles and abstracts of papers on this site.
