Fugu-MT paper translation (abstract): If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models

Paper abstract: If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models

arxiv url: http://arxiv.org/abs/2510.08388v1
Date: Thu, 09 Oct 2025 16:12:10 GMT
Status: translation complete
Last updated in system: 2025-10-10 17:54:15.185671
Title: If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models
Title (reference translation): If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models
Authors: Jasmin Orth, Philipp Mondorf, Barbara Plank
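Abstract summary: Conditional acceptability refers to how plausible a conditional statement is perceived to be. It influences how individuals interpret implications, assess arguments, and make decisions based on hypothetical scenarios. It remains unclear how large language models judge the $\textit{acceptability}$ of such statements.
Reference score (attention score computed independently by this site): 37.930280449304696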
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Conditional acceptability refers to how plausible a conditional statement is perceived to be. It plays an important role in communication and reasoning, as it influences how individuals interpret implications, assess arguments, and make decisions based on hypothetical scenarios. When humans evaluate how acceptable a conditional "If A, then B" is, their judgments are influenced by two main factors: the $\textit{conditional probability}$ of $B$ given $A$, and the $\textit{semantic relevance}$ of the antecedent $A$ given the consequent $B$ (i.e., whether $A$ meaningfully supports $B$). While prior work has examined how large language models (LLMs) draw inferences about conditional statements, it remains unclear how these models judge the $\textit{acceptability}$ of such statements. To address this gap, we present a comprehensive study of LLMs' conditional acceptability judgments across different model families, sizes, and prompting strategies. Using linear mixed-effects models and ANOVA tests, we find that models are sensitive to both conditional probability and semantic relevance, though to varying degrees depending on architecture and prompting style. A comparison with human data reveals that while LLMs incorporate probabilistic and semantic cues, they do so less consistently than humans. Notably, larger models do not necessarily align more closely with human judgments.
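Abstract (reference translation): Conditional acceptability refers to how plausible a conditional statement is perceived to be. It plays an important role in communication and reasoning, because it influences how individuals interpret implications, assess arguments, and make decisions based on hypothetical scenarios. When humans evaluate how acceptable a conditional "If A, then B" is, their judgments are influenced by two main factors: the $\textit{conditional probability}$ of $B$ given $A$, and the $\textit{semantic relevance}$ of the antecedent $A$ given the consequent $B$ (i.e., whether $A$ meaningfully supports $B$). Prior work has examined how large language models (LLMs) draw inferences about conditional statements, but it remains unclear how these models judge the $\textit{acceptability}$ of such statements. To address this gap, we conduct a comprehensive study of LLMs' conditional acceptability judgments across model families, sizes, and prompting strategies. Using linear mixed-effects models and ANOVA tests, we find that models are sensitive to both conditional probability and semantic relevance. A comparison with human data shows that LLMs incorporate probabilistic and semantic cues, but less consistently than humans. Notably, larger models do not necessarily align more closely with human judgments.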
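The two factors named in the abstract can be illustrated with a small numeric sketch. The abstract does not say how semantic relevance is quantified, so the code below assumes a common proxy, the difference $P(B \mid A) - P(B \mid \neg A)$; the function names and the joint distributions are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# Joint distribution over (A, B): keys are (a, b) truth values, values are probabilities.
Joint = Dict[Tuple[bool, bool], float]


def prob_b_given_a(joint: Joint, a: bool = True) -> float:
    """Return P(B = True | A = a) computed from the joint distribution."""
    p_a = joint[(a, True)] + joint[(a, False)]
    if p_a == 0:
        raise ValueError("conditioning event has zero probability")
    return joint[(a, True)] / p_a


def delta_p(joint: Joint) -> float:
    """Assumed relevance proxy: P(B | A) - P(B | not A).

    Positive values mean A raises the probability of B;
    zero means A makes no difference to B.
    """
    return prob_b_given_a(joint, True) - prob_b_given_a(joint, False)


# Hypothetical numbers, for illustration only.
# In both cases P(B | A) is high, but only in the first does A matter for B.
relevant: Joint = {
    (True, True): 0.40, (True, False): 0.10,
    (False, True): 0.10, (False, False): 0.40,
}
irrelevant: Joint = {
    (True, True): 0.40, (True, False): 0.10,
    (False, True): 0.40, (False, False): 0.10,
}

for name, joint in [("relevant", relevant), ("irrelevant", irrelevant)]:
    print(name, round(prob_b_given_a(joint), 3), round(delta_p(joint), 3))
```

Both hypothetical conditionals have the same conditional probability, yet they differ in the assumed relevance proxy, which is why the two factors have to be examined separately.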

Related papers

Let's Think Var-by-Var: Large Language Models Enable Ad Hoc Probabilistic Reasoning [15.568698101627088]
We propose extracting common sense from large language models (LLMs). We focus on $\textit{guesstimation}$ questions. Our framework answers such questions with an $\textit{ad hoc}$ probabilistic model.
Paper reference translation (metadata) (2024-12-03T01:53:06Z)
Partial Identifiability and Misspecification in Inverse Reinforcement Learning [64.13583792391783]
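The goal of inverse reinforcement learning (IRL) is to infer a reward function $R$ from a policy $\pi$. This paper presents a comprehensive analysis of partial identifiability and misspecification in IRL.
Paper reference translation (metadata) (2024-11-24T18:35:46Z)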
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
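This paper presents QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships. We conduct extensive experiments and find that logic-based models outperform out-of-the-box large language models on all reasoning types. Our results suggest that neuro-symbolic models are a promising direction for improving complex reasoning.
Paper reference translation (metadata) (2024-10-14T12:44:59Z)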
Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
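This paper introduces the notion of a margin complement, which measures how much the prediction score $S$ changes due to a thresholding operation. Under suitable causal assumptions, we show that the influence of $X$ on the prediction score $S$ equals the influence of $X$ on the true outcome $Y$.
Paper reference translation (metadata) (2024-05-24T11:22:19Z)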
Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models [50.15455336684986]
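We evaluate the effectiveness of LogProbs and basic prompting for assessing semantic plausibility. LogProbs offer a more reliable measure of semantic plausibility than direct zero-shot prompting. We conclude that, even in the era of prompt-based evaluation, LogProbs remain a useful metric of semantic plausibility.
Paper reference translation (metadata) (2024-03-21T22:08:44Z)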
Misspecification in Inverse Reinforcement Learning [80.91536434292328]
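The aim of inverse reinforcement learning (IRL) is to infer a reward function $R$ from a policy $\pi$. One of the primary motivations behind IRL is to infer human preferences from human behaviour. This means that the models involved are misspecified, which raises the concern that they may lead to unsound inferences when applied to real-world data.
Paper reference translation (metadata) (2022-12-06T18:21:47Z)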
The Projected Covariance Measure for assumption-lean variable significance testing [3.8936058127056357]
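A simple but common approach is to specify a linear model and then test whether the regression coefficient for $X$ is non-zero. We consider the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests.
Paper reference translation (metadata) (2022-11-03T17:55:50Z)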
Probabilistic Variational Causal Approach in Observational Studies [0.0]
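This report proposes a new causal method that accounts for the rarity and frequency of events in observational studies, based on their relevance to the underlying problem. We describe a direct causal effect metric called the Probabilistic vAriational Causal Effect (PACE), together with variations that adhere to certain postulates and apply to both non-binary and binary treatments.
Paper reference translation (metadata) (2022-08-12T13:34:17Z)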

The related papers list is generated automatically from the titles and abstracts of papers on this site.

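This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).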