Fugu-MT Paper Translation (Summary): How Reliable are LLMs for Reasoning on the Re-ranking task?

Paper summary: How Reliable are LLMs for Reasoning on the Re-ranking task?

arxiv url: http://arxiv.org/abs/2508.18444v1
Date: Mon, 25 Aug 2025 19:48:39 GMT
Status: Translation complete
Last updated in system: 2025-08-27 17:42:38.573161
Title: How Reliable are LLMs for Reasoning on the Re-ranking task?
Title (reference translation): Reliability of LLMs on the Re-ranking Task
Authors: Nafis Tanveer Islam, Zhiming Zhao
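Abstract summary: We analyze how different training methods affect the semantic understanding of the re-ranking task in Large Language Models (LLMs). In newly developed systems with limited user engagement and insufficient ranking data, accurately re-ranking content is a significant challenge.
Reference score (attention score computed independently by the site): 3.282961543904818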
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the improving semantic understanding capability of Large Language Models (LLMs), they exhibit a greater awareness and alignment with human values, but this comes at the cost of transparency. Although promising results are achieved via experimental analysis, an in-depth understanding of the LLM's internal workings is unavoidable to comprehend the reasoning behind the re-ranking, which provides end users with an explanation that enables them to make an informed decision. Moreover, in newly developed systems with limited user engagement and insufficient ranking data, accurately re-ranking content remains a significant challenge. While various training methods affect the training of LLMs and generate inference, our analysis has found that some training methods exhibit better explainability than others, implying that an accurate semantic understanding has not been learned through all training methods; instead, abstract knowledge has been gained to optimize evaluation, which raises questions about the true reliability of LLMs. Therefore, in this work, we analyze how different training methods affect the semantic understanding of the re-ranking task in LLMs and investigate whether these models can generate more informed textual reasoning to overcome the challenges of transparency of LLMs and limited training data. To analyze the LLMs for re-ranking tasks, we utilize a relatively small ranking dataset from the environment and the Earth science domain to re-rank retrieved content. Furthermore, we also analyze the explainable information to see if the re-ranking can be reasoned using explainability.
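Abstract (reference translation): As the semantic understanding capability of Large Language Models (LLMs) improves, their awareness of and alignment with human values increase, but this comes at the cost of transparency. Although experimental analysis yields promising results, a deep understanding of the LLM's internal workings is unavoidable for understanding the reasons behind the re-ranking. In addition, in newly developed systems with limited user engagement and insufficient ranking data, accurately re-ranking content is a major challenge. While various training methods affect the training of LLMs and generate inference, our analysis indicates that some training methods show better explainability than others, suggesting that accurate semantic understanding is not learned through every training method. Therefore, in this study, we analyze how different training methods affect the semantic understanding of the re-ranking task in LLMs and examine whether these models can overcome the challenges of LLM transparency and limited training data. For the re-ranking task, we use a relatively small ranking dataset from the environment and Earth science domains to re-rank retrieved content. Furthermore, we analyze the explainable information to determine whether the re-ranking can be reasoned using explainability.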

Related papers