Related papers: Explicit Inductive Inference using Large Language Models

Related papers

Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior [58.751981587234916]
This paper formalizes the Refinement Provenance Inference (RPI) audit task as Refinement Provenance Inference (RPI)<n>We propose RePro, a logit-based framework that fuses teacher-forced likelihood features with logit-ranking signals.<n>During training, RePro learns a transferable representation via shadow fine-tuning, and uses a lightweight linear head to infer provenance on unseen victims without training-data access.
arXiv Detail & Related papers (2026-01-05T10:16:41Z)
Abductive Inference in Retrieval-Augmented Language Models: Generating and Validating Missing Premises [0.0]
We propose a framework that integrates abductive inference into retrieval-augmented LLMs.<n> Experimental results on abductive reasoning and multi-hop QA benchmarks show that our approach improves both answer accuracy and reasoning faithfulness.<n>This work highlights abductive inference as a promising direction for enhancing the robustness and explainability of RAG systems.
arXiv Detail & Related papers (2025-11-06T03:37:24Z)
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training [57.03005244917803]
Large language models (LLMs) often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training.<n>Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT)<n> Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks.
arXiv Detail & Related papers (2025-06-11T06:30:28Z)
Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z)
Neutralizing Bias in LLM Reasoning using Entailment Graphs [13.5088417466172]
LLMs are often claimed to be capable of Natural Language Inference (NLI), which is widely regarded as a cornerstone of more complex forms of reasoning. We design an unsupervised framework to construct counterfactual reasoning data and fine-tune LLMs to reduce attestation bias. Our framework consistently improves inferential performance on both original and bias-neutralized NLI datasets.
arXiv Detail & Related papers (2025-03-14T17:33:30Z)
On Reference (In-)Determinacy in Natural Language Inference [62.904689974282334]
We revisit the reference determinacy (RD) assumption in the task of natural language inference (NLI) We observe that current NLI models fail in downstream applications such as fact verification, where the input premise and hypothesis may refer to different contexts. We introduce RefNLI, a diagnostic benchmark for identifying reference ambiguity in NLI examples.
arXiv Detail & Related papers (2025-02-09T06:58:13Z)
Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths [12.377041655669728]
We introduce Lachesis, a predictive model for self-consistency based LLM inferences. We empirically evaluate it using AutoFL, a recently proposed LLM-based fault localisation technique. Results suggest that Lachesis can predict the correctness of answers with a precision of up to 0.8136.
arXiv Detail & Related papers (2024-12-11T10:56:47Z)
Evaluating the Correctness of Inference Patterns Used by LLMs for Judgment [53.17596274334017]
We evaluate the correctness of the detailed inference patterns of an LLM behind its seemingly correct outputs.<n>Experiments show that even when the language generation results appear correct, a significant portion of the inference patterns used by the LLM for the legal judgment may represent misleading or irrelevant logic.
arXiv Detail & Related papers (2024-10-06T08:33:39Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Quantitative Certification of Bias in Large Language Models [18.91760463088026]
Large Language Models (LLMs) can produce biased responses that can cause representational harms. We propose QuaCer-B that certifies LLMs for bias on distributions of prompts.
arXiv Detail & Related papers (2024-05-29T05:39:37Z)
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction [56.17020601803071]
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction. This paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias.
arXiv Detail & Related papers (2024-03-15T02:04:35Z)
Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment [32.12998469814097]
A novel causal prompting method based on front-door adjustment is proposed to effectively mitigate Large Language Models (LLMs) biases. Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets.
arXiv Detail & Related papers (2024-03-05T07:47:34Z)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses. LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
Hypothesis Search: Inductive Reasoning with Language Models [39.03846394586811]
Recent work evaluates large language models on inductive reasoning tasks by directly prompting them yielding "in context learning" This works well for straightforward inductive tasks but performs poorly on complex tasks such as the Abstraction and Reasoning Corpus (ARC) In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction.
arXiv Detail & Related papers (2023-09-11T17:56:57Z)
Sources of Hallucination by Large Language Models on Inference Tasks [16.644096408742325]
Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI) We present a series of behavioral studies on several LLM families which probe their behavior using controlled experiments.
arXiv Detail & Related papers (2023-05-23T22:24:44Z)
ADEPT: A DEbiasing PrompT Framework [49.582497203415855]
Finetuning is an applicable approach for debiasing contextualized word embeddings. discrete prompts with semantic meanings have shown to be effective in debiasing tasks. We propose ADEPT, a method to debias PLMs using prompt tuning while maintaining the delicate balance between removing biases and ensuring representation ability.
arXiv Detail & Related papers (2022-11-10T08:41:40Z)
Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence [45.9949173746044]
We show that large-size pre-trained language models (PLMs) do not satisfy the logical negation property (LNP) We propose a novel intermediate training task, names meaning-matching, designed to directly learn a meaning-text correspondence. We find that the task enables PLMs to learn lexical semantic information.
arXiv Detail & Related papers (2022-05-08T08:37:36Z)
Probing as Quantifying the Inductive Bias of Pre-trained Representations [99.93552997506438]
We present a novel framework for probing where the goal is to evaluate the inductive bias of representations for a particular task. We apply our framework to a series of token-, arc-, and sentence-level tasks.
arXiv Detail & Related papers (2021-10-15T22:01:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.