Related papers: What Language Models Know But Don't Say: Non-Generative Prior Extraction for Generalization

URL: http://arxiv.org/abs/2601.17609v1
Date: Sat, 24 Jan 2026 22:05:01 GMT
Title: What Language Models Know But Don't Say: Non-Generative Prior Extraction for Generalization
Authors: Sara Rezaeimanesh, Mohammad M. Ghassemi
Abstract summary: We propose LoID, a deterministic method for extracting informative prior distributions for Bayesian logistic regression. Rather than relying on generated text, we probe the model's confidence in opposing semantic directions through carefully constructed sentences. We evaluate LoID on ten real-world datasets under synthetic out-of-distribution (OOD) settings.
Score: 5.663538370244175
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In domains like medicine and finance, large-scale labeled data is costly and often unavailable, leading to models trained on small datasets that struggle to generalize to real-world populations. Large language models (LLMs) contain extensive knowledge from years of research across these domains. We propose LoID (Logit-Informed Distributions), a deterministic method for extracting informative prior distributions for Bayesian logistic regression by directly accessing an LLM's token-level predictions. Rather than relying on generated text, we probe the model's confidence in opposing semantic directions (positive vs. negative impact) through carefully constructed sentences. By measuring how consistently the LLM favors one direction across diverse phrasings, we extract the strength and reliability of the model's belief about each feature's influence. We evaluate LoID on ten real-world tabular datasets under synthetic out-of-distribution (OOD) settings characterized by covariate shift, where the training data represents only a subset of the population. We compare our approach against (1) standard uninformative priors, (2) AutoElicit, a recent method that prompts LLMs to generate priors via text completions, (3) LLMProcesses, a method that uses LLMs to generate numerical predictions through in-context learning, and (4) an oracle-style upper bound derived from fitting logistic regression on the full dataset. We assess performance using Area Under the Curve (AUC). Across datasets, LoID significantly improves performance over logistic regression trained on OOD data, recovering up to 59% of the performance gap relative to the oracle model. LoID outperforms AutoElicit and LLMProcesses on 8 out of 10 datasets, while providing a reproducible and computationally efficient mechanism for integrating LLM knowledge into Bayesian inference.
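The abstract describes turning token-level confidence in opposing directions into a prior with both a strength and a reliability. The sketch below is one way such a mapping could look; it is not the authors' LoID procedure, whose exact formulas are not given here. The function name, the `scale` and `min_sigma` parameters, the tanh mapping, and the example log-probabilities are all assumptions made for illustration. It takes, for each phrasing, the log-probability of a "positive impact" token and of a "negative impact" token, and returns the mean and standard deviation of a Gaussian prior on one logistic regression coefficient.

```python
import numpy as np


def gaussian_prior_from_logprobs(pos_logprobs, neg_logprobs, scale=1.0, min_sigma=0.1):
    """Hypothetical helper (not from the paper): map paired token
    log-probabilities to a Gaussian prior (mu, sigma) for one coefficient.

    pos_logprobs[i] / neg_logprobs[i] are the log-probabilities an LLM assigns
    to the "positive" and "negative" completion of the i-th phrasing.
    """
    pos = np.asarray(pos_logprobs, dtype=float)
    neg = np.asarray(neg_logprobs, dtype=float)
    if pos.ndim != 1 or pos.shape != neg.shape or pos.size == 0:
        raise ValueError("expected two non-empty 1-D sequences of equal length")

    # Renormalising over the two candidate tokens gives
    # p_pos - p_neg = tanh((log p_pos - log p_neg) / 2), a score in (-1, 1).
    direction = np.tanh((pos - neg) / 2.0)

    # Assumed mapping: average direction sets the prior mean (strength),
    # disagreement across phrasings widens the prior (lower reliability).
    mu = scale * direction.mean()
    sigma = min_sigma + scale * direction.std()
    return float(mu), float(sigma)


# Made-up log-probabilities for three phrasings about a single feature.
consistent = gaussian_prior_from_logprobs([-0.1, -0.2, -0.15], [-2.5, -2.0, -2.3])
conflicted = gaussian_prior_from_logprobs([-0.1, -2.5], [-2.5, -0.1])
print("consistent phrasings:", consistent)
print("conflicting phrasings:", conflicted)
```

Under these assumptions, phrasings that agree produce a prior centred away from zero with a small spread, while phrasings that disagree produce a prior near zero with a wide spread. The resulting pair could then be passed as a Normal prior to any Bayesian logistic regression implementation.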

Related papers

Towards Universal Debiasing for Language Models-based Tabular Data Generation [16.31419748401203]
We introduce a universal debiasing framework that minimizes group-level dependencies by simultaneously reducing the mutual information between advantaged and protected attributes. Our framework effectively balances fairness and utility, offering a scalable and practical solution for debiasing in high-stakes applications.
arXiv Detail & Related papers (2025-09-20T00:06:53Z)
Large Language Models as Universal Predictors? An Empirical Study on Small Tabular Datasets [0.0]
Large Language Models (LLMs) can perform predictive tasks over structured inputs without explicit fine-tuning on downstream tasks. We investigate the empirical function approximation capability of LLMs on small-scale structured datasets for classification, regression and clustering tasks. Our findings suggest that LLMs can serve as general-purpose predictive engines for structured data, with clear strengths in classification and significant limitations in regression and clustering.
arXiv Detail & Related papers (2025-08-24T15:00:51Z)
Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs [57.82819770709032]
Large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models.
arXiv Detail & Related papers (2025-08-13T16:02:55Z)
FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain [14.109309236798518]
Supervised fine-tuning (SFT) is a standard approach to adapting large language models (LLMs) to new domains. In this work, we improve the statistical efficiency of SFT by selecting an informative subset of training examples.
arXiv Detail & Related papers (2025-05-20T18:41:34Z)
Learning to Verify Summary Facts with Fine-Grained LLM Feedback [15.007479147796403]
Training automatic summary fact verifiers often faces the challenge of a lack of human-labeled data. We introduce FineSumFact, a large-scale dataset containing fine-grained factual feedback on summaries.
arXiv Detail & Related papers (2024-12-14T05:28:44Z)
Efficient Alignment of Large Language Models via Data Sampling [0.4915744683251149]
We propose an information theory-based methodology for efficient alignment by identifying a small, high-quality subset. We find that the model aligned using our proposed methodology outperforms other sampling methods and performs comparably to the model aligned with the full dataset.
arXiv Detail & Related papers (2024-11-15T19:36:15Z)
LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation [50.375567142250446]
Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation. We propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot prompt learning LLM "trees" with their outputs aggregated via confidence-based weighted voting. This framework is established on a new concept of bipartite information graphs to identify high-quality relevant neighboring entries with both feature and value granularity.
arXiv Detail & Related papers (2024-10-28T20:42:46Z)
Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios. We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples. Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
A Context-Aware Approach for Enhancing Data Imputation with Pre-trained Language Models [0.18416014644193068]
CRILM uses pre-trained language models to create contextually relevant descriptors for missing values. Our evaluations demonstrate CRILM's superior performance and robustness across MCAR, MAR, and challenging MNAR scenarios.
arXiv Detail & Related papers (2024-05-28T00:08:29Z)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN). At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results. For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data. For quality, we resort to GPT-4 to generate high-quality data for each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.