Mind the data gap: Missingness Still Shapes Large Language Model Prognoses
- URL: http://arxiv.org/abs/2512.00479v1
- Date: Sat, 29 Nov 2025 13:24:07 GMT
- Title: Mind the data gap: Missingness Still Shapes Large Language Model Prognoses
- Authors: Yuta Kobayashi, Vincent Jeanselme, Shalmali Joshi
- Abstract summary: Despite extensive literature on the informativeness of missingness, its implications for the performance of Large Language Models have not been studied. We demonstrate that patterns of missingness significantly impact zero-shot predictive performance. We conclude that there is a need for more transparent accounting and systematic evaluation of the impact of representing (informative) missingness on downstream performance.
- Score: 4.263225092704034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data collection often reflects human decisions. In healthcare, for instance, a referral for a diagnostic test is influenced by the patient's health, their preferences, available resources, and the practitioner's recommendations. Despite the extensive literature on the informativeness of missingness, its implications for the performance of Large Language Models (LLMs) have not been studied. Through a series of experiments on data from Columbia University Medical Center, a large urban academic medical center, and MIMIC-IV, we demonstrate that patterns of missingness significantly impact zero-shot predictive performance. Notably, explicitly including missingness indicators in the prompt benefits some LLMs' zero-shot predictive performance and calibration while hurting others', suggesting an inconsistent impact. The proposed aggregated analysis and theoretical insights suggest that larger models benefit from these interventions, while smaller models can be negatively impacted. The LLM paradigm risks further obscuring the impact of missingness, which is often neglected even in conventional ML. We conclude that there is a need for more transparent accounting and systematic evaluation of the impact of representing (informative) missingness on downstream performance.
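To make the prompting intervention concrete, below is a minimal sketch of how a tabular clinical record might be serialized into a zero-shot prompt with and without explicit missingness indicators. The feature names, prompt template, and outcome question are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: serializing a patient record into a zero-shot prompt,
# with and without explicit missingness indicators. All feature names and
# the task framing are hypothetical, not taken from the paper.

record = {"age": 67, "heart_rate": 92, "lactate": None, "creatinine": 1.4}

def serialize(record, mark_missing):
    lines = []
    for name, value in record.items():
        if value is None and mark_missing:
            lines.append(f"- {name}: not measured")  # explicit indicator
        elif value is not None:
            lines.append(f"- {name}: {value}")
        # if value is None and not mark_missing: silently drop the feature
    return "\n".join(lines)

def build_prompt(record, mark_missing):
    return (
        "Patient features:\n"
        + serialize(record, mark_missing)
        + "\nQuestion: Will this patient deteriorate within 48 hours? "
          "Answer 'yes' or 'no'."
    )

print(build_prompt(record, mark_missing=True))   # indicators included
print(build_prompt(record, mark_missing=False))  # unmeasured features dropped
```

Whether the `mark_missing=True` variant helps or hurts is precisely the model-dependent effect the paper reports: larger models tend to benefit from the indicators, while smaller ones can be hurt.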
Related papers
- Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models [3.3408746880885003]
Large language models (LLMs) have shown promise, but they may rely on superficial cues leading to spurious predictions. We demonstrate that mentions of alcohol or smoking can falsely induce models to predict current/past drug use where none is present. We evaluate mitigation strategies - such as prompt engineering and chain-of-thought reasoning - to reduce these false positives.
arXiv Detail & Related papers (2025-05-30T18:11:33Z)
- Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning [66.8042627609456]
Loss reweighting has shown significant benefits for machine unlearning with large language models (LLMs). In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance. We propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance.
arXiv Detail & Related papers (2025-05-17T10:41:22Z)
- Impact of Missing Values in Machine Learning: A Comprehensive Analysis [0.0]
This paper aims to examine the nuanced impact of missing values on machine learning (ML) models.
Our analysis focuses on the challenges posed by missing values, including biased inferences, reduced predictive power, and increased computational burdens.
The study employs case studies and real-world examples to illustrate the practical implications of addressing missing values.
arXiv Detail & Related papers (2024-10-10T18:31:44Z)
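For contrast with the LLM setting of the main paper, conventional ML pipelines can surface missingness to the model explicitly through indicator columns. A minimal scikit-learn sketch, with a toy matrix standing in for real data:

```python
# Minimal sketch: exposing missingness to a conventional ML model via
# indicator columns. The toy matrix is illustrative.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])

# add_indicator=True appends one binary column per feature that had missing
# values, so a downstream model can learn from the missingness pattern itself.
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_out = imputer.fit_transform(X)
print(X_out)  # columns: [imputed f1, imputed f2, missing(f1), missing(f2)]
```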
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Oversampling Higher-Performing Minorities During Machine Learning Model Training Reduces Adverse Impact Slightly but Also Reduces Model Accuracy [18.849426971487077]
We systematically under- and oversampled minority (Black and Hispanic) applicants to manipulate adverse impact ratios in training data.
We found that adverse impact in the training data was linearly related to adverse impact in the ML model's predictions.
We observed consistent effects across self-reports and interview transcripts, whether oversampling real or synthetic observations.
arXiv Detail & Related papers (2023-04-27T02:53:29Z)
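The adverse impact ratio manipulated in this study is a ratio of selection rates between groups. A minimal sketch; the group labels and selection outcomes below are invented for illustration:

```python
# Minimal sketch: the adverse impact (selection-rate) ratio, as used in the
# four-fifths rule. Groups and outcomes below are illustrative only.
def adverse_impact_ratio(selected, group, minority, majority):
    def rate(g):
        outcomes = [s for s, grp in zip(selected, group) if grp == g]
        return sum(outcomes) / len(outcomes)
    return rate(minority) / rate(majority)

selected = [1, 0, 0, 1, 0, 1, 1, 0]                 # 1 = applicant selected
group    = ["B", "B", "B", "W", "W", "W", "W", "B"]
print(adverse_impact_ratio(selected, group, minority="B", majority="W"))
# ~0.33; values below 0.8 are conventionally flagged as adverse impact
```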
- Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis [3.5536769591744557]
Deep learning (DL) models have shown great success in many medical image analysis tasks.
However, deployment of the resulting models into real clinical contexts requires robustness and fairness across different sub-populations.
Recent studies have shown significant biases in DL models across demographic subgroups, indicating a lack of fairness in the models.
arXiv Detail & Related papers (2023-03-06T16:01:30Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
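Post-hoc feature importance of the kind applied here can be as simple as permutation importance on a fitted model. A minimal scikit-learn sketch, where the synthetic regression data and random forest stand in for an actual treatment effect estimator:

```python
# Minimal sketch: post-hoc permutation importance. The synthetic data and
# regressor are placeholders for a fitted treatment effect model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # features 0 and 2 should dominate
```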
- Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z)
- Transductive image segmentation: Self-training and effect of uncertainty estimation [16.609998086075127]
Semi-supervised learning (SSL) uses unlabeled data during training to learn better models.
This study focuses on the quality of predictions made on the unlabeled data of interest when they are included for optimization during training, rather than on improving generalization.
Our experiments on a large MRI database for multi-class segmentation of traumatic brain lesions show promising results when comparing transductive with inductive predictions.
arXiv Detail & Related papers (2021-07-19T15:26:07Z)
- On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that ImageNet-pretrained models show a significant increase in performance, generalization, and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)