Probabilistic Reasoning with LLMs for k-anonymity Estimation
- URL: http://arxiv.org/abs/2503.09674v1
- Date: Wed, 12 Mar 2025 17:41:25 GMT
- Title: Probabilistic Reasoning with LLMs for k-anonymity Estimation
- Authors: Jonathan Zheng, Sauvik Das, Alan Ritter, Wei Xu
- Abstract summary: We introduce a novel numerical reasoning task under uncertainty, focusing on estimating the k-anonymity of user-generated documents containing privacy-sensitive information. We propose BRANCH, which uses LLMs to factorize a joint probability distribution to estimate the k-value. Our experiments show that this method successfully estimates the correct k-value 67% of the time, an 11% increase compared to GPT-4o chain-of-thought reasoning.
- Score: 23.16673184539629
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a novel numerical reasoning task under uncertainty, focusing on estimating the k-anonymity of user-generated documents containing privacy-sensitive information. We propose BRANCH, which uses LLMs to factorize a joint probability distribution to estimate the k-value (the size of the population matching the given information) by modeling individual pieces of textual information as random variables. The probability of each factor occurring within a population is estimated using standalone LLMs or retrieval-augmented generation systems, and these probabilities are combined into a final k-value. Our experiments show that this method successfully estimates the correct k-value 67% of the time, an 11% increase compared to GPT-4o chain-of-thought reasoning. Additionally, we leverage LLM uncertainty to develop prediction intervals for k-anonymity, which include the correct value in nearly 92% of cases.
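To make the factorization concrete, below is a minimal sketch of the estimate described in the abstract, under a naive independence assumption; the `estimate_prob` placeholder, the population size, and the example probabilities are illustrative stand-ins for the paper's LLM/RAG estimates, not its actual implementation.

```python
# Minimal sketch of the factorized k-estimate idea (not the paper's code).
# Assumptions: pieces of information are treated as independent, and
# estimate_prob() stands in for an LLM or RAG probability estimate;
# the population size and example probabilities are made up.

def estimate_prob(attribute: str) -> float:
    """Placeholder for an LLM/RAG estimate of P(attribute) in the population."""
    illustrative = {
        "lives in Atlanta": 0.0015,
        "software engineer": 0.01,
        "owns a corgi": 0.002,
    }
    return illustrative[attribute]

def estimate_k(attributes: list[str], population: int = 330_000_000) -> float:
    """k ~ N * prod_i P(attribute_i): expected number of matching individuals."""
    p_joint = 1.0
    for attribute in attributes:
        p_joint *= estimate_prob(attribute)
    return population * p_joint

attrs = ["lives in Atlanta", "software engineer", "owns a corgi"]
print(f"estimated k = {estimate_k(attrs):.1f}")
```

In BRANCH itself, each factor's probability comes from an LLM or retrieval-augmented system rather than a lookup table, and the LLM's uncertainty over those factors is what yields the reported prediction intervals.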
Related papers
- Uncertainty Decomposition and Error Margin Detection of Homodyned-K Distribution in Quantitative Ultrasound [1.912429179274357]
Homodyned K-distribution (HK-distribution) parameter estimation in quantitative ultrasound (QUS) has recently been addressed using Bayesian Neural Networks (BNNs).
BNNs have been shown to significantly reduce computational time in speckle statistics-based QUS without compromising accuracy and precision.
arXiv Detail & Related papers (2024-09-17T22:16:49Z)
- Probabilistic Medical Predictions of Large Language Models [4.825666689707888]
Large Language Models (LLMs) have shown promise in clinical applications through prompt engineering. However, LLMs struggle to produce reliable prediction probabilities, which are crucial for transparency and decision-making. We compared explicit probabilities from text generation to implicit probabilities derived from the likelihood of predicting the correct label token.
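As a rough illustration of the two quantities being compared (with made-up numbers; the actual experiments use clinical prediction tasks), an implicit probability can be recovered from label-token logits, while an explicit one is parsed from generated text:

```python
import math

# "Implicit": probability mass assigned to the correct label token,
# recovered via softmax over hypothetical label logits.
# "Explicit": a probability the model states in its generated text.

def implicit_prob(label_logits: dict[str, float], correct: str) -> float:
    """Softmax-normalized likelihood of the correct label token."""
    z = sum(math.exp(v) for v in label_logits.values())
    return math.exp(label_logits[correct]) / z

logits = {"yes": 2.1, "no": 0.3}  # hypothetical logits for a yes/no question
print(f"implicit P(yes) = {implicit_prob(logits, 'yes'):.3f}")

# Hypothetical explicit answer, parsed from text such as
# "I estimate a 70% chance the diagnosis is positive."
explicit = 0.70
print(f"explicit P(yes) = {explicit:.3f}")  # the two estimates can disagree
```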
arXiv Detail & Related papers (2024-08-21T03:47:17Z)
- Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
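A simplified sketch of the sampling-based intuition follows; the paper's actual scoring of explanation stability is more involved, and the sampled answers below are hypothetical:

```python
from collections import Counter
import math

# Sample several explanation+answer generations for the same question
# and treat disagreement across the sampled answers as uncertainty.
sampled_answers = ["B", "B", "A", "B", "B", "C", "B", "B"]  # hypothetical

counts = Counter(sampled_answers)
n = len(sampled_answers)
probs = {answer: count / n for answer, count in counts.items()}
entropy = -sum(p * math.log2(p) for p in probs.values())

print(f"answer distribution: {probs}")
print(f"entropy (uncertainty): {entropy:.3f} bits")
```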
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models [52.46248487458641]
Predictive models often need to work with incomplete information in real-world tasks.
Current large language models (LLMs) are insufficient for making accurate probability estimations in such settings.
We propose BIRD, a novel probabilistic inference framework.
arXiv Detail & Related papers (2024-04-18T20:17:23Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
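A toy sketch of that ensembling step, with made-up clarifications and per-clarification distributions standing in for LLM outputs:

```python
# Each clarification resolves an ambiguity in the input; the model is
# queried once per clarification and the predictions are averaged.
clarification_preds = {
    "bank = financial institution": {"yes": 0.9, "no": 0.1},
    "bank = river bank":            {"yes": 0.2, "no": 0.8},
}

labels = ["yes", "no"]
n = len(clarification_preds)
ensemble = {lbl: sum(p[lbl] for p in clarification_preds.values()) / n
            for lbl in labels}
print(f"ensembled prediction: {ensemble}")

# Spread across clarifications indicates uncertainty that stems from
# ambiguity in the input rather than from the model itself.
spread = {lbl: max(p[lbl] for p in clarification_preds.values()) -
               min(p[lbl] for p in clarification_preds.values())
          for lbl in labels}
print(f"spread across clarifications: {spread}")
```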
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Quantification of Predictive Uncertainty via Inference-Time Sampling [57.749601811982096]
We propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity.
The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
arXiv Detail & Related papers (2023-08-03T12:43:21Z)
- Uncertainty Quantification in Extreme Learning Machine: Analytical Developments, Variance Estimates and Confidence Intervals [0.0]
Uncertainty quantification is crucial to assess prediction quality of a machine learning model.
Most methods proposed in the literature make strong assumptions on the data, ignore the randomness of input weights or neglect the bias contribution in confidence interval estimations.
This paper presents novel estimations that overcome these constraints and improve the understanding of ELM variability.
arXiv Detail & Related papers (2020-11-03T13:45:59Z)
- Orthogonal Statistical Learning [49.55515683387805]
We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk depends on an unknown nuisance parameter.
We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order.
arXiv Detail & Related papers (2019-01-25T02:21:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.