Confidence-Aware Sub-Structure Beam Search (CABS): Mitigating Hallucination in Structured Data Generation with Large Language Models
- URL: http://arxiv.org/abs/2406.00069v1
- Date: Thu, 30 May 2024 18:21:05 GMT
- Title: Confidence-Aware Sub-Structure Beam Search (CABS): Mitigating Hallucination in Structured Data Generation with Large Language Models
- Authors: Chengwei Wei, Kee Kiat Koo, Amir Tavanaei, Karim Bouyarmane
- Abstract summary: Confidence estimation methods on Large Language Models (LLMs) primarily focus on the confidence at the individual token level or the entire output sequence level.
We propose Confidence-Aware sub-structure Beam Search (CABS), a novel decoding method operating at the sub-structure level in structured data generation.
Results show that CABS outperforms traditional token-level beam search for structured data generation by 16.7% recall at 90% precision, on average, on the problem of product attribute generation.
- Score: 6.099774114286838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have facilitated structured data generation, with applications in domains like tabular data, document databases, product catalogs, etc. However, concerns persist about generation veracity due to incorrect references or hallucinations, necessitating the incorporation of some form of model confidence for mitigation. Existing confidence estimation methods on LLM generations primarily focus on the confidence at the individual token level or the entire output sequence level, limiting their applicability to structured data generation, which consists of an intricate mix of both independent and correlated entries at the sub-structure level. In this paper, we first investigate confidence estimation methods for generated sub-structure-level data. We introduce the concept of a Confidence Network that applies on the hidden state of the LLM transformer, as a more targeted estimate than the traditional token conditional probability. We further propose Confidence-Aware sub-structure Beam Search (CABS), a novel decoding method operating at the sub-structure level in structured data generation. CABS enhances the faithfulness of structured data generation by considering confidence scores from the Confidence Network for each sub-structure-level data item and iteratively refining the prompts. Results show that CABS outperforms traditional token-level beam search for structured data generation by 16.7% recall at 90% precision, on average, on the problem of product attribute generation.
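The abstract describes beam search carried out over whole sub-structures (e.g. attribute-value pairs) scored by a confidence model, with the prompt refined at each step. A minimal sketch of that decoding loop, assuming hypothetical `propose` and `confidence` callables standing in for the LLM and the Confidence Network (all names and the log-confidence scoring are illustrative assumptions, not the paper's actual implementation):

```python
import math

# Minimal sketch of confidence-aware sub-structure beam search.
# `propose` and `confidence` are hypothetical stand-ins for the LLM
# generator and the Confidence Network described in the abstract.

def cabs(initial_prompt, propose, confidence, beam_width=3, steps=2):
    """Beam search over whole sub-structures rather than tokens.

    propose(prompt) -> list of candidate sub-structures (strings)
    confidence(prompt, sub) -> confidence score in (0, 1]
    """
    # Each beam: (current prompt, accepted sub-structures, total log-confidence)
    beams = [(initial_prompt, [], 0.0)]
    for _ in range(steps):
        expanded = []
        for prompt, subs, score in beams:
            for cand in propose(prompt):
                c = confidence(prompt, cand)
                # Iteratively refine the prompt with the accepted sub-structure
                new_prompt = prompt + " " + cand
                expanded.append((new_prompt, subs + [cand], score + math.log(c)))
        # Keep the top-k beams ranked by accumulated confidence
        beams = sorted(expanded, key=lambda b: b[2], reverse=True)[:beam_width]
    return beams[0][1]
```

A toy run with a fixed candidate set shows the search preferring the higher-confidence sub-structure:

```python
def propose(prompt):
    return ["color: red", "color: blue"]

def confidence(prompt, sub):
    return 0.9 if "red" in sub else 0.4

cabs("Product: shirt.", propose, confidence, beam_width=2, steps=1)
# -> ["color: red"]
```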
Related papers
- StructSynth: Leveraging LLMs for Structure-Aware Tabular Data Synthesis in Low-Data Regimes [15.476662936746989]
StructSynth is a novel framework that integrates the generative power of Large Language Models with robust structural control. It produces synthetic data with significantly higher structural integrity and downstream utility than state-of-the-art methods. It proves especially effective in challenging low-data scenarios, successfully navigating the trade-off between privacy preservation and statistical fidelity.
arXiv Detail & Related papers (2025-08-04T16:55:02Z) - Innovative tokenisation of structured data for LLM training [0.0]
This paper introduces a novel, hybrid tokenisation methodology to convert structured data into a sequential format suitable for training Large Language Models (LLMs). We show that our method is highly efficient, processing over 31 million network flows in under five hours and achieving a significant data compression ratio of 6.18:1. This process resulted in a computationally manageable corpus of over one billion tokens, establishing a viable and generalisable pathway for training foundation models on structured data.
arXiv Detail & Related papers (2025-08-03T09:29:50Z) - A Context-Aware Dual-Metric Framework for Confidence Estimation in Large Language Models [6.62851757612838]
Current confidence estimation methods for large language models (LLMs) neglect the relevance between responses and contextual information. We propose CRUX, which integrates context faithfulness and consistency for confidence estimation via two novel metrics. Experiments across three benchmark datasets demonstrate CRUX's effectiveness, achieving higher AUROC than existing baselines.
arXiv Detail & Related papers (2025-08-01T12:58:34Z) - AlphaFold Database Debiasing for Robust Inverse Folding [58.792020809180336]
We introduce a Debiasing Structure AutoEncoder (DeSAE) that learns to reconstruct native-like conformations from intentionally corrupted backbone geometries. At inference, applying DeSAE to AFDB structures produces debiased structures that significantly improve inverse folding performance.
arXiv Detail & Related papers (2025-06-10T02:25:31Z) - Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking [11.03600500716845]
Existing evaluation metrics offer only partial insights, lacking a comprehensive measure of generative performance.
We propose three novel evaluation metrics: FAED, FPCAD, and RFIS.
Our results demonstrate that FAED effectively captures generative modeling issues overlooked by existing metrics.
arXiv Detail & Related papers (2025-04-29T16:16:51Z) - Structural and Statistical Texture Knowledge Distillation and Learning for Segmentation [70.15341084443236]
We re-emphasize the low-level texture information in deep networks for semantic segmentation and related knowledge distillation tasks.
We propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation.
Specifically, Contourlet Decomposition Module (CDM) is introduced to decompose the low-level features.
Texture Intensity Equalization Module (TIEM) is designed to extract and enhance the statistical texture knowledge.
arXiv Detail & Related papers (2025-03-11T04:49:25Z) - StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs [78.84060166851805]
StructTest is a novel benchmark that evaluates large language models (LLMs) on their ability to follow compositional instructions and generate structured outputs.
Assessments are conducted deterministically using a rule-based evaluator, which can be easily extended to new tasks and datasets.
We demonstrate that StructTest remains challenging even for top-performing models like Deepseek-V3/R1 and GPT-4o.
arXiv Detail & Related papers (2024-12-23T22:08:40Z) - ORIGAMI: A generative transformer architecture for predictions from semi-structured data [3.5639148953570836]
ORIGAMI is a transformer-based architecture that processes nested key/value pairs.
By reformulating classification as next-token prediction, ORIGAMI naturally handles both single-label and multi-label tasks.
arXiv Detail & Related papers (2024-12-23T07:21:17Z) - Domain Specific Data Distillation and Multi-modal Embedding Generation [0.0]
The challenge of creating domain-centric embeddings arises from the abundance of unstructured data and the scarcity of domain-specific structured data.
This paper introduces a novel modeling approach that leverages structured data to filter noise from unstructured data, resulting in embeddings with high precision and recall for domain-specific attribute prediction.
arXiv Detail & Related papers (2024-10-27T03:47:46Z) - Bridging Textual and Tabular Worlds for Fact Verification: A Lightweight, Attention-Based Model [34.1224836768324]
FEVEROUS is a benchmark and research initiative focused on fact extraction and verification tasks.
This paper introduces a simple yet powerful model that nullifies the need for modality conversion.
Our approach efficiently exploits latent connections between different data types, thereby yielding comprehensive and reliable verdict predictions.
arXiv Detail & Related papers (2024-03-26T03:54:25Z) - Structured Language Generation Model for Robust Structure Prediction [6.4736137270915215]
We propose a framework that reduces sequence-to-sequence problems to classification problems via methodologies in loss calibration and decoding method.
Our experimental results show that SLGM is able to maintain performance without explicit dataset information, and can follow and potentially replace dataset-specific fine-tuning.
arXiv Detail & Related papers (2024-02-14T06:33:22Z) - Benchmarking and Analyzing Generative Data for Visual Recognition [66.55174903469722]
This work delves into the impact of generative images, primarily comparing paradigms that harness external data.
We devise GenBench, a benchmark comprising 22 datasets with 2548 categories, to appraise generative data across various visual recognition tasks.
Our exhaustive benchmark and analysis spotlight generative data's promise in visual recognition, while identifying key challenges for future investigation.
arXiv Detail & Related papers (2023-07-25T17:59:59Z) - Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z) - On Certified Generalization in Structured Prediction [1.0152838128195467]
In structured prediction, target objects have rich internal structure which does not factorize into independent components.
We present a novel PAC-Bayesian risk bound for structured prediction wherein the rate of generalization scales not only with the number of structured examples but also with their size.
arXiv Detail & Related papers (2023-06-15T13:15:26Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - A Confidence-based Partial Label Learning Model for Crowd-Annotated
Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z) - Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph
Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z) - Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z) - HYCEDIS: HYbrid Confidence Engine for Deep Document Intelligence System [16.542137414609602]
We propose a complete and novel architecture to measure confidence of current deep learning models in document information extraction task.
Our architecture consists of a Multi-modal Conformal Predictor and a Variational Cluster-oriented Anomaly Detector.
We evaluate our architecture on real-world datasets, not only outperforming competing confidence estimators by a huge margin but also demonstrating generalization ability to out-of-distribution data.
arXiv Detail & Related papers (2022-06-01T09:57:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.