ASPIRO: Any-shot Structured Parsing-error-Induced ReprOmpting for
Consistent Data-to-Text Generation
- URL: http://arxiv.org/abs/2310.17877v1
- Date: Fri, 27 Oct 2023 03:39:51 GMT
- Title: ASPIRO: Any-shot Structured Parsing-error-Induced ReprOmpting for
Consistent Data-to-Text Generation
- Authors: Martin Vejvar and Yasutaka Fujimoto
- Abstract summary: ASPIRO is an approach for structured data verbalisation into short template sentences in zero to few-shot settings.
Unlike previous methods, our approach prompts large language models to directly produce entity-agnostic templates.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ASPIRO, an approach for structured data verbalisation into short
template sentences in zero to few-shot settings. Unlike previous methods, our
approach prompts large language models (LLMs) to directly produce
entity-agnostic templates, rather than relying on LLMs to faithfully copy the
given example entities, or validating/crafting the templates manually. We
incorporate LLM re-prompting, triggered by algorithmic parsing checks, as well
as the PARENT metric induced consistency validation to identify and rectify
template generation problems in real-time. ASPIRO, compared to direct LLM
output, averages a 66% reduction in parsing error rate for generated verbalisations
of RDF triples on the DART dataset. Our best 5-shot text-davinci-003 setup,
scoring BLEU of 50.62, METEOR of 45.16, BLEURT of 0.82, NUBIA of 0.87, and
PARENT of 0.8962 on the Rel2Text dataset, competes effectively with recent
fine-tuned pre-trained language models.
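The core loop described in the abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the `llm` callable, placeholder names, prompt wording, retry limit, and parsing rules are all assumptions, and the PARENT-based consistency validation is reduced to a comment.

```python
def parse_check(template: str) -> list[str]:
    """Algorithmic parsing checks on a candidate template (illustrative rules)."""
    errors = []
    if template.count("<subject>") != 1:
        errors.append("use the <subject> placeholder exactly once")
    if template.count("<object>") != 1:
        errors.append("use the <object> placeholder exactly once")
    if len(template.strip().splitlines()) > 1:
        errors.append("answer with a single short sentence")
    return errors

def verbalise_relation(llm, relation: str, shots: list[str], max_retries: int = 3) -> str:
    """Prompt an LLM for an entity-agnostic template; re-prompt on parsing errors.

    `llm` is a hypothetical prompt -> completion callable; `shots` holds the
    zero to few in-context examples.
    """
    prompt = (
        f"Write one sentence expressing the relation '{relation}' between a "
        "<subject> and an <object>, keeping both placeholders verbatim.\n"
        + "\n".join(shots)
    )
    template = llm(prompt)
    for _ in range(max_retries):
        errors = parse_check(template)
        if not errors:
            break  # a full pipeline would also screen accepted templates with PARENT
        # Parsing-error-induced re-prompting: feed the detected errors back.
        template = llm(prompt + "\nYour previous answer was rejected: " + "; ".join(errors))
    return template
```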
Related papers
- A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court [5.612141846711729]
We develop a document processing pipeline that produces an anonymized dataset optimized for topic modeling.
The pipeline integrates document layout analysis (YOLOv8x), optical character recognition, and text anonymization.
Compared to OCR-only methods, our dataset improved topic modeling with a diversity score of 0.6198 and a coherence score of 0.6638.
arXiv Detail & Related papers (2025-05-13T11:06:24Z)
- TGEA: An Error-Annotated Dataset and Benchmark Tasks for Text Generation from Pretrained Language Models [57.758735361535486]
TGEA is an error-annotated dataset for text generation from pretrained language models (PLMs).
We create an error taxonomy to cover 24 types of errors occurring in PLM-generated sentences.
This is the first dataset with comprehensive annotations for PLM-generated texts.
arXiv Detail & Related papers (2025-03-06T09:14:02Z)
- Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs).
We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy.
We leverage LLM as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z)
- Traceable LLM-based validation of statements in knowledge graphs [0.0]
This article presents a method for verifying RDF triples using LLMs.
Because LLMs cannot currently reliably identify the origin of the information they use to construct a response, our approach avoids relying on internal LLM factual knowledge altogether.
Instead, the RDF statements under verification are compared to chunks of external documents retrieved through a web search or Wikipedia.
arXiv Detail & Related papers (2024-09-11T12:27:41Z)
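A minimal sketch of that retrieval-grounded verification idea follows. It is not the paper's implementation: the `search` and `llm` helpers are hypothetical, and the yes/no entailment prompt is an assumption.

```python
def verify_triple(triple: tuple[str, str, str], search, llm) -> bool:
    """Verify one RDF triple against retrieved text instead of the LLM's
    parametric knowledge (`search` and `llm` are hypothetical helpers)."""
    subj, pred, obj = triple
    claim = f"{subj} {pred} {obj}"
    # Retrieve candidate evidence chunks, e.g. via web search or Wikipedia.
    for chunk in search(f"{subj} {pred}"):
        answer = llm(
            "Using ONLY the passage below, answer yes or no.\n"
            f"Passage: {chunk}\nClaim: {claim}\nIs the claim supported?"
        )
        if answer.strip().lower().startswith("yes"):
            return True  # traceable: `chunk` is the supporting evidence
    return False
```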
- Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors [11.07539342949602]
We propose an end-to-end framework for detecting factual errors in text summarization.
Our framework uses a diverse set of LLM prompts to identify factual inconsistencies.
We calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination.
arXiv Detail & Related papers (2024-06-18T18:59:37Z)
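A toy version of DEEP-style prompt ensembling might look like the sketch below; the prompt set, the `llm_prob` helper, and plain averaging in place of the paper's calibration step are all illustrative assumptions.

```python
import statistics

PROMPTS = [
    "Does the summary contain any claim unsupported by the document? Answer yes or no.",
    "Is every statement in the summary entailed by the document? Answer yes or no.",
    "Does the summary contradict the document anywhere? Answer yes or no.",
]

def consistency_probability(llm_prob, document: str, summary: str) -> float:
    """Ensemble diverse prompts into one factual-consistency probability.

    `llm_prob(prompt)` is a hypothetical helper returning the model's
    probability of the consistent answer; a real system would calibrate
    these scores on held-out labels rather than just averaging them.
    """
    scores = [
        llm_prob(f"{p}\nDocument: {document}\nSummary: {summary}")
        for p in PROMPTS
    ]
    return statistics.mean(scores)
```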
- CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses.
We introduce CaLM, a novel verification framework.
Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
- Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple but effective black-box zero-shot detection approach.
It is predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.
Our method achieves an average AUROC of 98.7% and shows strong robustness against paraphrase and adversarial perturbation attacks.
arXiv Detail & Related papers (2024-05-07T12:57:01Z)
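That observation can be crudely probed with an off-the-shelf grammar checker. The snippet below is only a stand-in for the paper's actual GECScore computation: the errors-per-word heuristic and the threshold are illustrative assumptions.

```python
# pip install language_tool_python
import language_tool_python

tool = language_tool_python.LanguageTool("en-US")

def grammar_error_rate(text: str) -> float:
    """Grammatical errors per word, counted by LanguageTool."""
    return len(tool.check(text)) / max(len(text.split()), 1)

def looks_llm_generated(text: str, threshold: float = 0.01) -> bool:
    # Human-written text tends to carry more grammatical errors than
    # LLM output; the threshold would be tuned on labeled data.
    return grammar_error_rate(text) < threshold
```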
- Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation [0.0]
We analyze the behaviors of open large language models (LLMs) on the task of data-to-text (D2T) generation.
We find that open LLMs can generate fluent and coherent texts in zero-shot settings from data in common formats collected with Quintd.
arXiv Detail & Related papers (2024-01-18T18:15:46Z)
- Revisiting Large Language Models as Zero-shot Relation Extractors [8.953462875381888]
Relation extraction (RE) consistently involves a certain degree of labeled or unlabeled data, even under the zero-shot setting.
Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt.
This work focuses on the study of exploring LLMs as zero-shot relation extractors.
arXiv Detail & Related papers (2023-10-08T06:17:39Z)
- BooookScore: A systematic exploration of book-length summarization in the era of LLMs [53.42917858142565]
We develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types.
We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models.
arXiv Detail & Related papers (2023-10-01T20:46:44Z)
- Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization [34.85353544844499]
We present the first dataset with fine-grained factual error annotations named DIASUMFACT.
We define fine-grained factual error detection as a sentence-level multi-label classification problem.
We propose an unsupervised model ENDERANKER via candidate ranking using pretrained encoder-decoder models.
arXiv Detail & Related papers (2023-05-26T00:18:33Z)
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond [135.8013388183257]
We propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits.
Most LLMs struggle on SummEdits, with performance close to random chance.
The best-performing model, GPT-4, is still 8% below estimated human performance.
arXiv Detail & Related papers (2023-05-23T21:50:06Z)
- Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature [143.5381108333212]
We show that text sampled from a large language model tends to occupy negative curvature regions of the model's log probability function.
We then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection.
arXiv Detail & Related papers (2023-01-26T18:44:06Z)
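The curvature criterion reduces to a simple statistic, sketched below. `log_prob` and `perturb` are hypothetical helpers (the paper scores text with the candidate source model and perturbs it via T5 mask-filling), and the perturbation count is arbitrary.

```python
import statistics

def detectgpt_score(log_prob, perturb, text: str, n_perturbations: int = 20) -> float:
    """DetectGPT-style curvature statistic.

    Compares the log-probability of the original text with the average
    log-probability of semantically similar perturbations. A large
    positive gap suggests the text sits near a local maximum (negative
    curvature) of the model's log-probability, i.e. it was likely
    sampled from that model.
    """
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return original - statistics.mean(perturbed)
```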
- IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach [3.4423596432619754]
We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs).
We follow a prompt-based prediction approach for fine-tuning LMs in which the CRI task is treated as a masked language modeling (MLM) problem.
We compare the performance of this method against ensemble techniques trained on the entire dataset.
arXiv Detail & Related papers (2022-09-08T16:03:50Z)