Selecting Better Samples from Pre-trained LLMs: A Case Study on Question
Generation
- URL: http://arxiv.org/abs/2209.11000v1
- Date: Thu, 22 Sep 2022 13:33:48 GMT
- Title: Selecting Better Samples from Pre-trained LLMs: A Case Study on Question
Generation
- Authors: Xingdi Yuan, Tong Wang, Yen-Hsiang Wang, Emery Fine, Rania Abdelghani,
Pauline Lucas, Hélène Sauzéon and Pierre-Yves Oudeyer
- Abstract summary: Large Language Models (LLMs) have in recent years demonstrated impressive prowess in natural language generation.
We propose two prompt-based approaches to selecting high-quality questions from a set of LLM-generated candidates.
Our method works under the constraints of 1) a black-box (non-modifiable) question generation model and 2) lack of access to human-annotated references.
- Score: 22.294762359009052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have in recent years demonstrated impressive
prowess in natural language generation. A common practice to improve generation
diversity is to sample multiple outputs from the model. However, there is no
simple and robust way of selecting the best output from these stochastic
samples. As a case study framed in the context of question generation, we
propose two prompt-based approaches to selecting high-quality questions from a
set of LLM-generated candidates. Our method works under the constraints of 1) a
black-box (non-modifiable) question generation model and 2) lack of access to
human-annotated references -- both of which are realistic limitations for
real-world deployment of LLMs. With automatic as well as human evaluations, we
empirically demonstrate that our approach can effectively select questions of
higher quality than greedy generation.
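The setup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual method: the generator and the prompt-based scorer are stubbed with toy stand-ins (`toy_generate`, `toy_score` are hypothetical), since the paper's prompts and models are not reproduced here.

```python
import random

def sample_candidates(generate, context, n=5):
    """Sample n stochastic question candidates from a black-box generator."""
    return [generate(context) for _ in range(n)]

def select_best(score, context, candidates):
    """Prompt-based selection: score each candidate, keep the highest-scoring one."""
    return max(candidates, key=lambda q: score(context, q))

# Toy stand-ins for the two black-box LLM calls (hypothetical):
def toy_generate(context):
    templates = ["What is {}?", "Why does {} matter?", "How would you explain {}?"]
    return random.choice(templates).format(context)

def toy_score(context, question):
    # In the paper this would be a prompt asking an LLM to judge the
    # candidate; here a placeholder heuristic (longer = more specific).
    return len(question)

candidates = sample_candidates(toy_generate, "photosynthesis", n=5)
best = select_best(toy_score, "photosynthesis", candidates)
```

Note that both constraints from the abstract are respected in the sketch: the generator is only called, never modified, and no human-annotated reference questions are consulted.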
Related papers
- Learning Generative Selection for Best-of-N [52.88943295436412]
We show that small reasoning models can acquire strong GenSelect capabilities through targeted reinforcement learning.
Our results establish reinforcement learning as a scalable way to unlock strong generative selection in small models.
arXiv Detail & Related papers (2026-02-02T14:21:15Z) - Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models [0.8602553195689513]
We present a novel question generation pipeline that leverages the text generation and the probabilistic reasoning abilities of SLMs to generate high-quality questions.
Our findings suggest that an SLM can effectively generate high-quality questions when guided by a well-designed pipeline.
arXiv Detail & Related papers (2025-12-10T21:59:36Z) - Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data [18.87770758217633]
Lemmatization is the task of transforming all words in a given text to their dictionary forms.
There is no prior evidence of how effective large language models are in the contextual lemmatization task.
This paper empirically investigates the capacity of the latest generation of LLMs to perform in-context lemmatization.
arXiv Detail & Related papers (2025-10-08T18:34:00Z) - Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems [55.6590601898194]
Large Language Models (LLMs) have demonstrated exceptional capabilities, yet selecting the most reliable response from multiple LLMs remains a challenge.
Existing approaches often depend on costly external verifiers, human evaluators, or self-consistency techniques that require multiple samples from a single model.
We propose a principled, novel and computationally efficient method to select the best response from multiple different LLMs using a calibrated log-likelihood score.
arXiv Detail & Related papers (2025-09-30T01:25:19Z) - Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs [102.48588475875749]
We introduce Generative Self-Refinement (GSR), a novel parallel test-time scaling framework.
GSR generates a set of candidate responses in parallel and then performs self-refinement to synthesize a new superior solution.
We show that our method achieves state-of-the-art performance across five mathematical benchmarks.
arXiv Detail & Related papers (2025-08-27T06:51:48Z) - LLMs Can Generate a Better Answer by Aggregating Their Own Responses [83.69632759174405]
Large Language Models (LLMs) have shown remarkable capabilities across tasks, yet they often require additional prompting techniques when facing complex problems.
We argue this limitation stems from the fact that common LLM post-training procedures lack explicit supervision for discriminative judgment tasks.
We propose Generative Self-Aggregation (GSA), a novel prompting method that improves answer quality without requiring the model's discriminative capabilities.
arXiv Detail & Related papers (2025-03-06T05:25:43Z) - Large Language Models Can Self-Improve in Long-context Reasoning [100.52886241070907]
Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning.
We propose ours, an approach specifically designed for this purpose.
ours achieves superior performance compared to prior approaches that depend on data produced by human experts or advanced models.
arXiv Detail & Related papers (2024-11-12T19:53:00Z) - DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking [0.0]
We present a generic framework for distractor generation for multiple-choice questions (MCQ).
Our framework relies solely on pre-trained language models and does not require additional training on specific datasets.
Human evaluations confirm that our approach produces more effective and engaging distractors.
arXiv Detail & Related papers (2024-09-26T20:15:46Z) - Experimental Design for Active Transductive Inference in Large Language Models [18.2671641610825]
We use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD).
We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set.
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
arXiv Detail & Related papers (2024-04-12T23:27:46Z) - LUQ: Long-text Uncertainty Quantification for LLMs [29.987010627250527]
Large Language Models (LLMs) are prone to generate nonfactual content.
Uncertainty Quantification (UQ) is pivotal in enhancing our understanding of a model's confidence on its generation.
We propose Luq-Ensemble, a method that ensembles responses from multiple models and selects the response with the lowest uncertainty.
arXiv Detail & Related papers (2024-03-29T16:49:24Z) - YAYI 2: Multilingual Open-Source Large Language Models [53.92832054643197]
We propose YAYI 2, including both base and chat models, with 30 billion parameters.
YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline.
The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-12-22T17:34:47Z) - Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z) - Reranking for Natural Language Generation from Logical Forms: A Study
based on Large Language Models [47.08364281023261]
Large language models (LLMs) have demonstrated impressive capabilities in natural language generation.
However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs).
arXiv Detail & Related papers (2023-09-21T17:54:58Z) - Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z) - True Few-Shot Learning with Language Models [78.42578316883271]
We evaluate the few-shot ability of LMs when held-out examples are unavailable.
Our findings suggest that prior work significantly overestimated the true few-shot ability of LMs.
arXiv Detail & Related papers (2021-05-24T17:55:51Z)
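Several of the related papers above select one response from multiple models or samples using a likelihood-based score. A minimal sketch of the simplest such variant, length-normalized log-likelihood selection, is shown below; the calibration step described in the uncertainty-aware selection paper is omitted, and all names and numbers are illustrative.

```python
def mean_logprob(token_logprobs):
    # Length-normalized log-likelihood: average per-token log-probability,
    # so longer responses are not penalized simply for having more tokens.
    return sum(token_logprobs) / len(token_logprobs)

def select_response(responses):
    """responses: list of (text, per-token log-probs), e.g. one per model."""
    return max(responses, key=lambda r: mean_logprob(r[1]))[0]

# Illustrative numbers only:
responses = [
    ("Paris", [-0.1, -0.2]),        # mean -0.15
    ("Lyon",  [-1.5, -0.9, -0.4]),  # mean ~ -0.93
]
best = select_response(responses)   # -> "Paris"
```

The length normalization matters: without it, the raw sum of log-probabilities systematically favors shorter responses.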
This list is automatically generated from the titles and abstracts of papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.