Related papers: Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation

Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation

URL: http://arxiv.org/abs/2407.02056v1
Date: Tue, 2 Jul 2024 08:38:31 GMT
Title: Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation
Authors: Xinglin Wang, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Boyuan Pan, Heda Wang, Yao Hu, Kan Li,
Abstract summary: We propose Fine-Grained Self-Consistency (FSC) to optimize output quality by effectively fine-grained consensus knowledge from multiple samples. The effectiveness of FSC is demonstrated through extensive experiments on various tasks, including summarization, code generation, and mathematical reasoning.
Score: 20.138831477848615
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-consistency (SC), leveraging multiple samples from LLMs, shows significant gains on various reasoning tasks but struggles with free-form generation due to the difficulty of aggregating answers. Its variants, UCS and USC, rely on sample selection or voting mechanisms to improve output quality. These methods, however, face limitations due to their inability to fully utilize the nuanced consensus knowledge present within multiple candidate samples, often resulting in suboptimal outputs. We propose Fine-Grained Self-Consistency (FSC) to addresses these limitations by extracting and integrating segment-level commonalities from candidate samples, enhancing the performance of LLMs both in open-ended and reasoning tasks. Based on this, we present two additional strategies: candidate filtering, which enhances overall quality by identifying highly similar candidate sets, and merging, which reduces input token requirements by combining similar samples. The effectiveness of FSC is demonstrated through extensive experiments on various tasks, including summarization, code generation, and mathematical reasoning, using GPT-3.5-turbo and GPT-4. The results indicate significant improvements over baseline methods, showcasing the potential of FSC to optimize output quality by effectively synthesizing fine-grained consensus knowledge from multiple samples.

Related papers

VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL [38.782188833641676]
Group-based policy optimization methods like GRPO and GSPO have become standard for training multimodal models.<n>They suffer from a critical emphgradient vanishing problem when all responses within a group receive identical rewards.<n>We propose textbfVADE, a sampling framework via online sample-level difficulty textbfEstimation.
arXiv Detail & Related papers (2025-11-24T08:59:54Z)
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs [102.48588475875749]
We introduce Generative Self-Refinement (GSR), a novel parallel test-time scaling framework.<n>GSR generates a set of candidate responses in parallel and then performs self-refinement to synthesize a new superior solution.<n>We show that our method achieves state-of-the-art performance across five mathematical benchmarks.
arXiv Detail & Related papers (2025-08-27T06:51:48Z)
Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories.<n>We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process.<n>We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
arXiv Detail & Related papers (2025-05-29T11:08:24Z)
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance. We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm [41.4789135538612]
This paper introduces a novel choice-based sample selection framework that shifts the focus from evaluating individual sample quality to comparing the contribution value of different samples. Thanks to the advanced language understanding capabilities of Large Language Models (LLMs), we utilize LLMs to evaluate the value of each option during the selection process.
arXiv Detail & Related papers (2025-03-04T07:32:41Z)
ESANS: Effective and Semantic-Aware Negative Sampling for Large-Scale Retrieval Systems [7.897183317096681]
In the retrieval stage, classic embedding-based retrieval methods depend on effective negative sampling techniques to enhance both performance and efficiency. We propose Effective and Semantic-Aware Negative Sampling (ESANS), which integrates two key components: Effective Dense Interpolation Strategy (EDIS) and Multimodal Semantic-Aware Clustering (MSAC)
arXiv Detail & Related papers (2025-02-22T04:43:20Z)
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning [59.25951947621526]
We propose an approach which can transform existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers. We release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+), and analyzed synthetic verification methods with standard, reasoning-based, and reward-based LLMs. Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances the verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z)
Diversified Sampling Improves Scaling LLM inference [31.18762591875725]
DivSampling is a novel and versatile sampling technique designed to enhance the diversity of candidate solutions. Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower compared to those produced by stationary prompts.
arXiv Detail & Related papers (2025-02-16T07:37:58Z)
A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 instruction unique-following prompts. With our synthetic prompts, we use two preference dataset curation methods - rejection sampling (RS) and Monte Carlo Tree Search (MCTS) Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference. We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem. Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z)
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models [0.0]
Hallucination, the generation of factually incorrect content, is a growing challenge in Large Language Models. This paper introduces THaMES, an integrated framework and library addressing this gap. THaMES offers an end-to-end solution for evaluating and mitigating hallucinations in LLMs.
arXiv Detail & Related papers (2024-09-17T16:55:25Z)
Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [9.44858963874474]
Self-Consistency (SC) results in significant computational costs proportional to the number of samples generated. We propose Reasoning-Aware Self-Consistency (RASC), an innovative early-stopping framework that adjusts the number of sample generations. RASC significantly reduces sample usage by an average of 80% while maintaining or improving accuracy up to 5% compared to the original SC.
arXiv Detail & Related papers (2024-08-30T05:14:59Z)
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars [1.450405446885067]
Self-ensembling techniques with diverse reasoning paths have demonstrated remarkable performance gains in text generation with Large Language Models (LLMs) We introduce PEDAL, a hybrid self-ensembling approach that combines the strengths of diverse exemplar based prompts and LLM based aggregation to achieve improvement in overall performance.
arXiv Detail & Related papers (2024-08-16T17:54:09Z)
Universal Self-Consistency for Large Language Model Generation [72.6761480346095]
Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on challenging tasks. We propose Universal Self-Consistency (USC), which leverages large language models (LLMs) to select the most consistent answer.
arXiv Detail & Related papers (2023-11-29T02:07:09Z)
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency [127.97467912117652]
Large language models (LLMs) have exhibited remarkable ability in code generation. However, generating the correct solution in a single attempt still remains a challenge. We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency.
arXiv Detail & Related papers (2023-09-29T14:23:26Z)
Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server. We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered. Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation. Based on our key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS) In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z)
Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class. We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.