DiversiGATE: A Comprehensive Framework for Reliable Large Language
Models
- URL: http://arxiv.org/abs/2306.13230v2
- Date: Mon, 26 Jun 2023 20:55:56 GMT
- Title: DiversiGATE: A Comprehensive Framework for Reliable Large Language
Models
- Authors: Shima Imani, Ali Beyram, Harsh Shrivastava
- Abstract summary: We introduce DiversiGATE, a unified framework that consolidates diverse methodologies for LLM verification.
We propose a novel `SelfLearner' model that conforms to the DiversiGATE framework and refines its performance over time.
Our results demonstrate that our approach outperforms traditional LLMs, achieving a considerable improvement on the GSM8K benchmark, from 54.8% to 61.8%.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we introduce DiversiGATE, a unified framework that
consolidates diverse methodologies for LLM verification. The proposed framework
comprises two main components, Diversification and Aggregation, which provide a
holistic perspective on existing verification approaches such as
Self-Consistency, MathPrompter, and WebGPT. Furthermore, we propose a novel
`SelfLearner' model that conforms to the DiversiGATE framework and can learn
from its own outputs and refine its performance over time, leading to improved
accuracy. To evaluate the effectiveness of SelfLearner, we conducted a rigorous
series of experiments, including tests on synthetic data as well as on popular
arithmetic reasoning benchmarks such as GSM8K. Our results demonstrate that our
approach outperforms traditional LLMs, achieving a considerable improvement on
the GSM8K benchmark, from 54.8% to 61.8%.
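The two components can be illustrated with a minimal sketch in the style of Self-Consistency, one of the verification approaches the framework subsumes: diversification samples several candidate answers, and aggregation reduces them to one by majority vote. All names below are illustrative placeholders, not the paper's implementation:

```python
import itertools
from collections import Counter
from typing import Callable, List

def diversify_and_aggregate(sample_answer: Callable[[str], str],
                            question: str,
                            n_samples: int = 5) -> str:
    """Diversification: draw several candidate answers for the same
    question; Aggregation: reduce them to one answer by majority vote."""
    candidates: List[str] = [sample_answer(question) for _ in range(n_samples)]
    # Majority voting is the simplest aggregator; the framework also
    # admits verifier-based aggregation (e.g. WebGPT-style checking).
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer

# Toy stand-in for a sampled LLM with varied outputs.
fake_llm = itertools.cycle(["18", "18", "17", "18", "20"]).__next__
print(diversify_and_aggregate(lambda q: fake_llm(), "some question", 5))  # -> 18
```

Other aggregators (e.g. a learned verifier) drop into the same slot where the majority vote sits.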
Related papers
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective [90.86370957353911]
Chain-of-Reasoning (CoR) is a novel unified framework that integrates multiple reasoning paradigms.
CoR generates multiple potential answers using different reasoning paradigms and synthesizes them into a coherent final solution.
Experimental results demonstrate that CoR-Math-7B significantly outperforms current SOTA models.
arXiv Detail & Related papers (2025-01-19T16:53:26Z)
- Confidence Diagram of Nonparametric Ranking for Uncertainty Assessment in Large Language Models Evaluation [20.022623972491733]
Ranking large language models (LLMs) has proven to be an effective tool to improve alignment based on the best-of-$N$ policy.
We propose a new inferential framework for hypothesis testing among the ranking for language models.
arXiv Detail & Related papers (2024-12-07T02:34:30Z)
- Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback [50.84142264245052]
This work introduces the Align-SLM framework to enhance the semantic understanding of textless Spoken Language Models (SLMs).
Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO).
We evaluate the framework using ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT4-o score and human evaluation.
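As a rough illustration of this kind of pipeline (not Align-SLM's actual code; the names and the pairing scheme are assumptions), preference data for DPO can be built by scoring sampled continuations with a semantic metric and pairing high- against low-scoring ones:

```python
from typing import Callable, List, Tuple

def build_dpo_pairs(continuations: List[str],
                    semantic_score: Callable[[str], float]) -> List[Tuple[str, str]]:
    """Rank sampled continuations by a semantic metric and emit
    (chosen, rejected) pairs for Direct Preference Optimization."""
    ranked = sorted(continuations, key=semantic_score, reverse=True)
    # Simplest scheme: pair the best continuation against the worst.
    return [(ranked[0], ranked[-1])]

# Toy metric: longer continuations score higher (purely illustrative).
pairs = build_dpo_pairs(["ok", "a coherent ending", "meh"], len)
print(pairs)  # -> [('a coherent ending', 'ok')]
```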
arXiv Detail & Related papers (2024-11-04T06:07:53Z)
- Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).
We propose Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
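A minimal sketch of what such a two-stage single prompt might look like (the wording below is invented for illustration; SCoT's actual prompt is specified in the paper):

```python
def scot_style_prompt(problem: str) -> str:
    """Two stages in one prompt: first elicit a problem-solving
    strategy, then use it to guide the CoT path and final answer."""
    return (
        "Problem:\n"
        f"{problem}\n\n"
        "Step 1: Briefly state an effective strategy for solving this "
        "problem before doing any calculation.\n"
        "Step 2: Following that strategy, reason step by step and give "
        "the final answer.\n"
    )

print(scot_style_prompt("What is 17 * 24?"))
```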
arXiv Detail & Related papers (2024-09-05T06:28:05Z)
- Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
- MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation [60.65820977963331]
We introduce a novel evaluation paradigm for Large Language Models (LLMs).
This paradigm shifts the emphasis from result-oriented assessments, which often neglect the reasoning process, to a more comprehensive evaluation.
By applying this paradigm in the GSM8K dataset, we have developed the MR-GSM8K benchmark.
arXiv Detail & Related papers (2023-12-28T15:49:43Z)
- BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- GEDI: GEnerative and DIscriminative Training for Self-Supervised Learning [3.6804038214708563]
We study state-of-the-art self-supervised learning objectives and propose a unified formulation based on likelihood learning.
We refer to this combined framework as GEDI, which stands for GEnerative and DIscriminative training.
We show that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a wide margin.
arXiv Detail & Related papers (2022-12-27T09:33:50Z)