Self-Agreement: A Framework for Fine-tuning Language Models to Find
Agreement among Diverse Opinions
- URL: http://arxiv.org/abs/2305.11460v1
- Date: Fri, 19 May 2023 06:27:16 GMT
- Title: Self-Agreement: A Framework for Fine-tuning Language Models to Find
Agreement among Diverse Opinions
- Authors: Shiyao Ding and Takayuki Ito
- Abstract summary: Self-Agreement is a novel framework for fine-tuning large language models to autonomously find agreement.
Our approach employs the generative pre-trained transformer-3 (GPT-3) to generate multiple opinions for each question in a question dataset and to create agreement candidates among them.
A bidirectional encoder representations from transformers (BERT)-based model then scores each candidate and selects the one with the highest agreement score.
Remarkably, a pre-trained LLM fine-tuned by our Self-Agreement framework achieves comparable performance to GPT-3 with only 1/25 of its parameters.
- Score: 1.6752182911522517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding an agreement among diverse opinions is a challenging topic in
multiagent systems. Recently, large language models (LLMs) have shown great
potential in addressing this challenge due to their remarkable capabilities in
comprehending human opinions and generating human-like text. However, they
typically rely on extensive human-annotated data. In this paper, we propose
Self-Agreement, a novel framework for fine-tuning LLMs to autonomously find
agreement using data generated by the LLM itself. Specifically, our approach
employs the generative pre-trained transformer-3 (GPT-3) to generate multiple
opinions for each question in a question dataset and create several agreement
candidates among these opinions. Then, a bidirectional encoder representations
from transformers (BERT)-based model evaluates the agreement score of each
agreement candidate and selects the one with the highest agreement score. This
process yields a dataset of question-opinion-agreements, which we use to
fine-tune a pre-trained LLM for discovering agreements among diverse opinions.
Remarkably, a pre-trained LLM fine-tuned by our Self-Agreement framework
achieves comparable performance to GPT-3 with only 1/25 of its parameters,
showcasing its ability to identify agreement among various opinions without the
need for human-annotated data.
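
The abstract describes a three-stage data-generation loop: sample several opinions per question with GPT-3, draft agreement candidates from those opinions, keep the candidate that a BERT-based scorer rates highest, and fine-tune a smaller LLM on the resulting triples. The sketch below is a minimal illustration of that loop under stated assumptions, not the authors' implementation: the opinion generator, candidate generator, and agreement scorer are passed in as placeholder callables, and names such as `build_self_agreement_dataset`, `n_opinions`, and `n_candidates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgreementExample:
    """One question-opinion-agreement record in the generated dataset."""
    question: str
    opinions: List[str]
    agreement: str  # the candidate with the highest agreement score


def build_self_agreement_dataset(
    questions: List[str],
    generate_opinions: Callable[[str, int], List[str]],               # e.g. GPT-3 sampled several times per question
    generate_candidates: Callable[[str, List[str], int], List[str]],  # agreement candidates drafted from the opinions
    agreement_score: Callable[[List[str], str], float],              # BERT-based scorer (assumed interface)
    n_opinions: int = 5,
    n_candidates: int = 3,
) -> List[AgreementExample]:
    """For each question: sample opinions, draft agreement candidates,
    and keep the candidate the scorer rates highest."""
    dataset: List[AgreementExample] = []
    for question in questions:
        opinions = generate_opinions(question, n_opinions)
        candidates = generate_candidates(question, opinions, n_candidates)
        best = max(candidates, key=lambda c: agreement_score(opinions, c))
        dataset.append(AgreementExample(question, opinions, best))
    return dataset
```

In practice the opinion and candidate generators would wrap GPT-3 prompts and the scorer a fine-tuned BERT model; the resulting question-opinion-agreement examples then serve as ordinary supervised fine-tuning data for the smaller pre-trained LLM.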
Related papers
- A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z) - ReviewInstruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models [9.660334829409253]
Existing methods for generating multi-turn dialogue data struggle to ensure both diversity and quality in instructions. We propose Review-Instruct, a novel framework that synthesizes multi-turn conversations through an iterative "Ask-Respond-Review" process.
arXiv Detail & Related papers (2025-05-16T08:59:07Z) - Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity [0.0]
AugCon is capable of automatically generating context-driven SFT data across multiple levels of granularity with high diversity, quality and fidelity.
We train a scorer through contrastive learning to collaborate with CST to rank and refine queries.
The results highlight the significant advantages of AugCon in producing high diversity, quality, and fidelity SFT data against several state-of-the-art methods.
arXiv Detail & Related papers (2024-05-26T14:14:18Z) - Large Language Model Evaluation Via Multi AI Agents: Preliminary results [3.8066447473175304]
We introduce a novel multi-agent AI model that aims to assess and compare the performance of various Large Language Models (LLMs).
Our model consists of eight distinct AI agents, each responsible for retrieving code from a different advanced language model based on a common description.
We integrate the HumanEval benchmark into our verification agent to assess the generated code's performance, providing insights into their respective capabilities and efficiencies.
arXiv Detail & Related papers (2024-04-01T10:06:04Z) - Evaluation Metrics of Language Generation Models for Synthetic Traffic
Generation Tasks [22.629816738693254]
We show that common NLG metrics, like BLEU, are not suitable for evaluating Synthetic Traffic Generation (STG).
We propose and evaluate several metrics designed to compare the generated traffic to the distribution of real user texts.
arXiv Detail & Related papers (2023-11-21T11:26:26Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in
Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a single LLM and make the decision in a single round.
This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z) - Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models [3.656114607436271]
Existing text scoring methods require a large corpus, struggle with short texts, or require hand-labeled data.
We develop a text scoring framework that leverages generative large language models (LLMs).
We apply this approach to better understand speech reflecting aversion to specific political parties on Twitter.
arXiv Detail & Related papers (2023-10-18T15:34:37Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations [10.709365940160685]
Modern large language models (LLMs) are hard to evaluate and compare automatically.
We propose a peer rank (PR) algorithm that takes into account each peer LLM's pairwise preferences of all answer pairs.
We find that our approaches achieve higher accuracy and align better with human judgments.
arXiv Detail & Related papers (2023-07-06T04:05:44Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z) - Large Language Models are Diverse Role-Players for Summarization
Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, like BLEU/ROUGE, may not be able to adequately capture the above dimensions.
We propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.