Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations
- URL: http://arxiv.org/abs/2402.15062v2
- Date: Wed, 02 Oct 2024 02:09:37 GMT
- Title: Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations
- Authors: Yang Deng, Yong Zhao, Moxin Li, See-Kiong Ng, Tat-Seng Chua
- Abstract summary: The self-alignment method is capable of not only refusing to answer but also providing an explanation for the unanswerability of unknown questions.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired.
- Score: 70.6395572287422
- License:
- Abstract: Despite the remarkable abilities of Large Language Models (LLMs) to answer questions, they often display a considerable level of overconfidence even when the question does not have a definitive answer. To avoid providing hallucinated answers to these unknown questions, existing studies typically investigate approaches to refusing to answer them. In this work, we propose a novel and scalable self-alignment method that utilizes the LLM itself to enhance its ability to respond to different types of unknown questions, making it capable of not only refusing to answer but also providing an explanation for the unanswerability of unknown questions. Specifically, the Self-Align method first employs a two-stage class-aware self-augmentation approach to generate a large amount of unknown question-response data. We then conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired. Experimental results on two datasets across four types of unknown questions validate the superiority of the Self-Align method over existing baselines across three types of task formulation.
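A rough sketch of how such a self-augmentation and self-curation pipeline could be wired together is given below, assuming an OpenAI-style chat client; the unknown-question categories, the prompts, and the numeric disparity-scoring rule are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch of a self-alignment data pipeline for unknown questions.
# Assumptions (not taken verbatim from the paper): an OpenAI-compatible
# client exposing client.chat.completions.create, illustrative unknown-
# question categories, and a simple self-judged disparity score.
from dataclasses import dataclass

UNKNOWN_CLASSES = ["incomplete", "futuristic", "incorrect", "ambiguous"]  # illustrative

@dataclass
class Sample:
    question: str
    response: str
    score: float

def generate(client, prompt: str, model: str = "base-llm") -> str:
    """Query the (un-aligned) LLM itself; no external teacher model is used."""
    out = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return out.choices[0].message.content

def self_augment(client, seed_questions: list[str]) -> list[Sample]:
    """Class-aware self-augmentation: the model rewrites seed questions into
    unknown questions of each class, then drafts refusal-with-explanation
    responses for them."""
    samples = []
    for cls in UNKNOWN_CLASSES:
        for seed in seed_questions:
            q = generate(client, f"Rewrite this question so that it becomes {cls} "
                                 f"and therefore unanswerable:\n{seed}")
            r = generate(client, "This question cannot be answered definitively. "
                                 f"Refuse to answer and explain why it is {cls}:\n{q}")
            samples.append(Sample(q, r, score=0.0))
    return samples

def disparity(client, sample: Sample) -> float:
    """Illustrative disparity score: the model itself rates (0-10) how well the
    candidate response both refuses and explains the unanswerability."""
    verdict = generate(client, "On a scale of 0-10, how well does this response "
                               "refuse AND explain the unanswerability of the "
                               f"question?\nQ: {sample.question}\nA: {sample.response}\n"
                               "Reply with a single number.")
    try:
        return float(verdict.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0

def self_curate(client, samples: list[Sample], threshold: float = 7.0) -> list[Sample]:
    """Disparity-driven self-curation: keep only samples whose score clears the
    threshold; the surviving pairs become fine-tuning data for the same LLM."""
    for s in samples:
        s.score = disparity(client, s)
    return [s for s in samples if s.score >= threshold]
```

The curated question-response pairs would then be fed to an ordinary supervised fine-tuning loop on the same base model; that training step is omitted here since it is standard.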
Related papers
- I Could've Asked That: Reformulating Unanswerable Questions [89.93173151422636]
We evaluate open-source and proprietary models for reformulating unanswerable questions.
GPT-4 and Llama2-7B successfully reformulate questions only 26% and 12% of the time, respectively.
We publicly release the benchmark and the code to reproduce the experiments.
arXiv Detail & Related papers (2024-07-24T17:59:07Z) - Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs [58.620269228776294]
We propose a task-agnostic framework for resolving ambiguity by asking users clarifying questions.
We evaluate systems across three NLP applications: question answering, machine translation and natural language inference.
We find that intent-sim is robust, demonstrating improvements across a wide range of NLP tasks and LMs.
arXiv Detail & Related papers (2023-11-16T00:18:50Z) - Open-ended Commonsense Reasoning with Unrestricted Answer Scope [47.14397700770702]
Open-ended Commonsense Reasoning is defined as solving a commonsense question without providing 1) a short list of answer candidates and 2) a pre-defined answer scope.
In this work, we leverage pre-trained language models to iteratively retrieve reasoning paths on the external knowledge base.
The reasoning paths can help to identify the most precise answer to the commonsense question.
arXiv Detail & Related papers (2023-10-18T02:45:54Z)
- Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer? [43.03399918557937]
In real-world applications, users often ask questions that don't have a definitive answer.
We introduce QnotA, a dataset consisting of five different categories of questions that don't have definitive answers.
With this data, we formulate three evaluation tasks that test a system's ability to 'identify', 'distinguish', and 'justify' QnotA questions.
We show that even SOTA models including GPT-3 and Flan T5 do not fare well on these tasks and lag considerably behind the human performance baseline.
arXiv Detail & Related papers (2023-09-08T23:12:03Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z) - Selectively Answering Ambiguous Questions [38.83930394700588]
We find that the most reliable approach to decide when to abstain involves quantifying repetition within sampled model outputs (a minimal sketch of this idea follows this entry).
Our results suggest that sampling-based confidence scores help calibrate answers to relatively unambiguous questions.
arXiv Detail & Related papers (2023-05-24T01:25:38Z)
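The repetition-based abstention heuristic mentioned in the entry above can be illustrated with a short sketch; the `sample_fn` helper, the sample count, and the agreement threshold are assumptions for illustration rather than that paper's actual procedure.

```python
from collections import Counter

def should_abstain(sample_fn, question: str, n: int = 10, min_agreement: float = 0.5) -> bool:
    """Abstain when sampled answers disagree too much.

    `sample_fn(question)` is an assumed helper that returns one sampled,
    temperature > 0 answer string. If the most frequent (normalized) answer
    accounts for less than `min_agreement` of the samples, the model's
    answers are inconsistent and the question is treated as one to refuse.
    """
    answers = [sample_fn(question).strip().lower() for _ in range(n)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / n < min_agreement
```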
- CLAM: Selective Clarification for Ambiguous Questions with Large Language Models [37.37606905433334]
We show that current SotA models do not ask the user for clarification when presented with imprecise questions.
We introduce CLAM, a framework that first uses the model to detect ambiguous questions and, if one is detected, prompts the model to ask the user for clarification.
We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering dataset.
arXiv Detail & Related papers (2022-12-15T12:47:18Z) - Double Retrieval and Ranking for Accurate Question Answering [120.69820139008138]
We show that an answer verification step introduced in Transformer-based answer selection models can significantly improve the state of the art in Question Answering.
The results on three well-known datasets for AS2 show consistent and significant improvement of the state of the art.
arXiv Detail & Related papers (2022-01-16T06:20:07Z)
- Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations [41.74162467619795]
We investigate the problem of generating informative questions in information-asymmetric conversations.
To generate pragmatic questions, we use reinforcement learning to optimize an informativeness metric.
We demonstrate that the resulting pragmatic questioner substantially improves the informativeness and specificity of questions generated over a baseline model.
arXiv Detail & Related papers (2020-04-30T00:49:14Z)