Using Natural Language Explanations to Improve Robustness of In-context Learning
- URL: http://arxiv.org/abs/2311.07556v2
- Date: Mon, 20 May 2024 16:24:58 GMT
- Title: Using Natural Language Explanations to Improve Robustness of In-context Learning
- Authors: Xuanli He, Yuxiang Wu, Oana-Maria Camburu, Pasquale Minervini, Pontus Stenetorp
- Abstract summary: Large language models (LLMs) can excel in many tasks via in-context learning (ICL).
We investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets.
- Score: 35.18010811754959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Recent studies demonstrated that large language models (LLMs) can excel in many tasks via in-context learning (ICL). However, recent works show that ICL-prompted models tend to produce inaccurate results when presented with adversarial inputs. In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets covering natural language inference and paraphrasing identification. We prompt LLMs with a small set of human-generated NLEs to produce further NLEs, yielding more accurate results than both a zero-shot-ICL setting and using only human-generated NLEs. Our results on five popular LLMs (GPT3.5-turbo, Llama2, Vicuna, Zephyr, and Mistral) show that our approach yields over 6% improvement over baseline approaches for eight adversarial datasets: HANS, ISCS, NaN, ST, PICD, PISP, ANLI, and PAWS. Furthermore, previous studies have demonstrated that prompt selection strategies significantly enhance ICL on in-distribution test sets. However, our findings reveal that these strategies do not match the efficacy of our approach for robustness evaluations, resulting in an accuracy drop of 8% compared to the proposed approach. 
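
As a rough illustration of the idea in the abstract, the following minimal sketch assembles a few-shot ICL prompt in which every demonstration carries an NLE. The prompt template, the `Demo` class, the `build_nle_prompt` helper, and the example sentences are all hypothetical and are not taken from the paper; the sketch covers only prompt assembly, not the step where an LLM generates further NLEs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One in-context demonstration for natural language inference (hypothetical schema)."""
    premise: str
    hypothesis: str
    explanation: str  # the natural language explanation (NLE) for this example
    label: str        # e.g. "entailment", "neutral", "contradiction"


def format_query(premise: str, hypothesis: str) -> str:
    # Shared layout for demonstrations and the test instance (assumed template).
    return f"Premise: {premise}\nHypothesis: {hypothesis}\n"


def build_nle_prompt(demos: List[Demo], premise: str, hypothesis: str) -> str:
    blocks = []
    for d in demos:
        blocks.append(
            format_query(d.premise, d.hypothesis)
            + f"Explanation: {d.explanation}\n"
            + f"Label: {d.label}\n"
        )
    # The test instance stops at "Explanation:" so the model is nudged to
    # write an explanation first and the label afterwards.
    blocks.append(format_query(premise, hypothesis) + "Explanation:")
    return "\n".join(blocks)


demos = [
    Demo(
        premise="The doctor near the actor danced.",
        hypothesis="The doctor danced.",
        explanation="The phrase 'near the actor' only locates the doctor, who is the one dancing.",
        label="entailment",
    )
]
prompt = build_nle_prompt(demos, "The judges supported the lawyer.", "The lawyer supported the judges.")
print(prompt)
```

The resulting string would be sent to whichever LLM is under evaluation; parsing the generated explanation and label out of the completion is left out of this sketch.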
 
      
Related papers
        - LLM-Independent Adaptive RAG: Let the Question Speak for Itself [47.60917219813637]
 Large Language Models (LLMs) are prone to hallucinations, and Retrieval-Augmented Generation (RAG) helps mitigate this, but at a high computational cost while risking misinformation.
In this study, we introduce lightweight LLM-independent adaptive retrieval methods based on external information.
 arXiv  Detail & Related papers  (2025-05-07T08:58:52Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
 Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
 arXiv  Detail & Related papers  (2024-12-19T18:08:04Z)
- Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output [49.893971654861424]
 We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG).
We compute a factuality score that can be thresholded to yield a binary decision.
Our experiments show high area under the ROC curve (AUC) across a wide range of relevant open source datasets.
 arXiv  Detail & Related papers  (2024-11-01T20:44:59Z)
- Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning [20.13007387453759]
 Proximal Policy Optimization (PPO) is a framework to improve the capabilities of large language models (LLMs).
PPO consistently outperforms supervised fine-tuning, yielding an average improvement of 6.3 points on GLUE.
This work highlights a promising direction for adapting LLMs to new tasks by reframing them as reinforcement learning problems.
 arXiv  Detail & Related papers  (2024-10-14T19:16:56Z)
- Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing [37.400757839157116]
 Large Language Models (LLMs) have achieved state-of-the-art performance at zero-shot generation of abstractive summaries for given articles.
We propose relevance paraphrasing, a simple strategy that can be used to measure the robustness of LLMs as summarizers.
 arXiv  Detail & Related papers  (2024-06-06T12:08:43Z)
- Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding [11.470005425117371]
 We propose a novel Reinforcement Learning framework enhanced with Label-sensitive Reward (RLLR).
Our method aims to adeptly capture nuanced label-sensitive semantic features during RL, thereby enhancing natural language understanding.
Experiments conducted on five diverse foundation models across eight tasks showcase promising results.
 arXiv  Detail & Related papers  (2024-05-30T07:19:31Z)
- An Empirical Study on the Effectiveness of Large Language Models for SATD Identification and Classification [13.698224831089464]
 Self-Admitted Technical Debt (SATD) is a concept highlighting sub-optimal choices in software development documented in code comments or other project resources.
This paper investigates the efficacy of large language models (LLMs) in both identification and classification of SATD.
 arXiv  Detail & Related papers  (2024-05-10T20:39:24Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
 Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
 arXiv  Detail & Related papers  (2024-03-04T10:48:13Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
 Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
 arXiv  Detail & Related papers  (2023-12-26T07:24:46Z)
- Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection [35.924633625147365]
 Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL).
In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples.
We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about.
 arXiv  Detail & Related papers  (2023-10-30T22:03:55Z)
- Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning [105.77733287326308]
 We evaluate 10 recent open-source LMMs, from 3B up to 80B parameters, on 5 different axes: hallucinations, abstention, compositionality, explainability and instruction following.
We explore training-free in-context learning (ICL) as a solution, and study how it affects these limitations.
Based on our ICL study, we push ICL further and propose new multimodal ICL variants such as Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL.
 arXiv  Detail & Related papers  (2023-10-01T12:02:59Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
 We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
 arXiv  Detail & Related papers  (2023-09-20T09:23:46Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
 In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
 arXiv  Detail & Related papers  (2023-01-27T18:59:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     