IERL: Interpretable Ensemble Representation Learning -- Combining
CrowdSourced Knowledge and Distributed Semantic Representations
- URL: http://arxiv.org/abs/2306.13865v1
- Date: Sat, 24 Jun 2023 05:02:34 GMT
- Title: IERL: Interpretable Ensemble Representation Learning -- Combining
CrowdSourced Knowledge and Distributed Semantic Representations
- Authors: Yuxin Zi, Kaushik Roy, Vignesh Narayanan, Manas Gaur, Amit Sheth
- Abstract summary: Large Language Models (LLMs) encode meanings of words in the form of distributed semantics.
Recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs.
We propose a novel ensemble learning method, Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations.
- Score: 11.008412414253662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) encode meanings of words in the form of
distributed semantics. Distributed semantics capture common statistical
patterns among language tokens (words, phrases, and sentences) from large
amounts of data. LLMs perform exceedingly well across General Language
Understanding Evaluation (GLUE) tasks designed to test a model's understanding
of the meanings of the input tokens. However, recent studies have shown that
LLMs tend to generate unintended, inconsistent, or wrong texts as outputs when
processing inputs that were seen rarely during training, or inputs that are
associated with diverse contexts (e.g., the well-known hallucination phenomenon in
language generation tasks). Crowdsourced and expert-curated knowledge graphs
such as ConceptNet are designed to capture the meaning of words from a compact
set of well-defined contexts. Thus LLMs may benefit from leveraging such
knowledge contexts to reduce inconsistencies in outputs. We propose a novel
ensemble learning method, Interpretable Ensemble Representation Learning
(IERL), that systematically combines LLM and crowdsourced knowledge
representations of input tokens. IERL has the distinct advantage over
state-of-the-art (SOTA) methods of being interpretable by design (when was the
LLM context used vs. the knowledge context?), allowing the inputs to be
scrutinized in conjunction with the model's parameters and facilitating the
analysis of the model's inconsistent or irrelevant outputs.
Although IERL is agnostic to the choice of LLM and crowdsourced knowledge, we
demonstrate our approach using BERT and ConceptNet. We report improved or
competitive results with IERL across GLUE tasks over current SOTA methods and
significantly enhanced model interpretability.
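
The paper itself does not include code, but the core idea (a learned, interpretable per-token mixture of an LLM representation and a knowledge-graph representation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar gating mechanism, the ConceptNet embedding table indexed by tokenizer ids, and all dimensions are assumptions made for the sketch.

```python
# Minimal sketch of an IERL-style interpretable ensemble (not the authors' code).
# Assumptions: BERT supplies the LLM representation, a hypothetical ConceptNet
# embedding table supplies the knowledge representation, and a per-token scalar
# gate (the interpretable part) mixes the two.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class InterpretableEnsemble(nn.Module):
    def __init__(self, kg_vocab_size, kg_dim=300, llm_name="bert-base-uncased"):
        super().__init__()
        self.llm = AutoModel.from_pretrained(llm_name)
        llm_dim = self.llm.config.hidden_size
        # Hypothetical ConceptNet embedding table (e.g., initialized from
        # ConceptNet Numberbatch), indexed by tokenizer ids for simplicity.
        self.kg_emb = nn.Embedding(kg_vocab_size, kg_dim)
        self.kg_proj = nn.Linear(kg_dim, llm_dim)
        # Per-token gate: 1.0 -> rely on the LLM context, 0.0 -> rely on the KG context.
        self.gate = nn.Linear(2 * llm_dim, 1)

    def forward(self, input_ids, attention_mask):
        llm_repr = self.llm(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        kg_repr = self.kg_proj(self.kg_emb(input_ids))
        alpha = torch.sigmoid(self.gate(torch.cat([llm_repr, kg_repr], dim=-1)))
        mixed = alpha * llm_repr + (1.0 - alpha) * kg_repr
        # alpha is returned so every prediction can be traced back, per token,
        # to "LLM context used" vs. "knowledge context used".
        return mixed, alpha


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = InterpretableEnsemble(kg_vocab_size=tokenizer.vocab_size)
batch = tokenizer(["A traffic light controls traffic."], return_tensors="pt")
representation, gate_values = model(batch["input_ids"], batch["attention_mask"])
print(gate_values.squeeze(-1))  # human-readable per-token mixing weights
```

Inspecting the gate values is what makes such an ensemble interpretable by design: tokens whose gate leans toward the knowledge side indicate where the crowdsourced context, rather than the distributed LLM context, drove the representation.
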
Related papers
- Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models [41.233879429714925]
This study critically examines the capacity of API-based large language models to comprehend phrase semantics.
We assess the performance of LLMs in executing phrase semantic reasoning tasks guided by natural language instructions.
We conduct detailed error analyses to interpret the limitations faced by LLMs in comprehending phrase semantics.
arXiv Detail & Related papers (2024-10-03T08:44:17Z) - Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show that a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability.
The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts.
As the knowledge learned by the resulting LLM-based symbolic program (LSP) is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z) - Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach [0.0]
Large Language Models (LLMs) can produce inaccurate outputs, also known as hallucinations.
This paper introduces a supervised learning approach employing only four numerical features derived from tokens and vocabulary probabilities obtained from other evaluators (a minimal sketch of this feature-based idea appears after this list).
The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.
arXiv Detail & Related papers (2024-05-30T03:00:47Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - Learning to Reduce: Optimal Representations of Structured Data in
Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents.
We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context.
We show that our model achieves comparable accuracies in selecting the relevant evidence from an input context.
arXiv Detail & Related papers (2024-02-22T00:41:23Z) - Large Language Models Can Better Understand Knowledge Graphs Than We Thought [13.336418752729987]
Integrating knowledge graph (KG) embeddings with model parameters becomes increasingly costly.
Current prompting methods often rely on a trial-and-error approach.
We show that unordered linearized triples are more effective for LLMs' understanding of KGs compared to fluent NL text.
arXiv Detail & Related papers (2024-02-18T10:44:03Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emergent in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable
Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z) - Label Words are Anchors: An Information Flow Perspective for
Understanding In-Context Learning [77.7070536959126]
In-context learning (ICL) emerges as a promising capability of large language models (LLMs).
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
arXiv Detail & Related papers (2023-05-23T15:26:20Z)
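
As referenced in the hallucination-detection entry above, a minimal sketch of that feature-based idea (a small supervised classifier over a handful of numerical features computed from token probabilities) might look like the following. The specific four features and the choice of logistic regression are illustrative assumptions, not the cited paper's exact recipe.

```python
# Illustrative sketch (not the cited paper's code): flag likely hallucinations
# using a few numerical features computed from a generation's token probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression


def probability_features(token_probs):
    """Four simple features from the per-token probabilities of one generation.

    The particular features (min, mean, mean log-probability, fraction of
    low-confidence tokens) are assumptions made for illustration.
    """
    p = np.asarray(token_probs, dtype=float)
    return np.array([
        p.min(),            # least-confident token
        p.mean(),           # average confidence
        np.log(p).mean(),   # average log-probability
        (p < 0.1).mean(),   # share of very-low-confidence tokens
    ])


# Toy training data: per-generation token probabilities, label 1 = hallucinated.
generations = [
    [0.9, 0.8, 0.95, 0.85],
    [0.6, 0.05, 0.3, 0.02],
    [0.7, 0.9, 0.8, 0.75],
    [0.2, 0.1, 0.4, 0.05],
]
labels = [0, 1, 0, 1]

X = np.stack([probability_features(g) for g in generations])
clf = LogisticRegression().fit(X, labels)

new_generation = [0.3, 0.08, 0.5, 0.1]
print(clf.predict_proba([probability_features(new_generation)])[:, 1])
```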