LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
- URL: http://arxiv.org/abs/2407.13744v1
- Date: Thu, 18 Jul 2024 17:49:56 GMT
- Title: LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
- Authors: David Schlangen
- Abstract summary: This paper argues that the loss of clarity on what these models model leads to metaphors like "artificial general intelligences" that are not helpful for evaluating their strengths and weaknesses.
The proposal is to see their generality, and their potential value, in their ability to approximate specialist functions, based on a natural language specification.
- Score: 18.2932386988379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Processing has moved rather quickly from modelling specific tasks to taking more general pre-trained models and fine-tuning them for specific tasks, to a point where we now have what appear to be inherently generalist models. This paper argues that the resultant loss of clarity on what these models model leads to metaphors like "artificial general intelligences" that are not helpful for evaluating their strengths and weaknesses. The proposal is to see their generality, and their potential value, in their ability to approximate specialist functions, based on a natural language specification. This framing brings to the fore questions of the quality of the approximation, but beyond that, also questions of discoverability, stability, and protectability of these functions. As the paper will show, this framing hence brings together in one conceptual framework various aspects of evaluation, both from a practical and a theoretical perspective, as well as questions often relegated to a secondary status (such as "prompt injection" and "jailbreaking").
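To make the framing concrete, the following is a minimal sketch (not from the paper) of what it can mean to treat an LLM as approximating a specialist function given only a natural language specification, and to ask how good that approximation is. All names are illustrative assumptions; in particular, `call_llm` is a placeholder for whatever concrete model interface is actually used, and the quality measure is just exact-match agreement with a reference implementation.

```python
# A minimal, hypothetical sketch: none of these names come from the paper.
# `call_llm` is a placeholder for whatever concrete model interface is used.
from typing import Callable, List


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model."""
    raise NotImplementedError("plug in a concrete model interface here")


def approximate_function(spec: str) -> Callable[[str], str]:
    """Wrap a natural-language specification as a callable that the LLM approximates."""
    def f(x: str) -> str:
        prompt = f"Specification: {spec}\nInput: {x}\nOutput:"
        return call_llm(prompt).strip()
    return f


def approximation_quality(f: Callable[[str], str],
                          reference: Callable[[str], str],
                          inputs: List[str]) -> float:
    """Fraction of test inputs on which the approximation matches the reference function."""
    matches = sum(f(x) == reference(x) for x in inputs)
    return matches / len(inputs)


# Example usage: the "specialist function" here is lower-casing a string,
# chosen only because a reference implementation is trivial to write.
lowercase = approximate_function("Return the input string in lower case.")
# quality = approximation_quality(lowercase, str.lower, ["Foo", "BAR", "baz"])
```

In these terms, discoverability asks whether a specification that elicits the intended function can be found at all, stability asks whether the resulting callable behaves consistently across paraphrases of the specification and across inputs, and protectability asks whether an input can override the specification, which is where prompt injection and jailbreaking enter the picture.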
Related papers
- Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance.
Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate.
This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z)
- Ontology Completion with Natural Language Inference and Concept Embeddings: An Analysis [26.918368764004796]
We consider the problem of finding plausible knowledge that is missing from a given ontology, as a generalisation of the well-studied taxonomy expansion task.
One line of work treats this task as a Natural Language Inference (NLI) problem, relying on the knowledge captured by language models to identify the missing knowledge.
Another line of work uses concept embeddings to identify what different concepts have in common, taking inspiration from cognitive models for category based induction.
arXiv Detail & Related papers (2024-03-25T21:46:35Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- An Overview Of Temporal Commonsense Reasoning and Acquisition [20.108317515225504]
Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events.
Recent research on the performance of large language models suggests that they often take shortcuts in their reasoning and fall prey to simple linguistic traps.
arXiv Detail & Related papers (2023-07-28T01:30:15Z)
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks [71.19560970717495]
Recent language models show impressive performance across a wide range of tasks.
Are these skills general and transferable, or specialized to specific tasks seen during pretraining?
We propose an evaluation framework based on "counterfactual" task variants that deviate from the default assumptions underlying standard tasks.
arXiv Detail & Related papers (2023-07-05T17:50:42Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Schrödinger's Tree -- On Syntax and Neural Language Models [10.296219074343785]
Language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities.
We observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form.
We outline the implications of the different types of research questions exhibited in studies on syntax.
arXiv Detail & Related papers (2021-10-17T18:25:23Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering [80.60605604261416]
We propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks.
We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks.
We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.
arXiv Detail & Related papers (2020-11-07T22:52:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.