Evaluating and Explaining Large Language Models for Code Using Syntactic
Structures
- URL: http://arxiv.org/abs/2308.03873v1
- Date: Mon, 7 Aug 2023 18:50:57 GMT
- Title: Evaluating and Explaining Large Language Models for Code Using Syntactic
Structures
- Authors: David N Palacio, Alejandro Velasco, Daniel Rodriguez-Cardenas, Kevin
Moran, Denys Poshyvanyk
- Abstract summary: This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
- Score: 74.93762031957883
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) for code are a family of high-parameter,
transformer-based neural networks pre-trained on massive datasets of both
natural and programming languages. These models are rapidly being employed in
commercial AI-based developer tools, such as GitHub Copilot. However, measuring
and explaining their effectiveness on programming tasks is a challenging
proposition, given their size and complexity. The methods for evaluating and
explaining LLMs for code are inextricably linked. That is, in order to explain
a model's predictions, those predictions must be reliably mapped to
fine-grained, understandable concepts. Once this mapping is achieved, new
methods for
detailed model evaluations are possible. However, most current explainability
techniques and evaluation benchmarks focus on model robustness or individual
task performance, as opposed to interpreting model predictions.
To this end, this paper introduces ASTxplainer, an explainability method
specific to LLMs for code that enables both new methods for LLM evaluation and
visualizations of LLM predictions that aid end-users in understanding model
predictions. At its core, ASTxplainer provides an automated method for aligning
token predictions with AST nodes, by extracting and aggregating normalized
model logits within AST structures. To demonstrate the practical benefit of
ASTxplainer, we illustrate the insights that our framework can provide by
performing an empirical evaluation on 12 popular LLMs for code using a curated
dataset of the most popular GitHub projects. Additionally, we perform a user
study examining the usefulness of an ASTxplainer-derived visualization of model
predictions aimed at enabling model users to explain predictions. The results
of these studies illustrate the potential for ASTxplainer to provide insights
into LLM effectiveness, and aid end-users in understanding predictions.
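To make the core idea concrete, below is a minimal, illustrative sketch of the kind of alignment described above: per-token log-probabilities from a causal language model are mapped onto their enclosing AST nodes and averaged per node type. This is not the authors' ASTxplainer implementation; it assumes Python's built-in ast module in place of a full AST toolchain, uses GPT-2 as a stand-in for the code LLMs evaluated in the paper, and aggregates with a simple mean rather than the paper's specific scoring.

```python
# Illustrative sketch only: NOT the authors' ASTxplainer implementation.
# It aligns per-token log-probabilities from a causal LM with AST nodes and
# averages them per node type, using Python's built-in `ast` module in place of
# a full AST toolchain and GPT-2 as a stand-in for the code LLMs in the paper.
from collections import defaultdict
import ast

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CODE = "def add(a, b):\n    return a + b\n"

# 1. Per-token log-probabilities from a causal LM.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

enc = tokenizer(CODE, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0].tolist()      # character span of each token
with torch.no_grad():
    logits = model(**enc).logits[0]                  # (seq_len, vocab)

log_probs = torch.log_softmax(logits[:-1], dim=-1)   # position t predicts token t+1
target_ids = enc["input_ids"][0][1:]
token_logp = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)

# 2. Map character offsets to the smallest enclosing AST node.
line_starts = [0]
for line in CODE.splitlines(keepends=True):
    line_starts.append(line_starts[-1] + len(line))

def node_span(node):
    """Character span (start, end) of an AST node, or None if it has no location."""
    if getattr(node, "lineno", None) is None or getattr(node, "end_lineno", None) is None:
        return None
    start = line_starts[node.lineno - 1] + node.col_offset
    end = line_starts[node.end_lineno - 1] + node.end_col_offset
    return start, end

spans = []
for node in ast.walk(ast.parse(CODE)):
    span = node_span(node)
    if span is not None:
        spans.append((span[0], span[1], type(node).__name__))

def enclosing_node_type(tok_start, tok_end):
    """Name of the smallest AST node whose span covers the token's span."""
    covering = [(e - s, name) for s, e, name in spans if s <= tok_start and tok_end <= e]
    return min(covering)[1] if covering else "<none>"

# 3. Aggregate log-probabilities per AST node type.
per_node = defaultdict(list)
for i in range(1, len(offsets)):                     # token 0 has no preceding context
    start, end = offsets[i]
    per_node[enclosing_node_type(start, end)].append(token_logp[i - 1].item())

for node_type, values in sorted(per_node.items()):
    print(f"{node_type:12s} mean log-prob = {sum(values) / len(values):.3f} ({len(values)} tokens)")
```

Running the sketch prints one mean log-probability per AST node type (e.g., FunctionDef, Return, Name), which is the kind of syntax-level breakdown that evaluations and visualizations like those described above can build on.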
Related papers
- Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations [48.07182711678573]
ASTrust generates explanations grounded in the relationship between model confidence and syntactic structures of programming languages.
We develop an automated visualization that illustrates the aggregated model confidence scores superimposed on sequence, heat-map, and graph-based visuals of syntactic structures from ASTs.
arXiv Detail & Related papers (2024-07-12T04:38:28Z)
- LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language [35.84181171987974]
Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations.
We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from Large Language Models.
We demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions.
arXiv Detail & Related papers (2024-05-21T15:13:12Z)
- Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing.
We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing.
While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
arXiv Detail & Related papers (2024-02-29T14:06:34Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - RecExplainer: Aligning Large Language Models for Explaining Recommendation Models [50.74181089742969]
Large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following.
This paper presents the initial exploration of using LLMs as surrogate models to explain black-box recommender models.
To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment.
arXiv Detail & Related papers (2023-11-18T03:05:43Z) - In-Context Explainers: Harnessing LLMs for Explaining Black Box Models [28.396104334980492]
Large Language Models (LLMs) have demonstrated exceptional capabilities in complex tasks like machine translation, commonsense reasoning, and language understanding.
One of the primary reasons for the adaptability of LLMs in such diverse tasks is their in-context learning (ICL) capability, which allows them to perform well on new tasks by simply using a few task samples in the prompt.
We propose a novel framework, In-Context Explainers, comprising three approaches that exploit the ICL capabilities of LLMs to explain the predictions made by other predictive models.
arXiv Detail & Related papers (2023-10-09T15:31:03Z) - Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z) - Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)