Fairness of ChatGPT and the Role Of Explainable-Guided Prompts
- URL: http://arxiv.org/abs/2307.11761v1
- Date: Fri, 14 Jul 2023 09:20:16 GMT
- Title: Fairness of ChatGPT and the Role Of Explainable-Guided Prompts
- Authors: Yashar Deldjoo
- Abstract summary: Our research investigates the potential of Large-scale Language Models (LLMs), specifically OpenAI's GPT, in credit risk assessment.
Our findings suggest that LLMs, when directed by judiciously designed prompts and supplemented with domain-specific knowledge, can parallel the performance of traditional Machine Learning (ML) models.
- Score: 6.079011829257036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our research investigates the potential of Large-scale Language Models
(LLMs), specifically OpenAI's GPT, in credit risk assessment, a binary
classification task. Our findings suggest that LLMs, when directed by
judiciously designed prompts and supplemented with domain-specific knowledge,
can parallel the performance of traditional Machine Learning (ML) models.
Intriguingly, they achieve this with significantly less data: 40 times less,
using merely 20 data points compared with the 800 used by the ML models. LLMs particularly
excel in minimizing false positives and enhancing fairness, both being vital
aspects of risk analysis. While our results did not surpass those of classical
ML models, they underscore the potential of LLMs in analogous tasks, laying a
groundwork for future explorations into harnessing the capabilities of LLMs in
diverse ML tasks.
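To make the abstract's setup concrete, here is a minimal sketch of the kind of few-shot, explanation-guided prompting pipeline it describes: roughly 20 labelled applicants are placed in the prompt, the model is asked for a label plus a short justification, and false positive rates can then be checked afterwards. The OpenAI client usage is standard, but the model name, field names, and prompt wording are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's exact pipeline): few-shot credit-risk
# classification with an OpenAI chat model, plus a simple false-positive-rate
# check. Model name, field names, and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def build_prompt(train_rows, applicant):
    """Instruction + ~20 in-context examples + one query applicant."""
    lines = [
        "You are a credit risk analyst. Label each applicant 'good' or 'bad' "
        "credit risk and briefly explain which features drove the decision."
    ]
    for row in train_rows:  # e.g. the 20 labelled examples from the abstract
        lines.append(f"Applicant: {row['features']} -> Label: {row['label']}")
    lines.append(f"Applicant: {applicant['features']} -> Label:")
    return "\n".join(lines)

def classify(train_rows, applicant, model="gpt-3.5-turbo"):
    """Ask the LLM for a label; map any answer mentioning 'bad' to 'bad'."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": build_prompt(train_rows, applicant)}],
    )
    return "bad" if "bad" in resp.choices[0].message.content.lower() else "good"

def false_positive_rate(y_true, y_pred, positive="bad"):
    """Share of truly 'good' applicants wrongly flagged as 'bad'."""
    negatives = [(t, p) for t, p in zip(y_true, y_pred) if t != positive]
    return sum(p == positive for _, p in negatives) / max(len(negatives), 1)
```

A fairness check in the spirit of the abstract would then compare false_positive_rate between protected and unprotected groups of applicants.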
Related papers
- Large Language Models as Reliable Knowledge Bases? [60.25969380388974]
Large Language Models (LLMs) can be viewed as potential knowledge bases (KBs).
This study defines criteria that a reliable LLM-as-KB should meet, focusing on factuality and consistency.
Strategies like in-context learning (ICL) and fine-tuning are unsuccessful at making LLMs better KBs.
arXiv Detail & Related papers (2024-07-18T15:20:18Z)
- Are Large Language Models Good Statisticians? [10.42853117200315]
StatQA is a new benchmark designed for statistical analysis tasks.
We show that even state-of-the-art models such as GPT-4o achieve a best performance of only 64.83%.
While open-source LLMs show limited capability, fine-tuned models exhibit marked improvements.
arXiv Detail & Related papers (2024-06-12T02:23:51Z)
- Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing.
We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing.
While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
arXiv Detail & Related papers (2024-02-29T14:06:34Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
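The LID-based signal in this entry boils down to a nearest-neighbour dimension estimate over hidden activations. Below is a minimal sketch using the classic Levina-Bickel maximum-likelihood estimator; the specific estimator variant, the choice of k, and which layer's activations to use are assumptions rather than details from the abstract.

```python
# A minimal sketch of estimating local intrinsic dimension (LID) for a set of
# activation vectors with the Levina-Bickel maximum-likelihood estimator.
# This is a standard LID estimator, not necessarily the paper's exact variant.
import numpy as np

def lid_mle(query, neighbors, k=20):
    """LID estimate at `query` from its k nearest neighbours in `neighbors`.

    query:     (d,) activation vector for one generated answer
    neighbors: (n, d) reference activations
    """
    dists = np.linalg.norm(neighbors - query, axis=1)
    dists = np.sort(dists)[:k]        # k smallest distances, ascending
    dists = dists[dists > 0]          # drop any zero distance to itself
    t_k = dists[-1]
    # MLE: inverse of the mean log ratio between the k-th and closer distances.
    return 1.0 / np.mean(np.log(t_k / dists[:-1]))

# Toy usage with random stand-in activations, just to show the call.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))
print(lid_mle(acts[0], acts[1:], k=20))
```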
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
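One plausible reading of how LLMs benefit from discriminative models is to fold a small supervised classifier's prediction and confidence into the prompt as auxiliary evidence. The sketch below illustrates that idea only; the classifier, prompt wording, and task are illustrative assumptions, not the paper's framework.

```python
# Hedged sketch: carry a discriminative model's prediction and confidence
# into an LLM prompt as a supervised "hint". Task and wording are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["great product", "terrible service", "loved it", "awful quality"]
train_labels = ["positive", "negative", "positive", "negative"]

vec = TfidfVectorizer().fit(train_texts)
clf = LogisticRegression().fit(vec.transform(train_texts), train_labels)

def prompt_with_supervised_hint(text):
    """Build a prompt that includes the classifier's label and confidence."""
    probs = clf.predict_proba(vec.transform([text]))[0]
    label = clf.classes_[probs.argmax()]
    return (
        f"Text: {text}\n"
        f"A supervised classifier predicts '{label}' with confidence {probs.max():.2f}.\n"
        "Taking that evidence into account, give the final label and a short "
        "justification, or explain why the classifier may be wrong."
    )

print(prompt_with_supervised_hint("not bad at all"))
```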
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks [37.56584999012332]
Small models can outperform GPT-3.5 with a few hundred labeled examples, and they achieve higher or similar performance to GPT-4 despite being hundreds of times smaller.
Based on these findings, we posit that LLM predictions can be used as a warmup method in real-world applications.
arXiv Detail & Related papers (2023-11-16T11:51:13Z)
- Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs [8.526956860672698]
Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities.
This study investigates the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models.
arXiv Detail & Related papers (2023-11-01T17:42:45Z)
- Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs).
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.