Related papers: How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering

How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering

URL: http://arxiv.org/abs/2401.05777v2
Date: Thu, 13 Jun 2024 22:56:47 GMT
Title: How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering
Authors: Jinxin Liu, Shulin Cao, Jiaxin Shi, Tingjian Zhang, Lunyiu Nie, Linmei Hu, Lei Hou, Juanzi Li,
Abstract summary: Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases. Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
Score: 52.86931192259096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases. A typical approach to KBQA is semantic parsing, which translates a question into an executable logical form in a formal language. Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance. However, although it is validated that LLMs are capable of solving some KBQA problems, there has been little discussion on the differences in LLMs' proficiency in formal languages used in semantic parsing. In this work, we propose to evaluate the understanding and generation ability of LLMs to deal with differently structured logical forms by examining the inter-conversion of natural and formal language through in-context learning of LLMs. Extensive experiments with models of different sizes show that state-of-the-art LLMs can understand formal languages as well as humans, but generating correct logical forms given a few examples remains a challenge. Most importantly, our results also indicate that LLMs exhibit considerable sensitivity. In general, the formal language with a lower formalization level, i.e., the more similar it is to natural language, is more friendly to LLMs.

Related papers

Do Large Language Models Excel in Complex Logical Reasoning with Formal Language? [20.53475791645822]
Large Language Models (LLMs) have been shown to achieve breakthrough performance on complex logical reasoning tasks.<n>This paper aims to conduct a comprehensive evaluation of LLMs across various logical reasoning problems utilizing formal languages.
arXiv Detail & Related papers (2025-05-22T17:57:23Z)
Linguistic Blind Spots of Large Language Models [14.755831733659699]
We study the performance of recent large language models (LLMs) on linguistic annotation tasks. We find that recent LLMs show limited efficacy in addressing linguistic queries and often struggle with linguistically complex inputs. Our results provide insights to inform future advancements in LLM design and development.
arXiv Detail & Related papers (2025-03-25T01:47:13Z)
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models (LLMs) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
Intermediate Languages Matter: Formal Choice Drives Neurosymbolic LLM Reasoning [50.99811144731619]
We show that the choice of formal language affects both the syntactic and the semantic reasoning capability.<n>We conclude that on average, context-aware encodings help LLMs to reason, while there is no apparent effect of using comments or markdown syntax.
arXiv Detail & Related papers (2025-02-24T14:49:52Z)
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs [8.146860674148044]
We attempt to measure models' language understanding capacity while circumventing the risk of dataset recall. We parameterize large families of language tasks recognized by deterministic finite automata (DFAs) We find that, even in the strikingly simple setting of 3-state DFAs, LLMs underperform un parameterized ngram models on both language recognition and synthesis tasks.
arXiv Detail & Related papers (2025-01-06T07:57:51Z)
Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis [5.029635172046762]
Language Confusion is a phenomenon where Large Language Models (LLMs) generate text that is neither in the desired language, nor in a contextually appropriate language. We introduce a novel metric, Language Confusion Entropy, designed to measure and quantify this confusion.
arXiv Detail & Related papers (2024-10-17T05:43:30Z)
Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We find that Llama Instruct and Mistral models exhibit high degrees of language confusion. We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But, can they really "reason" over the natural language? This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models [65.10456412127405]
MLaKE is a benchmark for the adaptability of knowledge editing methods across five languages. MLaKE aggregates fact chains from Wikipedia across languages and generates questions in both free-form and multiple-choice. We evaluate the multilingual knowledge editing generalization capabilities of existing methods on MLaKE.
arXiv Detail & Related papers (2024-04-07T15:23:28Z)
How Well Do Large Language Models Understand Syntax? An Evaluation by Asking Natural Language Questions [25.39259677000101]
This study seeks to explore the question through the lens of syntax. We craft questions targeting nine syntactic knowledge points that are most closely related to sentence comprehension. Experiments conducted on 24 large language models (LLMs) suggest that most have a limited grasp of syntactic knowledge.
arXiv Detail & Related papers (2023-11-14T16:30:36Z)
Leveraging Large Language Models to Generate Answer Set Programs [5.532477732693001]
Large language models (LLMs) have demonstrated exceptional performance in various natural language processing tasks. This paper proposes a neuro-symbolic method that combines the strengths of large language models and answer set programming.
arXiv Detail & Related papers (2023-07-15T03:40:55Z)
Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text [5.532477732693001]
We show that a large language model can serve as a highly effective few-shot semantically. It can convert natural language sentences into a logical form that serves as input for answer set programs. We demonstrate that this method achieves state-of-the-art performance on several benchmarks, including bAbI, StepGame, CLUTRR, and gSCAN.
arXiv Detail & Related papers (2023-07-15T03:29:59Z)
ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT [72.83383437501577]
Large language models (LLMs) have recently demonstrated significant potential in mathematical abilities. LLMs currently have difficulty in bridging perception, language understanding and reasoning capabilities. This paper presents a novel method for integrating LLMs into the abductive learning framework.
arXiv Detail & Related papers (2023-04-21T16:23:47Z)
Shortcut Learning of Large Language Models in Natural Language Understanding [119.45683008451698]
Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks. They might rely on dataset bias and artifacts as shortcuts for prediction. This has significantly affected their generalizability and adversarial robustness.
arXiv Detail & Related papers (2022-08-25T03:51:39Z)
Foundations of Symbolic Languages for Model Interpretability [2.3361634876233817]
We study the computational complexity of FOIL queries over two classes of ML models often deemed to be easily interpretable. We present a prototype implementation of FOIL wrapped in a high-level declarative language.
arXiv Detail & Related papers (2021-10-05T21:56:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.