Improving LLMs with a knowledge from databases
- URL: http://arxiv.org/abs/2506.05560v1
- Date: Thu, 05 Jun 2025 20:14:25 GMT
- Title: Improving LLMs with a knowledge from databases
- Authors: Petr Máša
- Abstract summary: Large language models (LLMs) are making rapid progress. Many advanced techniques have been introduced and widely adopted, such as retrieval-augmented generation (RAG), agents, and tools. We propose a method that generates a ruleset based on defined knowledge patterns and then converts the rules into text form via a rule-to-text converter.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) are making rapid progress. Many advanced techniques have been introduced and widely adopted, such as retrieval-augmented generation (RAG), agents, and tools. Tools can query a database to answer questions over structured data or to compute groupings and other statistics. This unlocks huge opportunities, since in principle any question over the data can be answered, but it also poses risks, such as safety, because there is no control over the commands that are generated. We explore whether answers based on a dataset or database can be improved via interpretable machine-learning methods, namely enhanced association rules, with the added advantage that such a method can be used within a safe technique like RAG. Association rules have a long history: since the introduction of CN2 and apriori, many enhancements have been made, and in parallel, enhanced association rules have been introduced and have evolved over the last 40 years. The typical problem is that too many rules are generated. Techniques exist for handling this, and with the emergence of LLMs, feeding rules to the model via RAG turned out to be a natural use case. We propose a method that generates a ruleset based on defined knowledge patterns, converts the rules into text form via a rule-to-text converter, and includes the result as RAG context for the LLM. We compared this method with ChatGPT (including with agents) and observed a significant improvement in answering questions based on the dataset. We also tried several strategies for how many rules to generate. Moreover, the method can be extended in many ways as future work, such as incorporating other patterns or using rule mining as an agent.
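A minimal sketch of the pipeline described in the abstract, assuming a one-hot-encoded tabular dataset and rule mining with mlxtend's apriori implementation; the toy columns, thresholds, `rule_to_text` helper, and the commented-out `ask_llm` call are illustrative assumptions, not the authors' implementation:

```python
# Sketch: mine association rules, verbalize them, and prepend them as RAG context.
# Assumes a one-hot-encoded pandas DataFrame; column names and thresholds are illustrative.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactional data (each row: which attributes are present).
df = pd.DataFrame({
    "urban":     [1, 1, 0, 1, 0, 1],
    "age_30_40": [1, 0, 1, 1, 0, 1],
    "owns_car":  [1, 1, 0, 1, 0, 1],
}).astype(bool)

# 1) Generate a ruleset from frequent itemsets.
itemsets = apriori(df, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)

# 2) Rule-to-text converter: turn each rule into a natural-language sentence.
def rule_to_text(row) -> str:
    lhs = " and ".join(sorted(row["antecedents"]))
    rhs = " and ".join(sorted(row["consequents"]))
    return (f"If {lhs}, then {rhs} "
            f"(support {row['support']:.2f}, confidence {row['confidence']:.2f}).")

knowledge = "\n".join(rule_to_text(r) for _, r in rules.iterrows())

# 3) Include the verbalized rules as retrieved context for the LLM (RAG-style).
prompt = (
    "Answer the question using only the facts below.\n\n"
    f"Facts mined from the dataset:\n{knowledge}\n\n"
    "Question: Are people aged 30-40 likely to own a car?"
)
# response = ask_llm(prompt)  # hypothetical LLM call; any chat-completion API would work here
print(prompt)
```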
Related papers
- GenKI: Enhancing Open-Domain Question Answering with Knowledge Integration and Controllable Generation in Large Language Models [75.25348392263676]
Open-domain question answering (OpenQA) represents a cornerstone in natural language processing (NLP). We propose a novel framework named GenKI, which aims to improve OpenQA performance by exploring Knowledge Integration and controllable Generation.
arXiv Detail & Related papers (2025-05-26T08:18:33Z)
- Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference [78.08901120841833]
We propose a method to detect the knowledge boundary of Visual Large Language Models (VLLMs). We show that our method successfully depicts a VLLM's knowledge boundary, based on which we are able to reduce indiscriminate retrieval while maintaining or improving the performance.
arXiv Detail & Related papers (2025-02-25T09:32:08Z)
- RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards [78.74923079748521]
Retrieval-Augmented Generation (RAG) has proven its effectiveness in mitigating hallucinations in Large Language Models (LLMs) by retrieving knowledge from external resources. Current approaches use instruction tuning to optimize LLMs, improving their ability to utilize retrieved knowledge. We propose a Differentiable Data Rewards (DDR) method, which trains RAG systems by aligning data preferences between different RAG modules.
arXiv Detail & Related papers (2024-10-17T12:53:29Z)
- Steering Large Language Models between Code Execution and Textual Reasoning [22.279107036500083]
Textual reasoning has inherent limitations in solving tasks that involve math, logic, optimization, and search. OpenAI GPT Code Interpreter and multi-agent frameworks such as AutoGen have demonstrated remarkable proficiency in integrating code generation and execution. We propose three methods to better steer LLM code/text generation and achieve a notable improvement.
arXiv Detail & Related papers (2024-10-04T15:44:47Z)
- Symbolic Working Memory Enhances Language Models for Complex Rule Application [87.34281749422756]
Large Language Models (LLMs) have shown remarkable reasoning performance but struggle with multi-step deductive reasoning.
We propose augmenting LLMs with external working memory and introduce a neurosymbolic framework for rule application.
Our framework iteratively performs symbolic rule grounding and LLM-based rule implementation.
arXiv Detail & Related papers (2024-08-24T19:11:54Z)
- RuleR: Improving LLM Controllability by Rule-based Data Recycling [28.74786215922553]
Rule-based Data Recycling (RuleR) is a data augmentation method that incorporates multiple constraints into the original data samples according to predefined rules. Instead of creating new data from scratch, RuleR "recycles" existing data by applying rule-based edits to the responses and appending the rule instructions to the original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities.
arXiv Detail & Related papers (2024-06-22T20:57:12Z)
- Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata [1.6574413179773757]
Retrieval-augmented generation (RAG) enables retrieval of relevant information from an external knowledge source.
Traditional RAG applications perform poorly in answering multi-hop questions.
We introduce a new method called Multi-Meta-RAG, which uses database filtering with LLM-extracted metadata (a generic retrieval-filtering sketch follows below).
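As a rough illustration of the general idea of filtering a document store by query metadata before retrieval (not the Multi-Meta-RAG authors' implementation), a minimal sketch follows; the `Doc` fields, the `extract_metadata` stub, and the keyword-overlap ranking are assumptions made for the example:

```python
# Generic sketch of metadata-filtered retrieval (illustrative only).
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    source: str   # e.g., publisher or table name
    year: int

def extract_metadata(query: str) -> dict:
    # In the paper's setting this step would be done by an LLM; here a hard-coded stub.
    return {"source": "nytimes", "year": 2023}

def retrieve(query: str, docs: list[Doc], top_k: int = 3) -> list[Doc]:
    meta = extract_metadata(query)
    # 1) Filter the database by the extracted metadata before similarity search.
    filtered = [d for d in docs if d.source == meta["source"] and d.year == meta["year"]]
    # 2) Rank the remaining documents (placeholder: naive keyword overlap).
    def overlap(d: Doc) -> int:
        return len(set(query.lower().split()) & set(d.text.lower().split()))
    return sorted(filtered, key=overlap, reverse=True)[:top_k]
```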
arXiv Detail & Related papers (2024-06-19T04:53:48Z)
- RAFT: Adapting Language Model to Domain Specific RAG [75.63623523051491]
We present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in an "open-book" in-domain setting.
RAFT accomplishes this by citing verbatim the right sequence from the relevant document that helps answer the question.
RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets.
arXiv Detail & Related papers (2024-03-15T09:26:02Z)
- Reinforcement Learning for Optimizing RAG for Domain Chatbots [4.12484724941528]
This paper describes a RAG-based approach for building a bot that answers users' queries using Frequently Asked Questions (FAQ) data.
We train an in-house retrieval embedding model using infoNCE loss, and experimental results demonstrate that the in-house model works significantly better than the well-known general-purpose public embedding model.
We propose a policy-based model external to the RAG, which interacts with the RAG pipeline through policy actions and updates the policy to optimize the cost.
arXiv Detail & Related papers (2024-01-10T02:57:20Z)
- Distilling Rule-based Knowledge into Large Language Models [90.7765003679106]
We are inspired by the fact that humans can learn new tasks or knowledge in another way, by learning from rules. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules. Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both sample size and generalization ability.
arXiv Detail & Related papers (2023-11-15T11:42:41Z)
- One Model for All: Large Language Models are Domain-Agnostic Recommendation Systems [43.79001185418127]
This paper introduces a framework that utilizes pre-trained large language models (LLMs) for domain-agnostic recommendation. Specifically, we mix a user's behaviors from multiple domains and item titles into a sentence, then use LLMs to generate user and item representations.
arXiv Detail & Related papers (2023-10-22T13:56:14Z)
- Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval (a generic query-expansion sketch follows below).
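As a rough illustration of beam-search-style iterative query expansion in the spirit of this entry (not the ALLIES authors' implementation), a minimal sketch follows; `expand` and `score` are hypothetical callables, e.g. an LLM proposing related queries and a retriever scoring them:

```python
# Generic sketch of iterative query expansion with a beam (illustrative only).
from typing import Callable

def beam_expand(query: str,
                expand: Callable[[str], list[str]],   # e.g., an LLM proposing related queries
                score: Callable[[str], float],        # e.g., retrieval or answer confidence
                beam_width: int = 3,
                depth: int = 2) -> list[str]:
    """Keep the best `beam_width` reformulations at each depth."""
    beam = [query]
    for _ in range(depth):
        candidates = [q2 for q in beam for q2 in expand(q)] + beam
        beam = sorted(set(candidates), key=score, reverse=True)[:beam_width]
    return beam

# Usage sketch: expand() could wrap a prompt like "Propose 3 search queries related to: {q}",
# and score() could return the retriever's top-document similarity for each candidate query.
```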
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
- Repository-Level Prompt Generation for Large Language Models of Code [28.98699307030983]
We propose a framework that learns to generate example-specific prompts using prompt proposals.
The prompt proposals take context from the entire repository.
We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives.
arXiv Detail & Related papers (2022-06-26T10:51:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.