Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization
- URL: http://arxiv.org/abs/2502.20364v2
- Date: Fri, 09 May 2025 00:25:09 GMT
- Title: Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization
- Authors: Ryan C. Barron, Maksim E. Eren, Olga M. Serafimova, Cynthia Matuszek, Boian S. Alexandrov
- Abstract summary: Agentic Generative AI, powered by Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), Knowledge Graphs (KGs), and Vector Stores (VSs), represents a transformative technology applicable to specialized domains. This technology excels at inferring relationships within vast unstructured or semi-structured datasets. We introduce a generative AI system that integrates RAG, VS, and KG, constructed via Non-Negative Matrix Factorization (NMF), to enhance legal information retrieval and AI reasoning and minimize hallucinations.
- Score: 6.0045906216050815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agentic Generative AI, powered by Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), Knowledge Graphs (KGs), and Vector Stores (VSs), represents a transformative technology applicable to specialized domains such as legal systems, research, recommender systems, cybersecurity, and global security, including proliferation research. This technology excels at inferring relationships within vast unstructured or semi-structured datasets. The legal domain here comprises complex data characterized by extensive, interrelated, and semi-structured knowledge systems with complex relations. It comprises constitutions, statutes, regulations, and case law. Extracting insights and navigating the intricate networks of legal documents and their relations is crucial for effective legal research. Here, we introduce a generative AI system that integrates RAG, VS, and KG, constructed via Non-Negative Matrix Factorization (NMF), to enhance legal information retrieval and AI reasoning and minimize hallucinations. In the legal system, these technologies empower AI agents to identify and analyze complex connections among cases, statutes, and legal precedents, uncovering hidden relationships and predicting legal trends, challenging tasks that are essential for ensuring justice and improving operational efficiency. Our system employs web scraping techniques to systematically collect legal texts, such as statutes, constitutional provisions, and case law, from publicly accessible platforms like Justia. It bridges the gap between traditional keyword-based searches and contextual understanding by leveraging advanced semantic representations, hierarchical relationships, and latent topic discovery. This framework supports legal document clustering, summarization, and cross-referencing, for scalable, interpretable, and accurate retrieval for semi-structured data while advancing computational law and AI.
Related papers
- Policy-Driven AI in Dataspaces: Taxonomy, Explainability, and Pathways for Compliant Innovation [1.6766200616088744]
This paper provides a comprehensive review of privacy-preserving and policy-aware AI techniques. We propose a novel taxonomy to classify these techniques based on privacy levels, impacts, and compliance complexity. Drawing on technical, ethical, and regulatory perspectives, this work lays the groundwork for developing trustworthy, efficient, and compliant AI systems in dataspaces.
arXiv Detail & Related papers (2025-07-26T17:07:01Z) - When Large Language Models Meet Law: Dual-Lens Taxonomy, Technical Advances, and Ethical Governance [7.743029842436036]
This paper establishes the first comprehensive review of Large Language Models (LLMs). Transformer-based LLMs exhibit emergent capabilities such as contextual reasoning and generative argumentation. This review proposes a novel taxonomy that maps legal roles to computational subtasks and implements the Toulmin argumentation framework.
arXiv Detail & Related papers (2025-07-10T13:26:34Z) - Rethinking Data Protection in the (Generative) Artificial Intelligence Era [115.71019708491386]
We propose a four-level taxonomy that captures the diverse protection needs arising in modern (generative) AI models and systems. Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline.
arXiv Detail & Related papers (2025-07-03T02:45:51Z) - Deep Research Agents: A Systematic Examination And Roadmap [79.04813794804377]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks. In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z) - Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms. This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z) - Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps [14.261871331519567]
This study introduces a Retrieval-Augmented Generation (RAG) based Large Language Model (LLM) framework designed to support policymakers. The framework focuses on reducing hallucinations in LLMs by using a curated set of domain-specific questions to guide response generation. Our analysis shows that the proposed RAG-based LLM outperforms leading commercial LLMs across four evaluation metrics.
arXiv Detail & Related papers (2025-05-23T23:40:10Z) - Graph RAG for Legal Norms: A Hierarchical and Temporal Approach [0.0]
This article proposes an adaptation of Graph Retrieval Augmented Generation (Graph RAG) specifically designed for the analysis and comprehension of legal norms.
By combining structured knowledge graphs with contextually enriched text segments, Graph RAG offers a promising solution to address the inherent complexity and vast volume of legal data.
arXiv Detail & Related papers (2025-04-29T18:36:57Z) - Toward Agentic AI: Generative Information Retrieval Inspired Intelligent Communications and Networking [87.82985288731489]
Agentic AI has emerged as a key paradigm for intelligent communications and networking. This article emphasizes the role of knowledge acquisition, processing, and retrieval in agentic AI for telecom systems.
arXiv Detail & Related papers (2025-02-24T06:02:25Z) - GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems. The prevailing methodology is to construct a bi-encoder and compute the semantic similarity. We propose a new method called Generation Augmented Retrieval (GeAR) that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z) - A Comprehensive Framework for Reliable Legal AI: Combining Specialized Expert Systems and Adaptive Refinement [0.0]
This article proposes a novel framework combining expert systems with a knowledge-based architecture to improve the precision and contextual relevance of AI-driven legal services. This framework utilizes specialized modules, each focusing on specific legal areas, and incorporates structured operational guidelines to enhance decision-making. The proposed approach demonstrates significant improvements over existing AI models, showcasing enhanced performance in legal tasks and offering a scalable solution to provide more accessible and affordable legal services.
arXiv Detail & Related papers (2024-12-29T14:00:11Z) - Unlocking Legal Knowledge with Multi-Layered Embedding-Based Retrieval [0.0]
We propose a multi-layered embedding-based retrieval method for legal and legislative texts.
Our method meets various information needs by allowing the Retrieval Augmented Generation system to provide accurate responses.
arXiv Detail & Related papers (2024-11-12T12:03:57Z) - Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems [0.0]
This work investigates how Legislative Knowledge Graphs and LLMs can synergize and support legislative processes.
To this aim, we develop Legis AI Platform, an interactive platform focused on Italian legislation that enhances the possibility of conducting legislative analysis.
arXiv Detail & Related papers (2024-09-20T06:21:03Z) - Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents [55.63497537202751]
This article explores the convergence of connectionist and symbolic artificial intelligence (AI).
Traditionally, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic.
Recent advancements in large language models (LLMs) highlight the potential of connectionist architectures in handling human language as a form of symbols.
arXiv Detail & Related papers (2024-07-11T14:00:53Z) - Report of the 1st Workshop on Generative AI and Law [78.62063815165968]
This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw).
A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI.
arXiv Detail & Related papers (2023-11-11T04:13:37Z) - Constructing a Knowledge Graph for Vietnamese Legal Cases with Heterogeneous Graphs [5.168558598888541]
This paper presents a knowledge graph construction method for legal case documents and related laws.
Our approach consists of three main steps: data crawling, information extraction, and knowledge graph deployment.
arXiv Detail & Related papers (2023-09-16T18:31:47Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks [3.5880535198436156]
We propose a novel graph-augmented dense statute retriever (G-DSR) model that incorporates the structure of legislation via a graph neural network to improve dense retrieval performance.
Experimental results show that our approach outperforms strong retrieval baselines on a real-world expert-annotated SAR dataset.
arXiv Detail & Related papers (2023-01-30T12:59:09Z) - Towards an Interface Description Template for AI-enabled Systems [77.34726150561087]
Reuse is a common system architecture approach that seeks to instantiate a system architecture with existing components.
There is currently no framework that guides the selection of necessary information to assess their portability to operate in a system different than the one for which the component was originally purposed.
We present ongoing work on establishing an interface description template that captures the main information of an AI-enabled component.
arXiv Detail & Related papers (2020-07-13T20:30:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.