Related papers: Unlocking Legal Knowledge with Multi-Layered Embedding-Based Retrieval

URL: http://arxiv.org/abs/2411.07739v1
Date: Tue, 12 Nov 2024 12:03:57 GMT
Title: Unlocking Legal Knowledge with Multi-Layered Embedding-Based Retrieval
Authors: João Alberto de Oliveira Lima
Abstract summary: We propose a multi-layered embedding-based retrieval method for legal and legislative texts. Our method meets various information needs by allowing the Retrieval Augmented Generation system to provide accurate responses.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: This work addresses the challenge of capturing the complexities of legal knowledge by proposing a multi-layered embedding-based retrieval method for legal and legislative texts. By creating embeddings not only for individual articles but also for their components (paragraphs, clauses) and structural groupings (books, titles, chapters, etc.), we seek to capture the subtleties of legal information through dense embedding vectors that represent it at varying levels of granularity. Our method meets various information needs by allowing the Retrieval-Augmented Generation (RAG) system to provide accurate responses, whether for specific segments or entire sections, tailored to the user's query. We explore the concepts of aboutness, semantic chunking, and inherent hierarchy within legal texts, arguing that this method enhances legal information retrieval. Although the focus is on Brazil's legislative methods and the Brazilian Constitution, which follow a civil law tradition, our findings should in principle be applicable across different legal systems, including those adhering to common law traditions. Furthermore, the principles of the proposed method extend beyond the legal domain, offering valuable insights for organizing and retrieving information in any field characterized by information encoded in hierarchical text.
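The core idea in the abstract, embedding every level of a legal hierarchy (titles, articles, paragraphs) rather than only articles, can be illustrated with a minimal sketch. This is not the author's implementation: it assumes a hypothetical `Node` tree, uses a toy bag-of-words `embed` function as a stand-in for a real dense embedding model, and builds each grouping node's chunk by concatenating its descendants' text (the paper may construct upper-level embeddings differently). The names `Node`, `embed`, `build_index`, and `search`, and the sample provisions, are all illustrative.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit of a hierarchical legal text (title, chapter, article, paragraph, ...)."""
    level: str                      # e.g. "title", "article", "paragraph"
    label: str                      # e.g. "Art. 1"
    text: str = ""                  # heading or body text carried by this node itself
    children: list[Node] = field(default_factory=list)

    def full_text(self) -> str:
        # A node's chunk covers its own text plus the text of everything beneath it.
        parts = [self.text] + [child.full_text() for child in self.children]
        return " ".join(p for p in parts if p)


def embed(text: str) -> Counter:
    # Toy stand-in for a dense embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(root: Node) -> list[tuple[Node, Counter]]:
    # Multi-layered index: one vector per node at every level, not just per article.
    index: list[tuple[Node, Counter]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        index.append((node, embed(node.full_text())))
        stack.extend(node.children)
    return index


def search(index: list[tuple[Node, Counter]], query: str, k: int = 3) -> list[tuple[str, str, float]]:
    # Rank chunks of every granularity together; the best match may be a paragraph or a whole title.
    q = embed(query)
    scored = [(node.level, node.label, cosine(q, vec)) for node, vec in index]
    scored.sort(key=lambda s: s[2], reverse=True)
    return scored[:k]


# Hypothetical toy hierarchy (not actual statutory text).
law = Node("title", "Title I", "Fundamental rights", [
    Node("article", "Art. 1", "All persons are equal before the law", [
        Node("paragraph", "Art. 1, I", "men and women have equal rights and duties"),
        Node("paragraph", "Art. 1, II", "the expression of thought is free"),
    ]),
])
index = build_index(law)
for level, label, score in search(index, "expression of thought is free"):
    print(f"{level:<10} {label:<11} {score:.3f}")
```

In this toy setup, a narrow query about free expression ranks the specific paragraph first, while a broad query such as "fundamental rights equal persons law" ranks the whole title first. This mirrors the abstract's point that a single index can answer both segment-level and section-level information needs.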

Related papers

LegalOne: A Family of Foundation Models for Reliable Legal Reasoning
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv (2026-01-31T10:18:32Z)
Poly-Vector Retrieval: Reference and Content Embeddings for Legal Documents
In legal contexts, users frequently reference norms by their labels or nicknames rather than by their content. This paper introduces Poly-Vector Retrieval, a method that assigns multiple distinct embeddings to each legal provision. It significantly improves retrieval accuracy for label-centric queries and shows potential to resolve internal and external cross-references.
arXiv (2025-04-09T17:54:11Z)
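The poly-vector idea summarized above (several embeddings per provision, one for its content and others for the labels or nicknames users cite it by) can be sketched as follows. This is an illustrative sketch, not the paper's method: the provisions, labels, and function names are hypothetical, `embed` is a toy bag-of-words stand-in for a dense embedding model, and scoring each provision by its best-matching vector (max fusion) is an assumption.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a dense embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical provisions: each has its content plus the labels/nicknames users cite it by.
PROVISIONS = {
    "Art. 1": {"content": "the expression of thought is free",
               "references": ["Article 1", "free speech clause"]},
    "Art. 2": {"content": "men and women have equal rights and duties",
               "references": ["Article 2", "equality clause"]},
}


def build_poly_index(provisions: dict) -> list[tuple[str, str, Counter]]:
    # Several vectors per provision: one content embedding plus one per reference label.
    index = []
    for pid, prov in provisions.items():
        index.append((pid, "content", embed(prov["content"])))
        for ref in prov["references"]:
            index.append((pid, "reference", embed(ref)))
    return index


def search(index: list[tuple[str, str, Counter]], query: str, k: int = 1) -> list[tuple[str, str, float]]:
    # Score each provision by its best-matching vector (max fusion is an assumption here).
    q = embed(query)
    best: dict[str, tuple[float, str]] = {}
    for pid, kind, vec in index:
        score = cosine(q, vec)
        if pid not in best or score > best[pid][0]:
            best[pid] = (score, kind)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(pid, kind, score) for pid, (score, kind) in ranked[:k]]


index = build_poly_index(PROVISIONS)
print(search(index, "equality clause"))   # label-centric query
print(search(index, "is thought free"))   # content query
```

A label-centric query such as "equality clause" shares no words with the provision's content, yet it still retrieves "Art. 2" through the reference vector. A content query, by contrast, is matched by the content vector.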
Bridging Textual-Collaborative Gap through Semantic Codes for Sequential Recommendation
CCFRec is a novel Code-based textual and Collaborative semantic Fusion method for sequential Recommendation. We generate fine-grained semantic codes from multi-view text embeddings through vector quantization techniques. To further enhance the fusion of textual and collaborative semantics, we introduce an optimization strategy.
arXiv (2025-03-15T15:54:44Z)
A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences
We propose a transparent law reasoning schema enriched with hierarchical factum probandum, evidence, and implicit experience. Inspired by this schema, we introduce a challenging task that takes a textual case description and outputs a hierarchical structure justifying the final decision. This benchmark paves the way for transparent and accountable AI-assisted law reasoning in the Intelligent Court.
arXiv (2025-03-02T10:26:54Z)
Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization
The paper considers agentic generative AI powered by Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), Knowledge Graphs (KGs), and Vector Stores (VSs). This technology excels at inferring relationships within vast unstructured or semi-structured datasets. We introduce a generative AI system that integrates RAG, VS, and KG, constructed via Non-Negative Matrix Factorization (NMF).
arXiv (2025-02-27T18:35:39Z)
The Use of Readability Metrics in Legal Text: A Systematic Literature Review
Linguistic complexity is an important contributor to difficulties experienced by readers. Document readability metrics have been developed to measure document readability. Not all legal domains are well represented in terms of readability metrics.
arXiv (2024-11-14T15:04:17Z)
A Multi-Source Heterogeneous Knowledge Injected Prompt Learning Method for Legal Charge Prediction
We propose a prompt learning framework-based method for modeling case descriptions. We leverage multi-source external knowledge from a legal knowledge base, a conversational LLM, and legal articles. Our method achieves state-of-the-art results on CAIL-2018, the largest legal charge prediction dataset.
arXiv (2024-08-05T04:53:17Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv (2024-03-27T10:40:14Z)
A Deep Learning-Based System for Automatic Case Summarization
This paper presents a deep learning-based system for efficient automatic case summarization. The system offers both supervised and unsupervised methods to generate concise and relevant summaries of lengthy legal case documents. Future work will focus on refining summarization techniques and exploring the application of our methods to other types of legal texts.
arXiv (2023-12-13T01:18:10Z)
Large Language Models and Explainable Law: a Hybrid Methodology
The paper advocates for LLMs to enhance the accessibility, usage and explainability of rule-based legal systems. A methodology is developed to explore the potential use of LLMs for translating the explanations produced by rule-based systems.
arXiv (2023-11-20T14:47:20Z)
MUSER: A Multi-View Similar Case Retrieval Dataset
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. Existing SCR datasets only focus on the fact description section when judging the similarity between cases. We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
arXiv (2023-10-24T08:17:11Z)
Constructing a Knowledge Graph for Vietnamese Legal Cases with Heterogeneous Graphs
This paper presents a knowledge graph construction method for legal case documents and related laws. Our approach consists of three main steps: data crawling, information extraction, and knowledge graph deployment.
arXiv (2023-09-16T18:31:47Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose SAILER, a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv (2023-04-22T10:47:01Z)
PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv (2022-12-21T04:03:33Z)
Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions
We propose an unsupervised graph-based ranking model that uses a reweighting algorithm to exploit document structure. Results on the Canadian Legal Case Law dataset show that our proposed method outperforms several strong baselines.
arXiv (2022-11-06T22:20:42Z)
Entity Graph Extraction from Legal Acts -- a Prototype for a Use Case in Policy Design Analysis
This paper presents a prototype developed to serve the quantitative study of public policy design. Our system aims to automate the process of gathering legal documents, annotating them with Institutional Grammar, and using hypergraphs to analyse inter-relations between crucial entities.
arXiv (2022-09-02T10:57:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.