A Statutory Article Retrieval Dataset in French
- URL: http://arxiv.org/abs/2108.11792v1
- Date: Thu, 26 Aug 2021 13:50:20 GMT
- Title: A Statutory Article Retrieval Dataset in French
- Authors: Antoine Louis, Gerasimos Spanakis, Gijs Van Dijck
- Abstract summary: We introduce the Belgian Statutory Article Retrieval dataset (BSARD)
BSARD consists of 1,100+ French native legal questions labeled by experienced jurists with relevant articles from a corpus of 22,600+ Belgian law articles.
We benchmark several unsupervised information retrieval methods based on term weighting and pooled embeddings.
Our best performing baseline achieves 50.8% R@100, which is promising for the feasibility of the task and indicates that there is still substantial room for improvement.
- Score: 4.082216579462797
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Statutory article retrieval is the task of automatically retrieving law
articles relevant to a legal question. While recent advances in natural
language processing have sparked considerable interest in many legal tasks,
statutory article retrieval remains primarily untouched due to the scarcity of
large-scale and high-quality annotated datasets. To address this bottleneck, we
introduce the Belgian Statutory Article Retrieval Dataset (BSARD), which
consists of 1,100+ French native legal questions labeled by experienced jurists
with relevant articles from a corpus of 22,600+ Belgian law articles. Using
BSARD, we benchmark several unsupervised information retrieval methods based on
term weighting and pooled embeddings. Our best performing baseline achieves
50.8% R@100, which is promising for the feasibility of the task and indicates
that there is still substantial room for improvement. By the specificity of the
data domain and addressed task, BSARD presents a unique challenge problem for
future research on legal information retrieval.
Related papers
- Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval [6.952344923975001]
This work focuses on utilizing the logical reasoning capabilities of large language models (LLMs) to identify relevant legal terms.
The proposed retrieval system integrates additional information from the term--based expansion and query reformulation to improve the retrieval accuracy.
Experiments on COLIEE 2022 and COLIEE 2023 datasets show that extra knowledge from LLMs helps to improve the retrieval result of both lexical and semantic ranking models.
arXiv Detail & Related papers (2024-10-16T01:34:14Z) - BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
Many complex real-world queries require in-depth reasoning to identify relevant documents.
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding.
arXiv Detail & Related papers (2024-07-16T17:58:27Z) - STARD: A Chinese Statute Retrieval Dataset with Real Queries Issued by Non-professionals [14.002280587675175]
Statute retrieval aims to find relevant statutory articles for specific queries.
Existing statute retrieval benchmarks focus on formal and professional queries from sources like bar exams and legal case documents.
To address this gap, we introduce the STAtute Retrieval dataset (STARD)
Unlike existing statute retrieval datasets, STARD captures the complexity and diversity of real queries from the general public.
arXiv Detail & Related papers (2024-06-21T17:10:09Z) - InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - NeCo@ALQAC 2023: Legal Domain Knowledge Acquisition for Low-Resource
Languages through Data Enrichment [2.441072488254427]
This paper presents NeCo Team's solutions to the Vietnamese text processing tasks provided in the Automated Legal Question Answering Competition 2023 (ALQAC 2023)
Our methods for the legal document retrieval task employ a combination of similarity ranking and deep learning models, while for the second task, we propose a range of adaptive techniques to handle different question types.
Our approaches achieve outstanding results on both tasks of the competition, demonstrating the potential benefits and effectiveness of question answering systems in the legal field.
arXiv Detail & Related papers (2023-09-11T14:43:45Z) - Generating Natural Language Queries for More Effective Systematic Review
Screening Prioritisation [53.77226503675752]
The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers.
In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based large-scale language models such as ChatGPT and Alpaca.
Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.
arXiv Detail & Related papers (2023-09-11T05:12:14Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural
Networks [3.5880535198436156]
We propose a novel graph-augmented dense statute retriever (G-DSR) model that incorporates the structure of legislation via a graph neural network to improve dense retrieval performance.
Experimental results show that our approach outperforms strong retrieval baselines on a real-world expert-annotated SAR dataset.
arXiv Detail & Related papers (2023-01-30T12:59:09Z) - Retrieval Enhanced Data Augmentation for Question Answering on Privacy
Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents.
To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models.
Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
arXiv Detail & Related papers (2022-04-19T15:45:23Z) - Case law retrieval: problems, methods, challenges and evaluations in the
last 20 years [23.13408774493739]
We survey methods for case law retrieval from the past 20 years.
We outline the problems and challenges facing evaluation of case law retrieval systems going forward.
arXiv Detail & Related papers (2022-02-15T06:01:36Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity)
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.