On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education
- URL: http://arxiv.org/abs/2412.15902v1
- Date: Fri, 20 Dec 2024 13:54:57 GMT
- Title: On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education
- Authors: Lorenz Wendlinger, Christian Braun, Abdullah Al Zubaer, Simon Alexander Nonn, Sarah Großkopf, Christofer Fellicious, Michael Granitzer
- Abstract summary: We show that current open-source foundational LLMs possess instruction-following capability and German legal background knowledge sufficient for some legal analysis in an educational context.
However, model capability breaks down in very specific tasks, such as the classification of "Gutachtenstil" appraisal style components.
We introduce a Retrieval Augmented Generation (RAG) based prompt example selection method that substantially improves predictions in high data availability scenarios.
- Score: 1.7977968161686195
- Abstract: We show that current open-source foundational LLMs possess instruction-following capability and German legal background knowledge sufficient for some legal analysis in an educational context. However, model capability breaks down in very specific tasks, such as the classification of "Gutachtenstil" appraisal style components, or with complex contexts, such as complete legal opinions. Even with extended context and effective prompting strategies, they cannot match the Bag-of-Words baseline. To combat this, we introduce a Retrieval Augmented Generation (RAG) based prompt example selection method that substantially improves predictions in high data availability scenarios. We further evaluate the performance of pre-trained LLMs on two standard tasks for argument mining and automated essay scoring and find it to be more adequate. Throughout, pre-trained LLMs improve upon the baseline in scenarios with little or no labeled data, with Chain-of-Thought prompting further helping in the zero-shot case.
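The abstract only names the RAG-based prompt example selection method at a high level. The following is a minimal sketch of the general idea, assuming a multilingual sentence-embedding model and a small pool of labelled Gutachtenstil sentences; the model name, the labelled pool, the `select_examples` helper, and the prompt template are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): choose few-shot examples for a
# Gutachtenstil-component classification prompt by retrieving the labelled
# sentences most similar to the query sentence.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # illustrative choice

# Hypothetical labelled pool: (sentence, Gutachtenstil component) pairs.
labelled_pool = [
    ("Zu prüfen ist, ob ein Anspruch aus § 433 Abs. 2 BGB besteht.", "Obersatz"),
    ("Ein Angebot ist eine empfangsbedürftige Willenserklärung.", "Definition"),
    ("Hier hat A dem B den Wagen für 5.000 Euro angeboten.", "Subsumtion"),
    ("Somit besteht der Anspruch.", "Konklusion"),
]

# Pre-compute normalized embeddings so dot products give cosine similarity.
pool_embeddings = embedder.encode([s for s, _ in labelled_pool], normalize_embeddings=True)

def select_examples(query: str, k: int = 2):
    """Return the k labelled examples most similar to the query sentence."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = pool_embeddings @ q
    top = np.argsort(scores)[::-1][:k]
    return [labelled_pool[i] for i in top]

def build_prompt(query: str, k: int = 2) -> str:
    """Assemble a few-shot classification prompt from the retrieved examples."""
    lines = ["Classify the Gutachtenstil component of the sentence."]
    for sentence, label in select_examples(query, k):
        lines.append(f"Sentence: {sentence}\nComponent: {label}")
    lines.append(f"Sentence: {query}\nComponent:")
    return "\n\n".join(lines)

print(build_prompt("Folglich ist der Kaufvertrag wirksam zustande gekommen."))
```

In the high data availability scenarios the paper targets, the labelled pool would be far larger; the sketch only illustrates the retrieve-then-prompt flow.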
Related papers
- Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning [34.427730009102966]
We develop an automated evaluation framework to identify reasoning errors and evaluate the performance of LLMs.
Our work will also serve as an evaluation framework that can be used in detailed error analysis of reasoning chains for logic-intensive complex tasks.
arXiv Detail & Related papers (2025-02-08T19:49:32Z)
- Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study [9.30538764385435]
We focus on the problem of legal citation prediction within the Australian law context, where correctly identifying and citing relevant legislation or precedents is critical.
Our findings indicate that domain-specific pre-training alone is insufficient for achieving satisfactory citation accuracy, even after law-specialised pre-training.
In contrast, instruction tuning on our task-specific dataset dramatically boosts performance, reaching the best results across all settings.
arXiv Detail & Related papers (2024-12-09T07:46:14Z)
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM).
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
- Zero-shot Model-based Reinforcement Learning using Large Language Models [12.930241182192988]
We investigate how pre-trained Large Language Models can be leveraged to predict in context the dynamics of continuous Markov decision processes.
We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning.
arXiv Detail & Related papers (2024-10-15T15:46:53Z)
- RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Model (LLM) representations for evaluation.
Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while a Proximal Policy Optimization (PPO) trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning [43.123290672073814]
We deal with systems that generate summaries from document(s) based on a query.
Motivated by the insight that Reinforcement Learning (RL) provides a generalization to Supervised Learning (SL) for Natural Language Generation, we use an RL-based approach for this task.
We develop multiple Policy Gradient networks, trained on various reward signals: ROUGE, BLEU, and Semantic Similarity.
arXiv Detail & Related papers (2023-11-29T10:38:16Z)
- A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions concerning LLMs' performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence [5.07013500385659]
This paper explores Large Language Models' (LLMs) capabilities in applying tax law.
Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release.
Findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels.
arXiv Detail & Related papers (2023-06-12T12:40:48Z)