NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA
- URL: http://arxiv.org/abs/2404.03150v1
- Date: Thu, 4 Apr 2024 01:50:20 GMT
- Title: NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA
- Authors: Anish Pahilajani, Samyak Rajesh Jain, Devasha Trivedi
- Abstract summary: We present two approaches to solving the task of legal answer validation.
First, we fine-tuned pre-trained BERT-based models and found that models pre-trained on legal-domain text perform better.
Second, we performed few-shot prompting on GPT models and found that reformulating the answer validation task as a multiple-choice QA task markedly improves model performance.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents our submission to SemEval-2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure. We present two approaches to the task of legal answer validation: given an introduction to the case, a question, and a candidate answer, decide whether the candidate correctly answers the question. First, we fine-tuned pre-trained BERT-based models and found that models pre-trained on legal-domain text perform better. Second, we performed few-shot prompting on GPT models and found that reformulating the answer validation task as a multiple-choice QA task markedly improves model performance. Our best submission, a BERT-based model, placed 7th out of 20.
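The abstract outlines two pipelines without implementation details. As a rough illustration only, the minimal Python sketches below show what each approach could look like; the checkpoint name, prompt template, and GPT model identifier are assumptions, not the authors' actual setup.

```python
# Approach 1 (sketch): binary answer validation with a fine-tuned BERT-based
# classifier. "nlpaueb/legal-bert-base-uncased" is an assumed domain-pretrained
# checkpoint; the model would first be fine-tuned on the task's training data.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "nlpaueb/legal-bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def validate(intro: str, question: str, answer: str) -> float:
    """Return P(answer is correct) for one (introduction, question, answer) triple."""
    # Pack the case context and the candidate answer into BERT's two-segment input.
    inputs = tokenizer(intro + " " + question, answer, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2): [incorrect, correct]
    return torch.softmax(logits, dim=-1)[0, 1].item()
```

The reformulation in the second approach groups all candidate answers for a question into a single prompt, so the model picks among alternatives instead of judging each candidate in isolation:

```python
# Approach 2 (sketch): few-shot multiple-choice prompting of a GPT model.
# The prompt wording and few-shot example are placeholders.
from string import ascii_uppercase
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

FEW_SHOT = (
    "Introduction: <solved example case>\n"
    "Question: <solved example question>\n"
    "A. <incorrect candidate>\nB. <correct candidate>\n"
    "Correct answer: B\n\n"
)

def choose(intro: str, question: str, candidates: list[str]) -> int:
    """Return the index of the candidate the model selects."""
    options = "\n".join(f"{ascii_uppercase[i]}. {c}" for i, c in enumerate(candidates))
    prompt = (FEW_SHOT + f"Introduction: {intro}\nQuestion: {question}\n"
              f"{options}\nCorrect answer:")
    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder; the specific GPT variant is not fixed here
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    return ascii_uppercase.index(reply.strip()[0])
```

Because the official task is binary, the selected letter would then be mapped back to a True label for the chosen candidate and False for the rest.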
Related papers
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z)
- BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense.
The dataset comprises multiple-choice questions that challenge models to think "outside of the box".
Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.
arXiv Detail & Related papers (2024-06-07T14:01:56Z)
- iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers
This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense.
The BRAINTEASER task comprises multiple-choice question answering designed to evaluate models' lateral thinking capabilities.
We propose a unique strategy to improve the performance of pre-trained language models in both subtasks.
arXiv Detail & Related papers (2024-05-25T08:50:51Z)
- NOWJ1@ALQAC 2023: Enhancing Legal Task Performance with Classic Statistical Models and Pre-trained Language Models
This paper describes the NOWJ1 Team's approach for the Automated Legal Question Answering Competition (ALQAC) 2023.
For the document retrieval task, we implement a pre-processing step to overcome input limitations and apply learning-to-rank methods to consolidate features from various models.
We incorporate state-of-the-art models to develop distinct systems for each sub-task, utilizing both classic statistical models and pre-trained Language Models.
arXiv Detail & Related papers (2023-09-16T18:32:15Z)
- Understand Legal Documents with Contextualized Large Language Models
We present our systems for SemEval-2023 Task 6: understanding legal texts.
We first develop the Legal-BERT-HSLN model, which considers comprehensive context information at both the intra- and inter-sentence levels.
We then train a Legal-LUKE model, which is legal-contextualized and entity-aware, to recognize legal entities.
arXiv Detail & Related papers (2023-03-21T18:48:11Z)
- Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
Socratic pretraining is a question-driven, unsupervised pretraining objective designed to improve controllability in summarization tasks.
Our results show that Socratic pretraining cuts task-specific labeled data requirements in half.
arXiv Detail & Related papers (2022-12-20T17:27:10Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representations.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- The MultiBERTs: BERT Reproductions for Robustness Analysis
Re-running pretraining can lead to substantially different conclusions about performance.
We introduce MultiBERTs: a set of 25 BERT-base checkpoints.
The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures.
arXiv Detail & Related papers (2021-06-30T15:56:44Z)
- RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering
We develop a simple and effective re-ranking approach (RECONSIDER) for span-extraction tasks.
RECONSIDER is trained on positive and negative examples extracted from high confidence predictions of MRC models.
It uses in-passage span annotations to perform span-focused re-ranking over a smaller candidate set.
arXiv Detail & Related papers (2020-10-21T04:28:42Z)
- PRover: Proof Generation for Interpretable Reasoning over Rules
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)
- IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE
We formalize the subtasks into the multiple-choice question answering format and construct the input with the prompt templates.
Experimental results show that our approaches achieve significant performance improvements over the baseline systems.
Our approaches secure the third rank on both official test sets of the first two subtasks, with accuracies of 96.4 and 94.3, respectively.
arXiv Detail & Related papers (2020-07-02T06:59:53Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we leverage the QA model to extract more appropriate answers, iteratively refining the data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.