LawngNLI: A Long-Premise Benchmark for In-Domain Generalization from
Short to Long Contexts and for Implication-Based Retrieval
- URL: http://arxiv.org/abs/2212.03222v1
- Date: Tue, 6 Dec 2022 18:42:39 GMT
- Title: LawngNLI: A Long-Premise Benchmark for In-Domain Generalization from
Short to Long Contexts and for Implication-Based Retrieval
- Authors: William Bruno, Dan Roth
- Abstract summary: LawngNLI is constructed from U.S. legal opinions, with automatic labels of high human-validated accuracy.
It can benchmark for in-domain generalization from short to long contexts.
LawngNLI can train and test systems for implication-based case retrieval and argumentation.
- Score: 72.4859717204905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language inference has trended toward studying contexts beyond the
sentence level. An important application area is law: past cases often do not
foretell how they apply to new situations, and implications must be inferred.
This paper introduces LawngNLI, constructed from U.S. legal opinions with
automatic labels of high human-validated accuracy. Premises are long and
multigranular. Experiments show two use cases. First, LawngNLI can benchmark
for in-domain generalization from short to long contexts. It has remained
unclear if large-scale long-premise NLI datasets actually need to be
constructed: near-top performance on long premises could be achievable by
fine-tuning using short premises. Without multigranularity, benchmarks cannot
distinguish lack of fine-tuning on long premises versus domain shift between
short and long datasets. In contrast, our long and short premises share the
same examples and domain. Models fine-tuned using several past NLI datasets
and/or our short premises fall short of top performance on our long premises.
So for at least certain domains (such as ours), large-scale long-premise
datasets are needed. Second, LawngNLI can benchmark for implication-based
retrieval. Queries are entailed or contradicted by target documents, allowing
users to move between arguments and evidence. Leading retrieval models perform
reasonably well zero-shot on a LawngNLI-derived retrieval task. We compare different
systems for re-ranking, including lexical overlap and cross-encoders fine-tuned
using a modified LawngNLI or past NLI datasets. LawngNLI can train and test
systems for implication-based case retrieval and argumentation.
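The retrieval pipeline the abstract describes can be made concrete with a short sketch: a lexical-overlap first stage followed by NLI cross-encoder re-ranking, where relevance is defined by implication (entailment or contradiction) rather than topical similarity. The corpus, query, off-the-shelf checkpoint, and the 1 - P(neutral) scoring rule below are illustrative assumptions, not the paper's actual models or data.

```python
# Minimal sketch of implication-based retrieval with re-ranking.
# Assumptions (not from the paper): rank_bm25 as the lexical first
# stage, an off-the-shelf NLI cross-encoder in place of the
# LawngNLI-fine-tuned models, toy corpus/query strings, and
# 1 - P(neutral) as the relevance score.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "The court held that the statute of limitations had expired.",
    "The appellate court reversed, finding the contract unconscionable.",
    "Summary judgment was granted because no material facts were in dispute.",
]
query = "The claim was time-barred."

# Stage 1: lexical-overlap retrieval (BM25 over whitespace tokens).
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
lexical = bm25.get_scores(query.lower().split())
candidates = np.argsort(lexical)[::-1][:3]  # top-k candidate documents

# Stage 2: NLI cross-encoder re-ranking over (premise, hypothesis) =
# (candidate document, query) pairs. Softmax the three NLI logits;
# [contradiction, entailment, neutral] is this checkpoint's
# documented label order.
nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")
logits = nli.predict([(corpus[i], query) for i in candidates])
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# Implication-based relevance: a document matters if it entails OR
# contradicts the query, so rank by 1 - P(neutral).
relevance = 1.0 - probs[:, 2]
for idx, score in sorted(zip(candidates, relevance), key=lambda x: -x[1]):
    print(f"{score:.3f}  {corpus[idx]}")
```

A cross-encoder fine-tuned on a modified LawngNLI, one of the re-ranking systems the paper compares, would replace the off-the-shelf checkpoint loaded above.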
Related papers
- Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? [36.83397306207386]
We evaluate the capabilities of 17 leading Large Language Models (LLMs).
Strikingly, many models are remarkably thread-safe: capable of simultaneously following multiple threads without significant loss in performance.
We find the effective context limit is significantly shorter than the supported context length, with accuracy decreasing as the context window grows.
arXiv Detail & Related papers (2024-11-07T18:59:27Z)
- Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows.
We propose a novel long-context benchmark, Loong, aligning with realistic scenarios through extended multi-document question answering (QA).
Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z)
- Long Context Alignment with Short Instructions and Synthesized Positions [56.1267385315404]
This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of Large Language Models (LLMs).
With a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable to strong baselines like GPT-3.5-Turbo-16K on LongBench (see the sketch below).
arXiv Detail & Related papers (2024-05-07T01:56:22Z)
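A minimal sketch of the position-skipping idea named above: a short training sequence is given non-contiguous position ids so it imitates long-range dependencies. The uniform random skips are an assumption here; the paper's actual skipping strategy may differ.

```python
# Minimal sketch of the position-skipping idea behind SkipAlign:
# give a short training sequence non-contiguous position ids so it
# imitates long-range dependencies. Uniform random skips are an
# assumption here; the paper's skipping strategy may differ.
import torch

def skipped_position_ids(seq_len: int, target_len: int) -> torch.Tensor:
    """Strictly increasing position ids sampled from [0, target_len)."""
    ids = torch.randperm(target_len)[:seq_len]
    return torch.sort(ids).values

pos = skipped_position_ids(seq_len=8, target_len=64)
print(pos)  # e.g. tensor([ 2,  9, 17, 21, 34, 40, 51, 63])
# During alignment fine-tuning these would replace the default
# 0..seq_len-1 position_ids fed to the model.
```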
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it (a toy sketch follows this entry).
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
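A toy sketch of the representational change described above: pair the mention in context with "title: definition" text instead of the title alone. The checkpoint, the pairing format, and the candidates are illustrative assumptions, not the paper's system.

```python
# Toy sketch of using more expressive candidate representations in
# entity disambiguation: score each candidate by pairing the mention
# in context with "title: definition" text rather than the title
# alone. Checkpoint and candidates are illustrative stand-ins.
from sentence_transformers import CrossEncoder

context = "Jordan scored 32 points for the Bulls last night."
candidates = {
    "Michael Jordan": "American former professional basketball player.",
    "Jordan": "Country in Western Asia, on the east bank of the Jordan River.",
}

# A general-purpose similarity cross-encoder stands in for the
# paper's locally trained model.
scorer = CrossEncoder("cross-encoder/stsb-roberta-base")
pairs = [(context, f"{title}: {definition}")
         for title, definition in candidates.items()]
scores = scorer.predict(pairs)
best_title = max(zip(candidates, scores), key=lambda x: x[1])[0]
print(best_title)
```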
- Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters [35.103851212995046]
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.
We explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on.
We develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset (see the sketch below).
arXiv Detail & Related papers (2022-04-15T12:56:39Z)
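One way to sketch the document-level aggregation mentioned above: score the hypothesis against each document segment with a sentence-pair NLI model, then pool. Max-pooling the entailment probability is an illustrative choice; the paper's own aggregation methods may differ.

```python
# Minimal sketch of stretching a sentence-pair NLI model to full
# documents: score the hypothesis against each segment, then
# aggregate. Max-pooling entailment is an illustrative choice; the
# paper's own aggregation methods may differ.
import numpy as np
from sentence_transformers import CrossEncoder

nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")

def document_entailment(segments, hypothesis):
    """Aggregate per-segment entailment into one document-level score."""
    logits = nli.predict([(seg, hypothesis) for seg in segments])
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Assumed label order: [contradiction, entailment, neutral].
    return probs[:, 1].max()  # the document entails if any segment does

segments = [
    "The tenant stopped paying rent in March.",
    "The landlord filed an eviction action in April.",
]
print(document_entailment(segments, "The tenant breached the lease."))
```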
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- DocNLI: A Large-scale Dataset for Document-level Natural Language Inference [55.868482696821815]
Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems.
This work presents DocNLI -- a newly-constructed large-scale dataset for document-level NLI.
arXiv Detail & Related papers (2021-06-17T13:02:26Z)
- Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts [19.894104911338353]
ConTRoL is a new dataset for ConTextual Reasoning over Long texts.
It consists of 8,325 expert-designed "context-hypothesis" pairs with gold labels.
It is derived from competitive selection and recruitment tests (verbal reasoning tests) used in police recruitment, with expert-level quality.
arXiv Detail & Related papers (2020-11-10T02:31:31Z)