E-NER -- An Annotated Named Entity Recognition Corpus of Legal Text
- URL: http://arxiv.org/abs/2212.09306v1
- Date: Mon, 19 Dec 2022 09:03:32 GMT
- Authors: Ting Wai Terence Au, Ingemar J. Cox, Vasileios Lampos
- Abstract summary: We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission's EDGAR data set.
Training a number of different NER algorithms on the general English CoNLL-2003 corpus but testing on our test collection confirmed significant degradations in accuracy.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Identifying named entities such as a person, location or organization, in
documents can highlight key information to readers. Training Named Entity
Recognition (NER) models requires an annotated data set, which can be a
time-consuming, labour-intensive task. Nevertheless, there are publicly
available NER data sets for general English. Recently, there has been interest
in developing NER for legal text. However, prior work and experimental results
reported here indicate that there is a significant degradation in performance
when NER methods trained on a general English data set are applied to legal
text. We describe a publicly available legal NER data set, called E-NER, based
on legal company filings available from the US Securities and Exchange
Commission's EDGAR data set. Training a number of different NER algorithms on
the general English CoNLL-2003 corpus but testing on our test collection
confirmed significant degradations in accuracy, as measured by the F1-score, of
between 29.4% and 60.4%, compared to training and testing on the E-NER
collection.
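The degradation above is measured with the F1-score over predicted entities. A minimal sketch of entity-level F1 with exact-match spans follows; the gold and predicted spans are illustrative, not taken from the paper:

```python
# Entity-level F1: gold and predicted entities are (start, end, type)
# spans, and a prediction counts as correct only on an exact match
# of both boundaries and type.
def entity_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 2, "ORG"), (5, 6, "PER"), (9, 11, "LOC")]
pred = [(0, 2, "ORG"), (5, 6, "LOC")]  # second span has the wrong type
print(round(entity_f1(gold, pred), 3))  # → 0.4
```

A cross-domain degradation like the one reported is simply the relative drop in this score when a model trained on CoNLL-2003 is evaluated on E-NER instead of its own test split.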
Related papers
- Annotation Errors and NER: A Study with OntoNotes 5.0
We employ three simple techniques to detect annotation errors in the OntoNotes 5.0 corpus for English NER.
Our techniques corrected 10% of the sentences in train/dev/test data.
We used three NER libraries to train, evaluate and compare the models trained with the original and the re-annotated datasets.
arXiv Detail & Related papers (2024-06-27T13:48:46Z)
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z)
- Disambiguation of Company names via Deep Recurrent Networks
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition
T-NER is a Python library for fine-tuning language models on NER.
We show the potential of the library by compiling nine public NER datasets into a unified format.
To facilitate future research, we also release all our LM checkpoints via the Hugging Face model hub.
arXiv Detail & Related papers (2022-09-09T15:00:38Z)
- Nested Named Entity Recognition as Holistic Structure Parsing
This work models all the nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm to recover the entire set of NEs at once.
Experiments show that our model yields promising results on widely used benchmarks, approaching or even achieving the state of the art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z)
- MINER: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective
NER models have achieved promising performance on standard NER benchmarks.
Recent studies show that previous approaches may over-rely on entity mention information, resulting in poor performance on out-of-vocabulary (OOV) entity recognition.
We propose MINER, a novel NER learning framework, to remedy this issue from an information-theoretic perspective.
arXiv Detail & Related papers (2022-04-09T05:18:20Z)
- Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record De-identification
Federal law restricts the sharing of any EHR data that contains protected health information (PHI).
This project explores several deep learning-based named entity recognition (NER) methods to determine which method(s) perform better on the de-identification task.
We trained and tested our models on the i2b2 training dataset, and qualitatively assessed their performance using EHR data collected from a local hospital.
arXiv Detail & Related papers (2021-03-25T01:26:58Z)
- Named Entity Recognition in the Legal Domain using a Pointer Generator Network
We study the problem of legal NER with noisy text extracted from PDF files of filed court cases from US courts.
The exact location of the entities in the text is unknown and the entities may contain typos and/or OCR mistakes.
We formulate the NER task as a text-to-text sequence generation task and train a pointer generator network to generate the entities in the document rather than label them.
arXiv Detail & Related papers (2020-12-17T21:10:34Z)
- Global Attention for Name Tagging
We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information.
We propose a model that learns to incorporate document-level and corpus-level contextual information alongside local contextual information via global attentions.
Experiments on benchmark datasets show the effectiveness of our approach.
arXiv Detail & Related papers (2020-10-19T07:27:15Z)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
We present a study exploring the use of cross-sentence information for NER using BERT models in five languages.
We find that adding context in the form of additional sentences to BERT input increases NER performance on all of the tested languages and models.
We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate this to further increase NER performance with BERT.
arXiv Detail & Related papers (2020-06-02T12:34:52Z)
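The Contextual Majority Voting idea above can be sketched as a per-token majority vote over the predictions a sentence receives inside different cross-sentence context windows. This is a simplified illustration; the window predictions and tie-breaking below are hypothetical, not the paper's exact procedure:

```python
from collections import Counter

# Contextual Majority Voting (simplified): the same sentence is tagged
# several times, each time embedded in a different context window, and
# the final label for each token is the most frequent one.
def contextual_majority_vote(window_predictions):
    """window_predictions: list of label sequences for one sentence,
    one sequence per context window the sentence appeared in."""
    return [
        Counter(labels).most_common(1)[0][0]
        for labels in zip(*window_predictions)
    ]

preds = [
    ["B-ORG", "O", "B-LOC"],  # tagged with the preceding sentence as context
    ["B-ORG", "O", "O"],      # tagged in isolation
    ["B-ORG", "O", "B-LOC"],  # tagged with the following sentence as context
]
print(contextual_majority_vote(preds))  # ['B-ORG', 'O', 'B-LOC']
```

The vote smooths out window-dependent disagreements: the lone `O` on the third token is outvoted by the two `B-LOC` predictions.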
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.