A survey on recent advances in named entity recognition
- URL: http://arxiv.org/abs/2401.10825v1
- Date: Fri, 19 Jan 2024 17:21:05 GMT
- Title: A survey on recent advances in named entity recognition
- Authors: Imed Keraghel and Stanislas Morbieu and Mohamed Nadif
- Abstract summary: We present an overview of recent popular approaches to NER.
We also look at graph- and transformer-based methods including Large Language Models (LLMs)
We evaluate the performance of the main NER implementations on a variety of datasets with differing characteristics.
- Score: 10.02138130221506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named Entity Recognition seeks to extract substrings within a text that name
real-world objects and to determine their type (for example, whether they refer
to persons or organizations). In this survey, we first present an overview of
recent popular approaches, but we also look at graph- and transformer- based
methods including Large Language Models (LLMs) that have not had much coverage
in other surveys. Second, we focus on methods designed for datasets with scarce
annotations. Third, we evaluate the performance of the main NER implementations
on a variety of datasets with differing characteristics (as regards their
domain, their size, and their number of classes). We thus provide a deep
comparison of algorithms that are never considered together. Our experiments
shed some light on how the characteristics of datasets affect the behavior of
the methods that we compare.
Related papers
- Benchmarking a Benchmark: How Reliable is MS-COCO? [0.0]
Sama-COCO, a re-annotation of MS-COCO, is used to discover potential biases by leveraging a shape analysis pipeline.
A model is trained and evaluated on both datasets to examine the impact of different annotation conditions.
arXiv Detail & Related papers (2023-11-05T16:55:40Z) - infoVerse: A Universal Framework for Dataset Characterization with
Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z) - Going beyond research datasets: Novel intent discovery in the industry
setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z) - DeepAstroUDA: Semi-Supervised Universal Domain Adaptation for
Cross-Survey Galaxy Morphology Classification and Anomaly Detection [0.0]
We present a universal domain adaptation method, textitDeepAstroUDA, as an approach to overcome this challenge.
textitDeepAstroUDA is capable of bridging the gap between two astronomical surveys, increasing classification accuracy in both domains.
Our method also performs well as an anomaly detection algorithm and successfully clusters unknown class samples even in the unlabeled target dataset.
arXiv Detail & Related papers (2023-02-03T21:20:58Z) - RGB-D-Based Categorical Object Pose and Shape Estimation: Methods,
Datasets, and Evaluation [5.71097144710995]
This work provides an overview of the field in terms of methods, datasets, and evaluation protocols.
We take a critical look at the predominant evaluation protocol, including metrics and datasets.
We propose a new set of metrics, contribute new annotations for the Redwood dataset, and evaluate state-of-the-art methods in a fair comparison.
arXiv Detail & Related papers (2023-01-19T15:59:10Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - SegmentMeIfYouCan: A Benchmark for Anomaly Segmentation [111.61261419566908]
Deep neural networks (DNNs) are usually trained on a closed set of semantic classes.
They are ill-equipped to handle previously-unseen objects.
detecting and localizing such objects is crucial for safety-critical applications such as perception for automated driving.
arXiv Detail & Related papers (2021-04-30T07:58:19Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - A Critical Assessment of State-of-the-Art in Entity Alignment [1.7725414095035827]
We investigate two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs.
We first carefully examine the benchmarking process and identify several shortcomings, which make the results reported in the original works not always comparable.
arXiv Detail & Related papers (2020-10-30T15:09:19Z) - Sensor Data for Human Activity Recognition: Feature Representation and
Benchmarking [27.061240686613182]
The field of Human Activity Recognition (HAR) focuses on obtaining and analysing data captured from monitoring devices (e.g. sensors)
We address the issue of accurately recognising human activities using different Machine Learning (ML) techniques.
arXiv Detail & Related papers (2020-05-15T00:46:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.