Related papers: Small Language Model Makes an Effective Long Text Extractor

Small Language Model Makes an Effective Long Text Extractor

URL: http://arxiv.org/abs/2502.07286v1
Date: Tue, 11 Feb 2025 06:06:25 GMT
Title: Small Language Model Makes an Effective Long Text Extractor
Authors: Yelin Chen, Fanjin Zhang, Jie Tang,
Abstract summary: Named Entity Recognition (NER) is a fundamental problem in natural language processing (NLP)<n>This paper introduces a lightweight span-based NER method called SeNER.<n>It incorporates a bidirectional arrow attention mechanism coupled with LogN-Scaling on the [] token to embed long texts effectively.<n>It achieves state-of-the-art extraction accuracy on three long NER datasets and is capable of extracting entities from long texts in a GPU-memory-friendly manner.
Score: 10.886875977716608
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Named Entity Recognition (NER) is a fundamental problem in natural language processing (NLP). However, the task of extracting longer entity spans (e.g., awards) from extended texts (e.g., homepages) is barely explored. Current NER methods predominantly fall into two categories: span-based methods and generation-based methods. Span-based methods require the enumeration of all possible token-pair spans, followed by classification on each span, resulting in substantial redundant computations and excessive GPU memory usage. In contrast, generation-based methods involve prompting or fine-tuning large language models (LLMs) to adapt to downstream NER tasks. However, these methods struggle with the accurate generation of longer spans and often incur significant time costs for effective fine-tuning. To address these challenges, this paper introduces a lightweight span-based NER method called SeNER, which incorporates a bidirectional arrow attention mechanism coupled with LogN-Scaling on the [CLS] token to embed long texts effectively, and comprises a novel bidirectional sliding-window plus-shaped attention (BiSPA) mechanism to reduce redundant candidate token-pair spans significantly and model interactions between token-pair spans simultaneously. Extensive experiments demonstrate that our method achieves state-of-the-art extraction accuracy on three long NER datasets and is capable of extracting entities from long texts in a GPU-memory-friendly manner. Code: https://github.com/THUDM/scholar-profiling/tree/main/sener

Related papers

Embedded Named Entity Recognition using Probing Classifiers [10.573861741540853]
EMBER enables streaming named entity recognition in decoder-only language models without fine-tuning them. We show that EMBER maintains high token generation rates, with only a negligible decrease in speed of around 1%. We make our code and data available online, including a toolkit for training, testing, and deploying efficient token classification models.
arXiv Detail & Related papers (2024-03-18T12:58:16Z)
Reinforcement Learning with Token-level Feedback for Controllable Text Generation [16.117006822479407]
We propose a novel reinforcement learning algorithm named TOLE which formulates TOken-LEvel rewards for controllable text generation. Experimental results show that our algorithm can achieve superior performance on both single-attribute and multi-attribute control tasks.
arXiv Detail & Related papers (2024-03-18T08:18:37Z)
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy [46.81745860690336]
Large Language Models (LLMs) have made significant advancements across various tasks, such as question answering, translation, text summarization, and dialogue systems. This paper presents a generic framework for accelerating the inference process, resulting in a substantial increase in speed and cost reduction. We conduct extensive experiments to demonstrate the significant improvements achieved by applying our inference acceleration framework.
arXiv Detail & Related papers (2023-12-20T02:55:15Z)
Explaining Interactions Between Text Spans [50.70253702800355]
Reasoning over spans of tokens from different parts of the input is essential for natural language understanding. We introduce SpanEx, a dataset of human span interaction explanations for two NLU tasks: NLI and FC. We then investigate the decision-making processes of multiple fine-tuned large language models in terms of the employed connections between spans.
arXiv Detail & Related papers (2023-10-20T13:52:37Z)
Recurrent Attention Networks for Long-text Modeling [14.710722261441822]
This paper proposes a novel long-document encoding model, Recurrent Attention Network (RAN), to enable the recurrent operation of self-attention. RAN is capable of extracting global semantics in both token-level and document-level representations, making it inherently compatible with both sequential and classification tasks.
arXiv Detail & Related papers (2023-06-12T03:28:33Z)
Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic being trained is used to generate annotations for unlabeled text. As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation [102.25240608024063]
Referring image segments an image from a language expression. We develop an algorithm that shifts from being localization-centric to segmentation-language. Compared to its counterparts, our method is more versatile yet effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z)
SpanProto: A Two-stage Span-based Prototypical Network for Few-shot Named Entity Recognition [45.012327072558975]
Few-shot Named Entity Recognition (NER) aims to identify named entities with very little annotated data. We propose a seminal span-based prototypical network (SpanProto) that tackles few-shot NER via a two-stage approach. In the span extraction stage, we transform the sequential tags into a global boundary matrix, enabling the model to focus on the explicit boundary information. For mention classification, we leverage prototypical learning to capture the semantic representations for each labeled span and make the model better adapt to novel-class entities.
arXiv Detail & Related papers (2022-10-17T12:59:33Z)
Propose-and-Refine: A Two-Stage Set Prediction Network for Nested Named Entity Recognition [13.010064498077863]
We present the Propose-and-Refine Network (PnRNet), a two-stage set prediction network for nested NER. In the propose stage, we use a span-based predictor to generate some coarse entity predictions as entity proposals. In the refine stage, proposals interact with each other, and richer contextual information is incorporated into the proposal representations. Experiments show that PnRNet achieves state-of-the-art performance on four nested NER datasets and one flat NER dataset.
arXiv Detail & Related papers (2022-04-27T06:58:45Z)
HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization. Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in Rouge F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition [9.809157050048375]
We propose a two-stage entity identifier for named entity recognition. First, we generate span proposals by filtering and boundary regression on the seed spans to locate the entities, and then label the boundary-adjusted span proposals with the corresponding categories. Our method effectively utilizes the boundary information of entities and partially matched spans during training.
arXiv Detail & Related papers (2021-05-14T12:52:34Z)
ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting [108.93803186429017]
End-to-end text-spotting aims to integrate detection and recognition in a unified framework. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2) Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared with segmentation-based methods, can not only provide structured output but also controllable representation. Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the
arXiv Detail & Related papers (2021-05-08T07:46:55Z)
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.