Related papers: Evaluating Named Entity Recognition Models for Russian Cultural News Texts: From BERT to LLM

Evaluating Named Entity Recognition Models for Russian Cultural News Texts: From BERT to LLM

URL: http://arxiv.org/abs/2506.02589v1
Date: Tue, 03 Jun 2025 08:11:16 GMT
Title: Evaluating Named Entity Recognition Models for Russian Cultural News Texts: From BERT to LLM
Authors: Maria Levchenko,
Abstract summary: The study utilizes the unique SPbLitGuide dataset, a collection of event announcements from Saint Petersburg spanning 1999 to 2019.<n>A comparative evaluation of diverse NER models is presented, encompassing established transformer-based architectures.<n>The research contributes to a deeper understanding of current NER model capabilities and limitations when applied to morphologically rich languages like Russian.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper addresses the challenge of Named Entity Recognition (NER) for person names within the specialized domain of Russian news texts concerning cultural events. The study utilizes the unique SPbLitGuide dataset, a collection of event announcements from Saint Petersburg spanning 1999 to 2019. A comparative evaluation of diverse NER models is presented, encompassing established transformer-based architectures such as DeepPavlov, RoBERTa, and SpaCy, alongside recent Large Language Models (LLMs) including GPT-3.5, GPT-4, and GPT-4o. Key findings highlight the superior performance of GPT-4o when provided with specific prompting for JSON output, achieving an F1 score of 0.93. Furthermore, GPT-4 demonstrated the highest precision at 0.99. The research contributes to a deeper understanding of current NER model capabilities and limitations when applied to morphologically rich languages like Russian within the cultural heritage domain, offering insights for researchers and practitioners. Follow-up evaluation with GPT-4.1 (April 2025) achieves F1=0.94 for both simple and structured prompts, demonstrating rapid progress across model families and simplified deployment requirements.

Related papers

Exploring Fine-tuned Generative Models for Keyphrase Selection: A Case Study for Russian [1.565361244756411]
We explored how to apply fine-tuned generative transformer-based models to the specific task of keyphrase selection within Russian scientific texts. Experiments were conducted on the texts of Russian scientific abstracts from four domains: mathematics & computer science, history, medicine, and linguistics. The use of generative models, namely mBART, led to gains in in-domain performance (up to 4.9% in BERTScore, 9.0% in ROUGE-1, and 12.2% in F1-score) over three keyphrase extraction baselines for the Russian language.
arXiv Detail & Related papers (2024-09-16T18:15:28Z)
Evaluating Named Entity Recognition: A comparative analysis of mono- and multilingual transformer models on a novel Brazilian corporate earnings call transcripts dataset [3.809702129519642]
We identify two models pre-trained in Brazilian Portuguese (BERTimbau and PTT5) and two multilingual models (mBERT and mT5) Our study aimed to evaluate their performance on a financial Named Entity Recognition (NER) task and determine the computational requirements for fine-tuning and inference.
arXiv Detail & Related papers (2024-03-18T19:53:56Z)
Distilling Named Entity Recognition Models for Endangered Species from Large Language Models [27.266030092418642]
We create datasets for named entity recognition and relation extraction via a two-stage process. The constructed dataset was then used to fine-tune both general BERT and domain-specific BERT variants. Experiments show that our knowledge transfer approach is effective at creating a NER model suitable for detecting endangered species from texts.
arXiv Detail & Related papers (2024-03-13T15:38:55Z)
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds. Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction [52.14681890859275]
E-commerce platforms require structured product data in the form of attribute-value pairs. BERT-based extraction methods require large amounts of task-specific training data. This paper explores using large language models (LLMs) as a more training-data efficient and robust alternative.
arXiv Detail & Related papers (2023-10-19T07:39:00Z)
Large Language Models for Semantic Monitoring of Corporate Disclosures: A Case Study on Korea's Top 50 KOSPI Companies [0.08192907805418582]
State-of-the-art language models such as OpenAI's GPT-3.5-turbo and GPT-4 offer unprecedented opportunities for automating complex tasks. This research paper delves into the capabilities of these models for semantically analyzing corporate disclosures in the Korean context.
arXiv Detail & Related papers (2023-09-01T01:51:28Z)
Large-Scale Text Analysis Using Generative Language Models: A Case Study in Discovering Public Value Expressions in AI Patents [2.246222223318928]
This paper employs a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. We collect a database comprising 154,934 patent documents using an advanced Boolean query submitted to InnovationQ+. We design a framework for identifying and labeling public value expressions in these AI patent sentences.
arXiv Detail & Related papers (2023-05-17T17:18:26Z)
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [53.78782375511531]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.<n>This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR)<n>To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.<n>To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z)
Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task [49.50140712943701]
We evaluate the performance of ChatGPT/GPT-4 on a radiology NLI task and compare it to other models fine-tuned specifically on task-related data samples. We also conduct a comprehensive investigation on ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty.
arXiv Detail & Related papers (2023-04-18T17:21:48Z)
GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs. It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)
How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks [65.7949334650854]
GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks. However, their robustness and abilities to handle various complexities of the open world have yet to be explored. We show that GPT-3.5 faces some specific robustness challenges, including instability, prompt sensitivity, and number sensitivity.
arXiv Detail & Related papers (2023-03-01T07:39:01Z)
News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.