DNB-AI-Project at SemEval-2025 Task 5: An LLM-Ensemble Approach for Automated Subject Indexing
- URL: http://arxiv.org/abs/2504.21589v1
- Date: Wed, 30 Apr 2025 12:47:09 GMT
- Title: DNB-AI-Project at SemEval-2025 Task 5: An LLM-Ensemble Approach for Automated Subject Indexing
- Authors: Lisa Kluge, Maximilian Kähler
- Abstract summary: Our system relies on prompting a selection of LLMs with varying examples of intellectually annotated records. We map the generated keywords to the target vocabulary, aggregate the resulting subject terms into an ensemble vote and rank them by their relevance to the record. Our system is fourth in the quantitative ranking in the all-subjects track, but achieves the best result in the qualitative ranking conducted by subject indexing experts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our system developed for the SemEval-2025 Task 5: LLMs4Subjects: LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog. Our system relies on prompting a selection of LLMs with varying examples of intellectually annotated records and asking the LLMs to similarly suggest keywords for new records. This few-shot prompting technique is combined with a series of post-processing steps that map the generated keywords to the target vocabulary, aggregate the resulting subject terms into an ensemble vote and, finally, rank them according to their relevance to the record. Our system ranks fourth in the quantitative ranking in the all-subjects track, but achieves the best result in the qualitative ranking conducted by subject indexing experts.
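The post-processing pipeline described in the abstract (map free-text keywords to a controlled vocabulary, aggregate suggestions from several LLMs into an ensemble vote, rank by vote count) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the toy vocabulary, the fuzzy string matching via `difflib`, and the similarity cutoff are all assumptions standing in for the paper's actual GND mapping and ranking steps.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical stand-in for the GND target vocabulary (illustrative terms only).
VOCABULARY = [
    "Machine learning",
    "Information retrieval",
    "Subject indexing",
    "Natural language processing",
]

def map_to_vocabulary(keyword, vocabulary, cutoff=0.6):
    """Map a free-text keyword to the closest vocabulary term, or None."""
    matches = get_close_matches(keyword, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def ensemble_vote(keyword_lists, vocabulary):
    """Aggregate keyword suggestions from several LLMs into a ranked term list.

    Each suggested keyword is first mapped onto the controlled vocabulary;
    mapped terms are then counted across all models and ranked by vote count.
    """
    votes = Counter()
    for keywords in keyword_lists:
        # Deduplicate within one model so it votes at most once per term.
        terms = {map_to_vocabulary(k, vocabulary) for k in keywords}
        votes.update(t for t in terms if t is not None)
    return [term for term, _ in votes.most_common()]

# Keyword suggestions as they might come back from three different LLMs.
suggestions = [
    ["machine learning", "subject indexing", "ontology"],
    ["Machine Learning", "information retrieval"],
    ["subject indexing", "natural language processing"],
]
ranked = ensemble_vote(suggestions, VOCABULARY)
```

Here `"ontology"` falls below the similarity cutoff and maps to no vocabulary term, while terms suggested by two models ("Machine learning", "Subject indexing") rank above those suggested by one. The paper's actual relevance ranking is more elaborate than a raw vote count.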
Related papers
- AgenticTagger: Structured Item Representation for Recommendation with LLM Agents [58.12004213978182]
AgenticTagger is a framework that queries LLMs for representing items with sequences of text descriptors. To effectively and efficiently ground vocabulary in the item corpus of interest, we design a multi-agent reflection mechanism. Experiments on public and private data show AgenticTagger brings consistent improvements across diverse recommendation scenarios.
arXiv Detail & Related papers (2026-02-05T18:01:37Z) - Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs [0.03743146689655111]
This paper presents the Annif system in the LLMs4Subjects task (Subtask 2) at GermEval-2025. The task required creating subject predictions for records using large language models, with a special focus on computational efficiency.
arXiv Detail & Related papers (2025-08-21T14:04:20Z) - Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper [64.50822834679101]
SciIG is a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works. We assess five state-of-the-art models, including open-source (DeepSeek-v3, Gemma-3-12B, LLaMA 4-Maverick, MistralAI Small 3.1) and closed-source GPT-4o systems. Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic similarity and faithfulness.
arXiv Detail & Related papers (2025-08-19T21:11:11Z) - LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews [0.9314555897827079]
Systematic literature reviews aim to identify and evaluate all relevant papers on a topic. To date, abstract screening methods using large language models (LLMs) focus on binary classification settings. We propose LGAR, a zero-shot LLM-Guided Abstract Ranker composed of an LLM-based graded relevance scorer and a dense re-ranker.
arXiv Detail & Related papers (2025-05-30T16:18:50Z) - Homa at SemEval-2025 Task 5: Aligning Librarian Records with OntoAligner for Subject Tagging [1.2582887633807602]
This paper presents our system, Homa, for SemEval-2025 Task 5: Subject Tagging. It focuses on automatically assigning subject labels to technical records from TIBKAT using the Gemeinsame Normdatei (GND) taxonomy. Our approach formulates the subject tagging problem as an alignment task, where records are matched to categories based on semantic similarity.
arXiv Detail & Related papers (2025-04-30T09:52:51Z) - Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs [0.0]
This paper presents the Annif system in SemEval-2025 Task 5 (LLMs4Subjects). It focuses on subject indexing using large language models. Our approach combines traditional natural language processing and machine learning techniques.
arXiv Detail & Related papers (2025-04-28T11:04:23Z) - SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog [0.0]
We present SemEval-2025 Task 5: LLMs4Subjects, a shared task on automated subject tagging for scientific and technical records in English and German using the GND taxonomy. Participants developed systems to recommend top-k subjects, evaluated through quantitative metrics (precision, recall, F1-score) and qualitative assessments by subject specialists.
arXiv Detail & Related papers (2025-04-09T18:26:46Z) - Harnessing Multiple Large Language Models: A Survey on LLM Ensemble [67.40879272722437]
This paper presents the first systematic review of recent developments in LLM Ensemble. We introduce our taxonomy of LLM Ensemble and discuss several related research problems. We also provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference", "ensemble-during-inference", and "ensemble-after-inference".
arXiv Detail & Related papers (2025-02-25T09:48:53Z) - Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages [6.4977738682502295]
We investigate the effectiveness of a retrieval-based LLM prompting approach to glossing, applied to the seven languages from the SIGMORPHON 2023 shared task. Our system beats the BERT-based shared task baseline for every language in the morpheme-level score category. In a case study on Tsez, we ask the LLM to automatically create and follow linguistic instructions, reducing errors on a confusing grammatical feature.
arXiv Detail & Related papers (2025-02-13T21:23:16Z) - Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z) - DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems [99.17123445211115]
We introduce DocBench, a benchmark to evaluate large language model (LLM)-based document reading systems.
Our benchmark involves the recruitment of human annotators and the generation of synthetic questions.
It includes 229 real documents and 1,102 questions, spanning five different domains and four major types of questions.
arXiv Detail & Related papers (2024-07-15T13:17:42Z) - CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search [67.6104548484555]
We introduce CHIQ, a two-step method that leverages the capabilities of open-source large language models (LLMs) to resolve ambiguities in the conversation history before query rewriting.
We demonstrate on five well-established benchmarks that CHIQ leads to state-of-the-art results across most settings.
arXiv Detail & Related papers (2024-06-07T15:23:53Z) - PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
We propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus.
The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z) - SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [85.54906813106683]
We propose SuRe, a simple yet effective framework to enhance open-domain question answering (ODQA) with large language models (LLMs). SuRe helps LLMs predict more accurate answers for a given question, well supported by the summarized retrievals.
Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches.
arXiv Detail & Related papers (2024-04-17T01:15:54Z) - Benchmarking LLMs on the Semantic Overlap Summarization Task [9.656095701778975]
This paper comprehensively evaluates Large Language Models (LLMs) on the Semantic Overlap Summarization (SOS) task.
We report well-established metrics like ROUGE, BERTScore, and SEM-F1 on two different datasets of alternative narratives.
arXiv Detail & Related papers (2024-02-26T20:33:50Z) - Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis. It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z) - Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization [132.25202059478065]
We benchmark large language models (LLMs) on instruction controllable text summarization.
Our study reveals that instruction controllable text summarization remains a challenging task for LLMs.
arXiv Detail & Related papers (2023-11-15T18:25:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.