SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog
- URL: http://arxiv.org/abs/2504.07199v3
- Date: Fri, 23 May 2025 10:14:35 GMT
- Title: SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog
- Authors: Jennifer D'Souza, Sameer Sadruddin, Holger Israel, Mathias Begoin, Diana Slawig
- Abstract summary: We present SemEval-2025 Task 5: LLMs4Subjects, a shared task on automated subject tagging for scientific and technical records in English and German using the GND taxonomy. Participants developed systems to recommend top-k subjects, evaluated through quantitative metrics (precision, recall, F1-score) and qualitative assessments by subject specialists.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present SemEval-2025 Task 5: LLMs4Subjects, a shared task on automated subject tagging for scientific and technical records in English and German using the GND taxonomy. Participants developed LLM-based systems to recommend top-k subjects, evaluated through quantitative metrics (precision, recall, F1-score) and qualitative assessments by subject specialists. Results highlight the effectiveness of LLM ensembles, synthetic data generation, and multilingual processing, offering insights into applying LLMs for digital library classification.
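The quantitative metrics named above can be made concrete with a minimal sketch. This is an illustrative implementation of precision, recall, and F1 for a ranked top-k subject recommendation list, not the task's official scorer; the function name and signature are assumptions for this example.

```python
def precision_recall_f1_at_k(predicted, gold, k):
    """Score a ranked list of predicted subject identifiers against
    the intellectually assigned gold labels of one record.

    predicted: list of subject identifiers, best-ranked first
    gold: set of gold-standard subject identifiers
    k: cutoff for the top-k recommendation list
    """
    top_k = predicted[:k]
    hits = sum(1 for subject in top_k if subject in gold)
    precision = hits / k if k else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In a shared-task setting these per-record scores would then be averaged over the test collection; the task may additionally report scores at several cutoffs k.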
Related papers
- Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs [0.03743146689655111]
This paper presents the Annif system in the LLMs4Subjects task (Subtask 2) at GermEval-2025. The task required creating subject predictions for records using large language models, with a special focus on computational efficiency.
arXiv Detail & Related papers (2025-08-21T14:04:20Z) - Large Language Models for Combinatorial Optimization: A Systematic Review [3.271128864157512]
This systematic review explores the application of Large Language Models in Combinatorial Optimization. We conduct a literature search via Scopus and Google Scholar, examining over 2,000 publications. We classify these studies into semantic categories and topics to provide a comprehensive overview of the field.
arXiv Detail & Related papers (2025-07-04T15:08:10Z) - An Empirical Study of Many-to-Many Summarization with Large Language Models [82.10000188179168]
Large language models (LLMs) have shown strong multi-lingual abilities, giving them the potential to perform many-to-many summarization (M2MS) in real applications. This work presents a systematic empirical study on LLMs' M2MS ability.
arXiv Detail & Related papers (2025-05-19T11:18:54Z) - DNB-AI-Project at SemEval-2025 Task 5: An LLM-Ensemble Approach for Automated Subject Indexing [0.0]
Our system relies on prompting a selection of LLMs with varying examples of intellectually annotated records.
We map the generated keywords to the target vocabulary, aggregate the resulting subject terms into an ensemble vote, and rank them by their relevance to the record.
Our system ranks fourth in the quantitative ranking of the all-subjects track, but achieves the best result in the qualitative ranking conducted by subject indexing experts.
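The aggregation step described above can be sketched in a few lines. This is a hypothetical illustration of an ensemble vote over vocabulary-mapped subject terms, not the DNB-AI-Project's actual code; the function name and tie-breaking rule are assumptions.

```python
from collections import Counter

def ensemble_rank(model_suggestions):
    """Aggregate vocabulary-mapped subject terms from several LLM runs.

    model_suggestions: a list of term lists, one per (model, prompt) run,
    where each term is an identifier from the target vocabulary.
    Returns all proposed terms ranked by how many runs suggested them,
    with ties broken alphabetically for determinism.
    """
    # De-duplicate within each run so a single run cannot cast
    # multiple votes for the same term.
    votes = Counter(term for run in model_suggestions for term in set(run))
    return sorted(votes, key=lambda term: (-votes[term], term))
```

A real system would likely weight votes by model quality or by the term's rank within each run, rather than counting every vote equally.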
arXiv Detail & Related papers (2025-04-30T12:47:09Z) - Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs [0.0]
This paper presents the Annif system in SemEval-2025 Task 5 (LLMs4Subjects).
The task focused on subject indexing using large language models.
Our approach combines traditional natural language processing and machine learning techniques.
arXiv Detail & Related papers (2025-04-28T11:04:23Z) - Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? [6.974741712647656]
We collect human judgments on 2,040 abstractive summaries in Basque and Spanish. For each summary, annotators evaluated five criteria on a 5-point Likert scale: coherence, consistency, fluency, relevance, and 5W1H. We release BASSE and our code publicly, along with the first large-scale Basque summarization dataset containing 22,525 news articles with their subheads.
arXiv Detail & Related papers (2025-03-21T10:52:20Z) - Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision [50.45597801390757]
Instruct-LF is a goal-oriented latent factor discovery system. It integrates instruction-following ability with statistical models to handle noisy datasets.
arXiv Detail & Related papers (2025-02-21T02:03:08Z) - GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing [73.8469700907927]
Large Language Models (LLMs) succeed in human-guided conversations such as instruction following and question answering.
In this study, we first characterize LLM-guided conversation into three fundamental components: (i) Goal Navigation; (ii) Context Management; (iii) Empathetic Engagement.
We compare GuideLLM with 6 state-of-the-art LLMs such as GPT-4o and Llama-3-70b-Instruct, from the perspective of interviewing quality, and autobiography generation quality.
arXiv Detail & Related papers (2025-02-10T14:11:32Z) - DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems [99.17123445211115]
We introduce DocBench, a benchmark to evaluate large language model (LLM)-based document reading systems.
Our benchmark involves the recruitment of human annotators and the generation of synthetic questions.
It includes 229 real documents and 1,102 questions, spanning across five different domains and four major types of questions.
arXiv Detail & Related papers (2024-07-15T13:17:42Z) - Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions [2.5179515260542544]
Large Language Models (LLMs) have gained significant attention across academia and industry for their versatile applications in text generation, question answering, and text summarization.
To quantify the performance, it's crucial to have a comprehensive grasp of existing metrics.
This paper offers a comprehensive exploration of LLM evaluation from a metrics perspective, providing insights into the selection and interpretation of metrics currently in use.
arXiv Detail & Related papers (2024-04-14T03:54:00Z) - Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis.
It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z) - Large Language Models: A Survey [66.39828929831017]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - A Survey on Large Language Models for Software Engineering [15.468484685849983]
Large Language Models (LLMs) are used to automate a broad range of Software Engineering (SE) tasks.
This paper summarizes the current state-of-the-art research in the LLM-based SE community.
arXiv Detail & Related papers (2023-12-23T11:09:40Z) - Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization [132.25202059478065]
We benchmark large language models (LLMs) on instruction controllable text summarization.
Our study reveals that instruction controllable text summarization remains a challenging task for LLMs.
arXiv Detail & Related papers (2023-11-15T18:25:26Z) - Large Language Models for Software Engineering: A Systematic Literature Review [34.12458948051519]
Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE).
We select and analyze 395 research papers from January 2017 to January 2024 to answer four key research questions (RQs).
From the answers to these RQs, we discuss the current state of the art and trends, identify gaps in existing research, and flag promising areas for future study.
arXiv Detail & Related papers (2023-08-21T10:37:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.