Homa at SemEval-2025 Task 5: Aligning Librarian Records with OntoAligner for Subject Tagging
- URL: http://arxiv.org/abs/2504.21474v1
- Date: Wed, 30 Apr 2025 09:52:51 GMT
- Title: Homa at SemEval-2025 Task 5: Aligning Librarian Records with OntoAligner for Subject Tagging
- Authors: Hadi Bayrami Asl Tekanlou, Jafar Razmara, Mahsa Sanaei, Mostafa Rahgouy, Hamed Babaei Giglou
- Abstract summary: This paper presents our system, Homa, for SemEval-2025 Task 5: Subject Tagging. It focuses on automatically assigning subject labels to technical records from TIBKAT using the Gemeinsame Normdatei (GND) taxonomy. Our approach formulates the subject tagging problem as an alignment task, where records are matched to categories based on semantic similarity.
- Score: 1.2582887633807602
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents our system, Homa, for SemEval-2025 Task 5: Subject Tagging, which focuses on automatically assigning subject labels to technical records from TIBKAT using the Gemeinsame Normdatei (GND) taxonomy. We leverage OntoAligner, a modular ontology alignment toolkit, to address this task by integrating retrieval-augmented generation (RAG) techniques. Our approach formulates the subject tagging problem as an alignment task, where records are matched to GND categories based on semantic similarity. We evaluate OntoAligner's adaptability for subject indexing and analyze its effectiveness in handling multilingual records. Experimental results demonstrate the strengths and limitations of this method, highlighting the potential of alignment techniques for improving subject tagging in digital libraries.
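As a rough illustration of the alignment formulation described above, a record can be matched to GND subject labels by representing both as vectors and ranking labels by cosine similarity. The sketch below is a minimal, hypothetical stand-in: bag-of-words vectors replace the neural embeddings and the OntoAligner RAG pipeline that the actual system relies on, and the `tag_record` function and sample labels are illustrative only.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words token counts. The real system would
    # use neural sentence embeddings; this keeps the sketch self-contained.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def tag_record(record: str, gnd_labels: list[str], k: int = 2) -> list[str]:
    # Rank taxonomy labels by similarity to the record and keep the top-k.
    rec_vec = embed(record)
    ranked = sorted(gnd_labels,
                    key=lambda lbl: cosine(rec_vec, embed(lbl)),
                    reverse=True)
    return ranked[:k]

labels = ["machine learning", "digital libraries", "organic chemistry"]
print(tag_record("a study of neural machine learning methods for libraries", labels))
# → ['machine learning', 'digital libraries']
```

Swapping `embed` for a multilingual sentence encoder is what would make such a scheme viable for the German/English TIBKAT records the task targets.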
Related papers
- DNB-AI-Project at SemEval-2025 Task 5: An LLM-Ensemble Approach for Automated Subject Indexing [0.0]
Our system relies on prompting a selection of LLMs with varying examples of intellectually annotated records. We map the generated keywords to the target vocabulary, aggregate the resulting subject terms into an ensemble vote, and rank them by their relevance to the record. Our system places fourth in the quantitative ranking in the all-subjects track, but achieves the best result in the qualitative ranking conducted by subject indexing experts.
arXiv Detail & Related papers (2025-04-30T12:47:09Z) - TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval [0.21485350418225246]
We present our submission to Task 5 of SemEval-2025. This task aims to aid librarians in assigning subject tags to library records by producing a list of likely relevant tags for a given document. We leverage two types of encoder models to build a two-stage information retrieval system.
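A two-stage retrieval pipeline of this kind typically pairs a cheap first-stage retriever, run over the whole tag vocabulary, with a more precise re-ranker applied only to a shortlist. The sketch below is a hypothetical simplification: token overlap and Jaccard similarity stand in for the two encoder models, which are not specified in the summary.

```python
def jaccard(a: set, b: set) -> float:
    # Set-overlap score used here as a stand-in for a stronger
    # (but slower) second-stage re-ranking model.
    return len(a & b) / len(a | b) if a | b else 0.0

def two_stage_tags(query: str, tags: list[str],
                   n_candidates: int = 10, k: int = 3) -> list[str]:
    q = set(query.lower().split())
    # Stage 1: cheap candidate retrieval over all tags
    # (raw token overlap stands in for a fast bi-encoder).
    candidates = sorted(tags,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)[:n_candidates]
    # Stage 2: finer re-ranking of the shortlist only.
    return sorted(candidates,
                  key=lambda t: jaccard(q, set(t.lower().split())),
                  reverse=True)[:k]

tags = ["text classification", "deep sea biology",
        "image classification", "library science"]
print(two_stage_tags("deep learning for text classification", tags,
                     n_candidates=3, k=1))
# → ['text classification']
```

The design point is cost: the expensive scorer only ever sees `n_candidates` items, so vocabulary size affects stage 1 alone.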
arXiv Detail & Related papers (2025-04-30T11:44:08Z) - Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs [0.0]
This paper presents the Annif system in SemEval-2025 Task 5, which focused on subject indexing using large language models (LLMs). Our approach combines traditional natural language processing and machine learning techniques.
arXiv Detail & Related papers (2025-04-28T11:04:23Z) - The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z) - HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification [19.12354692458442]
Hierarchical text classification (HTC) is a complex subtask under multi-label text classification.
We propose HiGen, a text-generation-based framework utilizing language models to encode dynamic text representations.
arXiv Detail & Related papers (2024-01-24T04:44:42Z) - Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z) - TagRec++: Hierarchical Label Aware Attention Network for Question Categorization [0.3683202928838613]
Online learning systems organize the content according to a well defined taxonomy of hierarchical nature.
The task of categorizing inputs to the hierarchical labels is usually cast as a flat multi-class classification problem.
We formulate the task as a dense retrieval problem to retrieve the appropriate hierarchical labels for each content.
arXiv Detail & Related papers (2022-08-10T05:08:37Z) - Your Classifier can Secretly Suffice Multi-Source Domain Adaptation [72.47706604261992]
Multi-Source Domain Adaptation (MSDA) deals with the transfer of task knowledge from multiple labeled source domains to an unlabeled target domain.
We present a different perspective to MSDA wherein deep models are observed to implicitly align the domains under label supervision.
arXiv Detail & Related papers (2021-03-20T12:44:13Z) - Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve 7% Micro F1-score upon current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z) - Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z) - Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.